Wei Zhang

Postdoctoral Researcher

Scientific Data Division

Lawrence Berkeley National Lab

Dr. Wei Zhang (张威) is a Computer Science Researcher at Lawrence Berkeley National Laboratory (LBNL), specializing in advancing data management for scientific applications in heterogeneous computing environments. His research focuses on bridging the gap between high-performance computing (HPC) and artificial intelligence (AI) by developing innovative solutions for managing and discovering large-scale scientific data. Dr. Zhang's key contributions include:

AI & HPC I/O Optimization: Most recently, he designed a Rust-based NDArray object store and I/O-aware DataLoader that remove storage bottlenecks for ensemble GNN training, boosting throughput by 31–135× and virtually eliminating I/O stalls on DOE leadership machines.
Scientific Metadata Discovery: Through DART, MIQS, and IDIOMS, he built ultra-fast, distributed indexes for self-describing data, accelerating prefix and semantic searches by four to five orders of magnitude and making petascale datasets instantly discoverable on supercomputers like Perlmutter.
Graph-Based Metadata Management: He introduced GraphMeta and follow-up partitioning algorithms (AKIN, IOGP) that scale rich metadata graphs across distributed storage, providing the groundwork for efficient organization and retrieval on modern HPC systems.
Activeness-Based Data Retention: His ActiveDR framework uses user-activeness metrics to guide tiering and eviction, reducing unnecessary file recalls by 37% and doubling the amount of “useful” data retained in large scientific archives.

Currently, Dr. Zhang leads research initiatives across LBNL, OSU, and TTU on topics including accelerating AI-powered data search and LLM/RAG-powered scientific data discovery.

Prior to joining LBNL, Dr. Zhang held positions as a Senior Member of Technical Staff at Oracle Corporation and a Research Assistant at Texas Tech University. He has authored numerous publications in top-tier conferences and journals, including SC, PACT, CCGRID, and IEEE TPDS. He actively serves as an invited paper reviewer or program committee member for prestigious journals and conferences such as TPDS, SC, IPDPS, CCGrid, and HiPC.

Dr. Zhang obtained his Ph.D. in Computer Science from Texas Tech University and his BSc in Computer Science from Hebei University of Science and Technology. With his strong expertise in data management, HPC, and AI, he is committed to advancing scientific computing infrastructure to support groundbreaking research and discovery.

Journal Articles

Naizhuo Zhao, Guofeng Cao, Wei Zhang, Eric L. Samson, Yong Chen, "A comparison study between nighttime lights and location-based social media at the 500 m spatial resolution", International Journal of Applied Earth Observation and Geoinformation, May 1, 2020, 87

Dong Dai, Yong Chen, Philip Carns, John Jenkins, Wei Zhang, Robert Ross, "Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model", IEEE Transactions on Parallel and Distributed Systems, July 1, 2019, 30, doi: 10.1109/TPDS.2018.2887380

Naizhuo Zhao, Guofeng Cao, Wei Zhang, Eric L. Samson, "Tweets or nighttime lights: Comparison for preeminence in estimating socioeconomic factors", ISPRS Journal of Photogrammetry and Remote Sensing, December 1, 2018, 146:1-10

Naizhuo Zhao, Wei Zhang, Ying Liu, Eric L. Samson, Yong Chen, Guofeng Cao, "Improving Nighttime Light Imagery With Location-Based Social Media Data", IEEE Transactions on Geoscience and Remote Sensing, October 24, 2018, 2161, doi: 10.1109/TGRS.2018.2871788

Conference Papers

Wei Zhang, Khaled Ibrahim, Suren Byna, "Optimizing Distributed Object Storage I/O for Large-scale Parallel GNN Training on Atomistic Graphs", Under Review, July 11, 2025

Suben Kumar Saha, Houjun Tang, Wei Zhang, Suren Byna, "Distributed Metadata Querying on HPC Systems", Under Review, July 10, 2025

Chenxu Niu, Wei Zhang, Yongjian Zhao, Yong Chen, "Energy Efficient or Exhaustive? Benchmarking Power Consumption of LLM Inference Engines", HotCarbon Workshop on Sustainable Computer Systems 2025, July 10, 2025

Chenxu Niu, Wei Zhang, Mert Side, Yong Chen, "ICEAGE: Intelligent Contextual Exploration and Answer Generation Engine for Scientific Data Discovery", 37th International Conference on Scalable Scientific Data Management, June 23, 2025

Hyunju Oh, Wei Zhang, Christopher D. Rickett, Sreenivas R. Sukumar, Suren Byna, "Evaluating Performance Trade-offs of Caching Strategies for AI-Powered Querying Systems", 2024 IEEE International Conference on Big Data (IEEE BigData 2024), Washington DC, USA, 2024, doi: 10.1109/BigData62323.2024.10825819

With the rapid growth of accumulated data from various scientific domains, traditional data management systems face challenges in supporting complicated queries, such as pattern search, on massive amounts of data. To serve sophisticated queries through capturing precise features from data, recent data management systems seek to use artificial intelligence (AI) within the querying process. However, the characteristic of AI inference workflow within the querying process, such as intensive computation and expensive requirements for computing resources, becomes a bottleneck of the AI-powered query systems. In this paper, we provide a generalization of AI inference workflow in the context of AI-powered data discovery and we introduce three different caching strategies corresponding to each stage in the AI inference workflow. We provide in-depth performance evaluation on the impact of these caching strategies through a series of strong scaling experiments. Our experimental results show that the AI-powered data querying performance can be significantly improved by applying different caching strategies.

Wei Zhang, Houjun Tang, Suren Byna, "BULKI - Binary Unified Layout for Key-value Interchange", 9th International Parallel Data Systems Workshop (PDSW), 2024

Wei Zhang, Houjun Tang, Suren Byna, "IDIOMS: Index-powered Distributed Object-centric Metadata Search for Scientific Data Management", The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2024), Philadelphia, PA, USA, IEEE, May 9, 2024, doi: 10.1109/CCGrid59990.2024.00072

Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "PSQS: Parallel Semantic Querying Service for Self-describing File Formats", 2023 IEEE International Conference on Big Data (BigData), December 1, 2023, doi: 10.1109/BigData59044.2023.10386205

Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "Kv2vec: A Distributed Representation Method for Key-value Pairs from Metadata Attributes", 2022 IEEE Conference on High Performance Extreme Computing (HPEC), September 19, 2022, doi: 10.1109/HPEC55821.2022.9926389

Wei Zhang, Suren Byna, Hyogi Sim, Sangkeun Lee, Sudharshan Vazhkudai, Yong Chen, "Exploiting User Activeness for Data Retention in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21), November 21, 2021, doi: 10.1145/3458817.3476201

Wei Zhang, Suren Byna, Chenxu Niu, Yong Chen, "Exploring Metadata Search Essentials for Scientific Data Management", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 17, 2019

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems", Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, November 1, 2018, 24

Wei Zhang, Yong Chen, Dong Dai, "AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems", 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), May 4, 2018

Dong Dai, Wei Zhang, Yong Chen, "IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases", HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, June 26, 2017

Dong Dai, Yong Chen, Philip Carns, John Jenkins, Wei Zhang, Robert Ross, "GraphMeta: A Graph-Based Engine for Managing Large-Scale HPC Rich Metadata", 2016 IEEE International Conference on Cluster Computing, December 12, 2016, doi: 10.1109/CLUSTER.2016.50

Thesis/Dissertations

Wei Zhang, "Efficient scientific data discovery over self-describing file formats", June 1, 2021

Posters

Wei Zhang, Yong Chen, "Activeness-Based Data Retention Recommender for HPC Facilities", SC '20 ACM Graduate Student Research Competition Poster, November 20, 2020

Wei Zhang, Yong Chen, "Efficient Metadata Search for Scientific Data", SC '20 Doctoral Showcase Poster, November 18, 2020

Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "Semantic Search for Self-Describing Scientific Data Formats", SC '20 Research Poster, November 18, 2020

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "Distributed Adaptive Radix Tree for Efficient Metadata Search on HPC Systems", SC '18 Research Poster, November 21, 2018

Others

Wei Zhang, Software Release: ActiveDR v1.0.6, August 7, 2021, doi: 10.5281/zenodo.5168853

Suren Byna, Quincey Koziol, Houjun Tang, Wei Zhang, Yong Chen, Abstract: Searching metadata stored in self-describing file formats efficiently, December 1, 2020

Wei Zhang, Suren Byna, Yong Chen, Software Release: MIQS v0.6, National Science Foundation, USDOE, August 26, 2020, doi: 10.11578/dc.20210323.1

Wei Zhang, Patent: Method and Apparatus for Automatic Generation of API Interface, January 27, 2016
