Scientific Data Management Research

Wei Zhang

Wei Zhang
Postdoctoral Researcher
Scientific Data Division
Lawrence Berkeley National Lab

Dr. Wei Zhang (张威) is a Computer Science Researcher at Lawrence Berkeley National Laboratory (LBNL), specializing in advancing data management for scientific applications in heterogeneous computing environments. His research focuses on bridging the gap between high-performance computing (HPC) and artificial intelligence (AI) by developing innovative solutions for managing and discovering large-scale scientific data. Dr. Zhang's key contributions include:

  1. Graph-Based Metadata Management: His work on GraphMeta, IOGP, and AKIN has laid the foundation for efficient metadata organization and retrieval in complex scientific computing environments.
  2. Scientific Data Discovery: He has made substantial advancements through projects like DART, MIQS, and IDIOMS, significantly improving metadata indexing and querying in parallel object-centric storage environments.
  3. Activeness-Based Data Retention: His novel approach, ActiveDR, optimizes storage based on user activity and access patterns, addressing long-term storage challenges in data-intensive, heterogeneous environments.

Currently, Dr. Zhang is leading research initiatives at LBNL on I/O optimization for GNN training, acceleration of AI-powered data search, and LLM/RAG-powered scientific data discovery.

Prior to joining LBNL, Dr. Zhang held positions as a Senior Member of Technical Staff at Oracle Corporation and a Research Assistant at Texas Tech University. He has authored numerous publications in top-tier conferences and journals, including SC, PACT, CCGrid, and IEEE TPDS. He actively serves as an invited paper reviewer or program committee member for journals and conferences such as TPDS, SC, IPDPS, CCGrid, and HiPC.

Dr. Zhang obtained his Ph.D. in Computer Science from Texas Tech University and his BSc in Computer Science from Hebei University of Science and Technology. With his strong expertise in data management, HPC, and AI, he is committed to advancing scientific computing infrastructure to support groundbreaking research and discovery.

Journal Articles

Naizhuo Zhao, Guofeng Cao, Wei Zhang, Eric L. Samson, Yong Chen, "A comparison study between nighttime lights and location-based social media at the 500 m spatial resolution", International Journal of Applied Earth Observation and Geoinformation, May 1, 2020, 87

Dong Dai, Yong Chen, Philip Carns, John Jenkins, Wei Zhang, Robert Ross, "Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model", IEEE Transactions on Parallel and Distributed Systems, July 1, 2019, 30, doi: 10.1109/TPDS.2018.2887380

Naizhuo Zhao, Guofeng Cao, Wei Zhang, Eric L. Samson, "Tweets or nighttime lights: Comparison for preeminence in estimating socioeconomic factors", ISPRS Journal of Photogrammetry and Remote Sensing, December 1, 2018, 146:1-10

Naizhuo Zhao, Wei Zhang, Ying Liu, Eric L. Samson, Yong Chen, Guofeng Cao, "Improving Nighttime Light Imagery With Location-Based Social Media Data", IEEE Transactions on Geoscience and Remote Sensing, October 24, 2018, 2161, doi: 10.1109/TGRS.2018.2871788

Conference Papers

Hyunju Oh, Wei Zhang, Christopher D. Rickett, Sreenivas R. Sukumar, Suren Byna, "Evaluating Performance Trade-offs of Caching Strategies for AI-Powered Querying Systems", 2024 IEEE International Conference on Big Data (IEEE BigData 2024), Washington DC, USA, 2024

Camera-ready in preparation 

Wei Zhang, Houjun Tang, Suren Byna, "BULKI - Binary Unified Layout for Key-value Interchange", 9th International Parallel Data Systems Workshop (PDSW), 2024

Wei Zhang, Houjun Tang, Suren Byna, "IDIOMS: Index-powered Distributed Object-centric Metadata Search for Scientific Data Management", The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2024), Philadelphia, PA, USA, IEEE, May 9, 2024, doi: 10.1109/CCGrid59990.2024.00072

Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "PSQS: Parallel Semantic Querying Service for Self-describing File Formats", 2023 IEEE International Conference on Big Data (BigData), December 1, 2023, doi: 10.1109/BigData59044.2023.10386205

Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "Kv2vec: A Distributed Representation Method for Key-value Pairs from Metadata Attributes", 2022 IEEE Conference on High Performance Extreme Computing (HPEC), September 19, 2022, doi: 10.1109/HPEC55821.2022.9926389

Wei Zhang, Suren Byna, Hyogi Sim, Sangkeun Lee, Sudharshan Vazhkudai, Yong Chen, "Exploiting User Activeness for Data Retention in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21), November 21, 2021, doi: 10.1145/3458817.3476201

Wei Zhang, Suren Byna, Chenxu Niu, Yong Chen, "Exploring Metadata Search Essentials for Scientific Data Management", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 17, 2019

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems", Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, November 1, 2018, 24

Wei Zhang, Yong Chen, Dong Dai, "AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems", 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), May 4, 2018

Dong Dai, Wei Zhang, Yong Chen, "IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases", HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, June 26, 2017

Dong Dai, Yong Chen, Philip Carns, John Jenkins, Wei Zhang, Robert Ross, "GraphMeta: A Graph-Based Engine for Managing Large-Scale HPC Rich Metadata", 2016 IEEE International Conference on Cluster Computing, December 12, 2016, doi: 10.1109/CLUSTER.2016.50

Posters

Wei Zhang, Yong Chen, "Activeness-Based Data Retention Recommender for HPC Facilities", SC '20 ACM Graduate Student Research Competition Poster, November 20, 2020

Wei Zhang, Yong Chen, "Efficient Metadata Search for Scientific Data", SC '20 Doctoral Showcase Poster, November 18, 2020

Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "Semantic Search for Self-Describing Scientific Data Formats", SC '20 Research Poster, November 18, 2020

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "Distributed Adaptive Radix Tree for Efficient Metadata Search on HPC Systems", SC '18 Research Poster, November 21, 2018

Others

Suren Byna, Quincey Koziol, Houjun Tang, Wei Zhang, Yong Chen, Abstract: Searching metadata stored in self-describing file formats efficiently, December 1, 2020

Wei Zhang, Suren Byna, Yong Chen, Software Release: MIQS v0.6, National Science Foundation, USDOE, August 26, 2020, doi: 10.11578/dc.20210323.1