Scientific Data Management Research

Houjun Tang

Dr. Houjun Tang
Computer Research Scientist
Berkeley Lab
1 Cyclotron Road, 50A-3135
Berkeley, California 94720 US

Houjun Tang (唐厚君) is a Computer Research Scientist in the Scientific Data Management Group at Berkeley Lab. His research interests include data management, storage systems, parallel I/O, and high performance computing. Tang received his Ph.D. in Computer Science from North Carolina State University in 2016 and his B.Eng. in Computer Science from Shenzhen University, China, in 2012. He is currently working on projects funded by the DOE Office of Science and the DOE Office of Cybersecurity, Energy Security, and Emergency Response.

Google Scholar, ORCID

Journal Articles

M Scot Breitenfeld, Houjun Tang, Huihuo Zheng, Jordan Henderson, Suren Byna, "HDF5 in the Exascale Era: Delivering Efficient and Scalable Parallel I/O for Exascale Applications", The International Journal of High Performance Computing Applications, October 16, 2024, doi: 10.1177/10943420241288244

David McCallen, Arben Pitarka, Houjun Tang, Ramesh Pankajakshan, Anders Petersson, Mamun Miah, "Transformational Regional-Scale Earthquake Simulations with the DOE EarthQuake SIMulation Exascale Framework", Scientific Impact of the Exascale Computing Project (ECP), August 1, 2024, doi: 10.1109/MCSE.2024.3397768

D McCallen, A Pitarka, H Tang, R Pankajakshan, NA Petersson, M Miah, "Transformational Regional-Scale Earthquake Simulations with the DOE EarthQuake SIMulation (EQSIM) Exascale Framework", Computing in Science & Engineering, May 8, 2024, doi: 10.1109/MCSE.2024.3397768

David McCallen, Arben Pitarka, Houjun Tang, Ramesh Pankajakshan, N Anders Petersson, Mamun Miah, Junfei Huang, "Regional-scale fault-to-structure earthquake simulations with the EQSIM framework: Workflow maturation and computational performance on GPU-accelerated exascale platforms", Earthquake Spectra, May 3, 2024, 40(3):1615-1652, doi: 10.1177/87552930241246235

R. Han, M. Zheng, S. Byna, H. Tang, B. Dong, D. Dai, Y. Chen, D. Kim, J. Hassoun, D. Thorsley, M. Wolf, "PROV-IO: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems", IEEE Transactions on Parallel and Distributed Systems, March 14, 2024

Jean Luca Bez, Houjun Tang, Scot Breitenfeld, Huihuo Zheng, Wei-Keng Liao, Kaiyuan Hou, Zanhua Huang, Suren Byna, "h5bench: Exploring HDF5 Access Patterns Performance in Pre-Exascale Platforms", Concurrency and Computation: Practice and Experience (CCPE), January 31, 2024

Xiaoxia Zhang, Degang Chen, Hong Yu, Guoyin Wang, Houjun Tang, Kesheng Wu, "Improving nonnegative matrix factorization with advanced graph regularization", Information Sciences, June 1, 2022, 597:125-143, doi: 10.1016/j.ins.2022.03.008

Houjun Tang, Quincey Koziol, John Ravi, and Suren Byna, "Transparent Asynchronous Parallel I/O using Background Threads", IEEE Transactions on Parallel and Distributed Systems, April 4, 2022, 33, doi: 10.1109/TPDS.2021.3090322

David McCallen, Houjun Tang, Suiwen Wu, Eric Eckert, Junfei Huang, N Anders Petersson, "Coupling of regional geophysics and local soil-structure models in the EQSIM fault-to-structure earthquake simulation framework", The International Journal of High Performance Computing Applications, May 25, 2021, doi: 10.1177/10943420211019118

David McCallen, Anders Petersson, Arthur Rodgers, Arben Pitarka, Mamun Miah, Floriana Petrone, Bjorn Sjogreen, Norman Abrahamson, Houjun Tang, "EQSIM—A multidisciplinary framework for fault-to-structure earthquake simulations on exascale computers part I: Computational models and workflow", Earthquake Spectra, May 1, 2021, 37:707-735, doi: 10.1177/8755293020970982

Suren Byna, M. Scot Breitenfeld, Bin Dong, Quincey Koziol, Elena Pourmal, Dana Robinson, Jerome Soumagne, Houjun Tang, Venkatram Vishwanath, and Richard Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems", Journal of Computer Science and Technology 2020, 35(1): 145-160, February 2, 2020, doi: 10.1007/s11390-020-9822-9

Conference Papers

Rajeev Jain, Houjun Tang, Akash Dhruv, Suren Byna, "Enabling Data Reduction for Flash-X Simulations", 10th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD), 2024

D.K. Sung, Y. Son, A. Sim, K. Wu, S. Byna, H. Tang, H. Eom, C. Kim, S. Kim, "A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis", 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS2024), 2024

Wei Zhang, Houjun Tang, Suren Byna, "IDIOMS: Index-powered Distributed Object-centric Metadata Search for Scientific Data Management", The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2024), Philadelphia, PA, USA, IEEE, May 9, 2024, doi: 10.1109/CCGrid59990.2024.00072

Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Kai Zhao, Bo Fang, Zarija Lukić, Franck Cappello, James Ahrens, Dingwen Tao, "AMRIC: A novel in situ lossy compression framework for efficient I/O in adaptive mesh refinement applications", SC23: International Conference for High Performance Computing, Networking, Storage and Analysis, November 12, 2023, doi: 10.1145/3581784.3613212

Md Kamal Hossain Chowdhury, Houjun Tang, Jean Luca Bez, Purushotham V. Bangalore, Suren Byna, "Efficient Asynchronous I/O with Request Merging", 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, IEEE, 2023, 628-636, doi: 10.1109/IPDPSW59300.2023.00107

John Ravi, Suren Byna, Quincey Koziol, Houjun Tang, Michela Becchi, "Evaluating Asynchronous Parallel I/O on HPC Systems", 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 15, 2023, doi: 10.1109/IPDPS54959.2023.00030

Sian Jin, Dingwen Tao, Houjun Tang, Sheng Di, Suren Byna, Zarija Lukic, Franck Cappello, "Accelerating parallel write via deeply integrating predictive lossy compression with HDF5", SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, November 13, 2022, doi: 10.1109/SC41404.2022.00066

Rajeev Jain, Houjun Tang, Akash Dhruv, J Austin Harris, Suren Byna, "Accelerating Flash-X Simulations with Asynchronous I/O", https://ieeexplore.ieee.org/abstract/document/10026923/, November 13, 2022, doi: 10.1109/PDSW56643.2022.00008

Runzhou Han, Suren Byna, Houjun Tang, Bin Dong, and Mai Zheng, "PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems", HPDC 2022, June 23, 2022

Huihuo Zheng, Venkatram Vishwanath, Quincey Koziol, Houjun Tang, John Ravi, John Mainzer, Suren Byna, "HDF5 Cache VOL: Efficient and scalable parallel I/O through caching data on node-local storage", 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 16, 2022, doi: 10.1109/CCGrid54584.2022.00015

Houjun Tang, Bing Xie, Suren Byna, Phillip Carns, Quincey Koziol, Sudarsun Kannan, Jay Lofstead, and Sarp Oral, "SCTuner: An Auto-tuner Addressing Dynamic I/O Needs on Supercomputer I/O Sub-systems", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021

Cong Xu, Suparna Bhattacharya, Martin Foltin, Suren Byna, and Paolo Faraboschi, "Data-Aware Storage Tiering for Deep Learning", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021

Bing Xie, Houjun Tang, Suren Byna, Jesse Hanley, Quincey Koziol, Tonglin Li, Sarp Oral, "Battle of the Defaults: Extracting Performance Characteristics of HDF5 under Production Load", CCGrid 2021, May 31, 2021

Jean Luca Bez, Houjun Tang, Bing Xie, David Williams-Young, Rob Latham, Rob Ross, Sarp Oral, Suren Byna, "I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis", 2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW), January 1, 2021, 15-22, doi: 10.1109/PDSW54622.2021.00008

Tonglin Li, Suren Byna, Quincey Koziol, Houjun Tang, Jean Luca Bez, Qiao Kang, "h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns", Cray User Group (CUG) 2021, January 1, 2021

Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Parallel Query Service for Object-centric Data Management Systems", 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 18, 2020, 406-415

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019

Houjun Tang, Quincey Koziol, Suren Byna, John Mainzer, Tonglin Li, "Enabling Transparent Asynchronous I/O using Background Threads", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW 2019), November 19, 2019, doi: 10.1109/PDSW49588.2019.00006

Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, Suren Byna, "I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance", Cray User Group (CUG) 2019, May 10, 2019

Jingqing Mu, Jerome Soumagne, Suren Byna, Quincey Koziol, Houjun Tang, Richard Warren, "Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage", Cray User Group (CUG) 2019, May 7, 2019

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61-80

Jialin Liu, Quincey Koziol, Gregory Butler, Neil Fortner, Mohamad Chaarawi, Houjun Tang, Suren Byna, Glenn Lockwood, Ravi Cheema, Kristy Kallback-Rose, Damian Hazen, Prabhat, "Evaluation of HPC Application I/O on Object Storage Systems", 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS), November 12, 2018

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems", Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, November 1, 2018, 24

Kimmy Mu, Jerome Soumagne, Houjun Tang, Suren Byna, Quincey Koziol, Richard Warren, "A Server-managed Transparent Object Storage Abstraction for HPC", 2018 IEEE International Conference on Cluster Computing (CLUSTER), September 10, 2018

Teng Wang, Suren Byna, Bin Dong, and Houjun Tang, "UniviStor: Integrated Hierarchical and Distributed Storage for HPC", IEEE Cluster 2018, September 1, 2018

Houjun Tang, Suren Byna, Francois Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu, and Richard Warren, "Toward Scalable and Asynchronous Object-centric Data Management for HPC", 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2018, May 1, 2018

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211-220

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017

Jialin Liu, Quincey Koziol, Houjun Tang, François Tessier, Wahid Bhimji, Brandon Cook, Brian Austin, Suren Byna, Bhupender Thakur, Glenn Lockwood, Jack Deslippe, Prabhat, "Understanding the I/O Performance Gap Between Cori KNL and Haswell", Cray User Group Conference 2017 (CUG 2017), May 1, 2017

Wenzhao Zhang, Houjun Tang, Xiaocheng Zou, Steven Harenberg, Qing Liu, Scott Klasky, Nagiza F Samatova, "Exploring Memory Hierarchy to Improve Scientific Data Read Performance", 2015 IEEE International Conference on Cluster Computing, 2016, 66-69

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359-1366

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, et al., "AMRZone: A runtime AMR data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116-125

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, et al., "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356-365

Houjun Tang, Suren Byna, Steve Harenberg, Wenzhao Zhang, Xiaocheng Zou, Daniel F Martin, Bin Dong, Dharshi Devendran, Kesheng Wu, David Trebotich, et al., "In situ storage layout optimization for AMR spatio-temporal read accesses", 2016 45th International Conference on Parallel Processing (ICPP), January 1, 2016, 406-415

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015

David A Boyuka II, Houjun Tang, Kushal Bansal, Xiaocheng Zou, Scott Klasky, Nagiza F Samatova, "The hyperdyadic index and generalized indexing and query with PIQUE", Proceedings of the 27th International Conference on Scientific and Statistical Database Management, 2015, 20

Xiaocheng Zou, Sriram Lakshminarasimhan, David A Boyuka II, Stephen Ranshous, Houjun Tang, Scott Klasky, Nagiza F Samatova, "Fast set intersection through run-time bitmap construction over PForDelta-compressed indexes", European Conference on Parallel Processing, 2014, 668-679

Houjun Tang, Xiaocheng Zou, John Jenkins, David A Boyuka II, Stephen Ranshous, Dries Kimpe, Scott Klasky, Nagiza F Samatova, "Improving read performance with online access pattern analysis and prefetching", European Conference on Parallel Processing, 2014, 246-257

John Jenkins, Xiaocheng Zou, Houjun Tang, Dries Kimpe, Robert Ross, Nagiza F Samatova, "Radar: Runtime asymmetric data-access driven scientific data replication", International Supercomputing Conference, 2014, 296-313

Eric R Schendel, Steve Harenberg, Houjun Tang, Venkatram Vishwanath, Michael E Papka, Nagiza F Samatova, "A generic high-performance method for deinterleaving scientific data", European Conference on Parallel Processing, 2013, 571-582

Presentation/Talks

Suren Byna, Houjun Tang, and Quincey Koziol, Automatic and Transparent Scientific Data Management with Object Abstractions, PASC 2021, in a Minisymposium on "Data Movement Orchestration on HPC Systems", July 31, 2021

Suren Byna, Quincey Koziol, Venkatram Vishwanath, Jerome Soumagne, Houjun Tang, Kimmy Mu, Richard Warren, François Tessier, Bin Dong, Teng Wang, and Jialin Liu, Proactive Data Containers (PDC): An object-centric data store for large-scale computing systems, AGU Fall Meeting, December 13, 2018