Scientific Data Management Research

Houjun Tang

Computer Research Scientist
Berkeley Lab
1 Cyclotron Road, 50B-3245A
Berkeley, California 94720 US

Houjun Tang (唐厚君) is a Computer Research Scientist in the Scientific Data Management Group at Berkeley Lab. His research interests include data management, storage systems, parallel I/O, and high-performance computing. Tang received his Ph.D. in Computer Science from North Carolina State University in 2016 and a B.Eng. in Computer Science from Shenzhen University, China, in 2012. His current projects include ECP-EQSIM: High-Performance, Multidisciplinary Simulations for Regional-Scale Earthquake Hazard/Risk Assessments; ECP-ExaIO: Advancing HPC I/O to Enable Scientific Discovery; and PDC: Proactive Data Containers for next-generation HPC storage.


Journal Articles

Jean Luca Bez, Houjun Tang, Scot Breitenfeld, Huihuo Zheng, Wei-Keng Liao, Kaiyuan Hou, Zanhua Huang, Suren Byna, "h5bench: Exploring HDF5 Access Patterns Performance in Pre-Exascale Platforms", Concurrency and Computation: Practice and Experience (CCPE), January 31, 2024,

Houjun Tang, Quincey Koziol, John Ravi, and Suren Byna, "Transparent Asynchronous Parallel I/O using Background Threads", IEEE Transactions on Parallel and Distributed Systems, April 4, 2022, 33, doi: 10.1109/TPDS.2021.3090322

David McCallen, Houjun Tang, Suiwen Wu, Eric Eckert, Junfei Huang, N Anders Petersson, "Coupling of regional geophysics and local soil-structure models in the EQSIM fault-to-structure earthquake simulation framework", The International Journal of High Performance Computing Applications, May 25, 2021, doi: 10.1177/10943420211019118

David McCallen, Anders Petersson, Arthur Rodgers, Arben Pitarka, Mamun Miah, Floriana Petrone, Bjorn Sjogreen, Norman Abrahamson, Houjun Tang, "EQSIM—A multidisciplinary framework for fault-to-structure earthquake simulations on exascale computers part I: Computational models and workflow", Earthquake Spectra, May 1, 2021, 37:707-735, doi: 10.1177/8755293020970982

Suren Byna, M. Scot Breitenfeld, Bin Dong, Quincey Koziol, Elena Pourmal, Dana Robinson, Jerome Soumagne, Houjun Tang, Venkatram Vishwanath, and Richard Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems", Journal of Computer Science and Technology 2020, 35(1): 145-160, February 2, 2020, doi: 10.1007/s11390-020-9822-9

Conference Papers

D.K. Sung, Y. Son, A. Sim, K. Wu, S. Byna, H. Tang, H. Eom, C. Kim, S. Kim, "A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis", 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS2024), 2024,

Wei Zhang, Houjun Tang, Suren Byna, "IDIOMS: Index-powered Distributed Object-centric Metadata Search for Scientific Data Management", The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2024), Philadelphia, May 9, 2024,

Cong Xu, Suparna Bhattacharya, Martin Foltin, Suren Byna, and Paolo Faraboschi, "Data-Aware Storage Tiering for Deep Learning", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,

Houjun Tang, Bing Xie, Suren Byna, Phillip Carns, Quincey Koziol, Sudarsun Kannan, Jay Lofstead, and Sarp Oral, "SCTuner: An Auto-tuner Addressing Dynamic I/O Needs on Supercomputer I/O Sub-systems", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,

Bing Xie, Houjun Tang, Suren Byna, Jesse Hanley, Quincey Koziol, Tonglin Li, Sarp Oral, "Battle of the Defaults: Extracting Performance Characteristics of HDF5 under Production Load", CCGrid 2021, May 31, 2021,

Tonglin Li, Suren Byna, Quincey Koziol, Houjun Tang, Jean Luca Bez, Qiao Kang, "h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns", Cray User Group (CUG) 2021, January 1, 2021,

Jean Luca Bez, Houjun Tang, Bing Xie, David Williams-Young, Rob Latham, Rob Ross, Sarp Oral, Suren Byna, "I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis", 2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW), January 1, 2021, 15-22, doi: 10.1109/PDSW54622.2021.00008

Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Parallel Query Service for Object-centric Data Management Systems", 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 18, 2020, 406-415,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Quincey Koziol, Suren Byna, John Mainzer, Tonglin Li, "Enabling Transparent Asynchronous I/O using Background Threads", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW 2019), November 19, 2019, doi: 10.1109/PDSW49588.2019.00006

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019,

Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, Suren Byna, "I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance", Cray User Group (CUG) 2019, May 10, 2019,

Jingqing Mu, Jerome Soumagne, Suren Byna, Quincey Koziol, Houjun Tang, Richard Warren, "Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage", Cray User Group (CUG) 2019, May 7, 2019,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61-80,

Jialin Liu, Quincey Koziol, Gregory Butler, Neil Fortner, Mohamad Chaarawi, Houjun Tang, Suren Byna, Glenn Lockwood, Ravi Cheema, Kristy Kallback-Rose, Damian Hazen, Prabhat, "Evaluation of HPC Application I/O on Object Storage Systems", 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS), November 12, 2018,

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems", Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, November 1, 2018, 24,

Kimmy Mu, Jerome Soumagne, Houjun Tang, Suren Byna, Quincey Koziol, Richard Warren, "A Server-managed Transparent Object Storage Abstraction for HPC", 2018 IEEE International Conference on Cluster Computing (CLUSTER), September 10, 2018,

Teng Wang, Suren Byna, Bin Dong, and Houjun Tang, "UniviStor: Integrated Hierarchical and Distributed Storage for HPC", IEEE Cluster 2018, September 1, 2018,

Houjun Tang, Suren Byna, Francois Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu, and Richard Warren, "Toward Scalable and Asynchronous Object-centric Data Management for HPC", 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2018, May 1, 2018,

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211-220,

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,

Jialin Liu, Quincey Koziol, Houjun Tang, François Tessier, Wahid Bhimji, Brandon Cook, Brian Austin, Suren Byna, Bhupender Thakur, Glenn Lockwood, Jack Deslippe, Prabhat, "Understanding the I/O Performance Gap Between Cori KNL and Haswell", Cray User Group Conference 2017 (CUG 2017), May 1, 2017,

Wenzhao Zhang, Houjun Tang, Xiaocheng Zou, Steven Harenberg, Qing Liu, Scott Klasky, Nagiza F Samatova, "Exploring Memory Hierarchy to Improve Scientific Data Read Performance", 2015 IEEE International Conference on Cluster Computing, 2016, 66-69,

Houjun Tang, Suren Byna, Steve Harenberg, Wenzhao Zhang, Xiaocheng Zou, Daniel F Martin, Bin Dong, Dharshi Devendran, Kesheng Wu, David Trebotich, others, "In situ storage layout optimization for AMR spatio-temporal read accesses", 2016 45th International Conference on Parallel Processing (ICPP), January 1, 2016, 406-415,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356-365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "AMRZone: A runtime AMR data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116-125,

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359-1366,

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

David A Boyuka II, Houjun Tang, Kushal Bansal, Xiaocheng Zou, Scott Klasky, Nagiza F Samatova, "The hyperdyadic index and generalized indexing and query with PIQUE", Proceedings of the 27th International Conference on Scientific and Statistical Database Management, 2015, 20,

John Jenkins, Xiaocheng Zou, Houjun Tang, Dries Kimpe, Robert Ross, Nagiza F Samatova, "Radar: Runtime asymmetric data-access driven scientific data replication", International Supercomputing Conference, 2014, 296-313,

Houjun Tang, Xiaocheng Zou, John Jenkins, David A Boyuka II, Stephen Ranshous, Dries Kimpe, Scott Klasky, Nagiza F Samatova, "Improving read performance with online access pattern analysis and prefetching", European Conference on Parallel Processing, 2014, 246-257,

Xiaocheng Zou, Sriram Lakshminarasimhan, David A Boyuka II, Stephen Ranshous, Houjun Tang, Scott Klasky, Nagiza F Samatova, "Fast set intersection through run-time bitmap construction over PForDelta-compressed indexes", European Conference on Parallel Processing, 2014, 668-679,

Eric R Schendel, Steve Harenberg, Houjun Tang, Venkatram Vishwanath, Michael E Papka, Nagiza F Samatova, "A generic high-performance method for deinterleaving scientific data", European Conference on Parallel Processing, 2013, 571-582,

Md Kamal Hossain Chowdhury, Houjun Tang, Jean Luca Bez, Purushotham V. Bangalore, Suren Byna, "Efficient Asynchronous I/O with Request Merging", 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, IEEE, 2023, 628-636, doi: 10.1109/IPDPSW59300.2023.00107

Presentation/Talks

Suren Byna, Houjun Tang, and Quincey Koziol, Automatic and Transparent Scientific Data Management with Object Abstractions, PASC 2021, in a Minisymposium on "Data Movement Orchestration on HPC Systems", July 31, 2021,

Suren Byna, Quincey Koziol, Venkatram Vishwanath, Jerome Soumagne, Houjun Tang, Kimmy Mu, Richard Warren, François Tessier, Bin Dong, Teng Wang, and Jialin Liu, Proactive Data Containers (PDC): An object-centric data store for large-scale computing systems, AGU Fall Meeting, December 13, 2018,