Bin Dong
Research Scientist
Scientific Data Division
Phone: 510-779-8060
Scientific Data Management Group
Lawrence Berkeley National Laboratory
One Cyclotron Road
MS 50B 3239A
Berkeley, California 94720
US
Bin Dong (董斌)
Position
Research Scientist, Scientific Data Management (SDM) Group, LBNL, 2016 - Present.
Postdoctoral Research Fellow, Scientific Data Management (SDM) Group, LBNL, 2013 - 2016.
Research Interests
Bin's research interests include big scientific data management and analysis, parallel computing, and machine learning, among others.
Currently, Bin is exploring new, scalable algorithms and data structures for sorting, organizing, indexing, searching, and analyzing big scientific data (mostly stored as arrays) on supercomputers.
Temporary repositories for the software I am working on:
- SDS framework (ask permission): https://code.lbl.gov/svn/sds/
- ArrayUDF: https://bitbucket.org/arrayudf/
- DataElevator: https://bitbucket.org/sbyna/dataelevator
- SDS-Sort: coming soon.
Publications
Following is a select list of publications. (View Bin Dong's full publications list on Google Scholar.)
Journal Articles
R. Han, M. Zheng, S. Byna, H. Tang, B. Dong, D. Dai, Y. Chen, D. Kim, J. Hassoun, D. Thorsley, M. Wolf, "PROV-IO: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems", IEEE Transactions on Parallel and Distributed Systems, March 14, 2024,
Bin Dong, Alex Popescu, Veronica Rodriguez Tribaldos, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "Real-time and post-hoc compression for data from Distributed Acoustic Sensing", Computers & Geosciences, June 24, 2022, 105181,
- Download File: wu2022.bib (bib: 22 KB)
Jonathan Ajo‐Franklin, Verónica Rodríguez Tribaldos, Avinash Nayak, Feng Cheng, Robert Mellors, Benxin Chi, Todd Wood, Michelle Robertson, Cody Rotermund, Eric Matzel, Dennise C. Templeton, Christina Morency, Kesheng Wu, Bin Dong, Patrick Dobson, "The Imperial Valley Dark Fiber Project: Toward Seismic Studies Using DAS and Telecom Infrastructure for Geothermal Applications", Seismological Research Letters, June 24, 2022,
Suren Byna, M. Scot Breitenfeld, Bin Dong, Quincey Koziol, Elena Pourmal, Dana Robinson, Jerome Soumagne, Houjun Tang, Venkatram Vishwanath, and Richard Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems", Journal of Computer Science and Technology 2020, 35(1): 145-160, February 2, 2020, doi: 10.1007/s11390-020-9822-9
Bin Dong, Xiuqiao Li, Limin Xiao, Li Ruan, "Towards minimizing disk I/O contention: A partitioned file assignment approach", Future Generation Computer Systems, Volume 37, July 2014, Pages 178-190, 2014,
Bin Dong, Xiuqiao Li, Qimeng Wu, Limin Xiao, Li Ruan, "A dynamic and adaptive load balancing strategy for parallel file system with large-scale I/O servers", Journal of Parallel and Distributed Computing (JPDC), Volume 72, Issue 10, October 2012, Pages 1254-1268, 2012,
Conference Papers
B. Dong, A. Nayak, K. Wu, V. Tribaldos, J. Ajo-Franklin, Q. Zhang, S. Byna, F. Guo, P. Dobson, A. Sim, "TensorSearch: Parallel Similarity Search on Tensors", IEEE International Conference on Big Data (BigData), 2024,
Bin Dong, Kesheng Wu, Suren Byna, "The Art of Sparsity: Mastering High-Dimensional Tensor Storage", 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 27, 2024,
- Download File: sci_data_sparse_update.pdf (pdf: 473 KB)
Bin Dong, Jean Luca Bez, Suren Byna, "AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis", In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), June 16, 2023,
- Download File: IODiagnose-final.pdf (pdf: 1.9 MB)
Runzhou Han, Suren Byna, Houjun Tang, Bin Dong, and Mai Zheng, "PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems", HPDC 2022, June 23, 2022,
Bin Dong, Verónica Rodríguez Tribaldos, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection", 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 14, 2020, 254--263,
- Download File: paper.pdf (pdf: 3.9 MB)
Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Parallel Query Service for Object-centric Data Management Systems", 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 18, 2020, 406-415,
Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,
Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,
Bin Dong, Patrick Kilian, Xiaocan Li, Fan Guo, Suren Byna, Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", Proceedings of the 31st International Conference on Scientific and Statistical Database Management, January 1, 2019, 202--205,
- Download File: VPIC-SSDBM-Camera-ready.pdf (pdf: 271 KB)
Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61--80,
- Download File: slope-cr.pdf (pdf: 623 KB)
Teng Wang, Suren Byna, Bin Dong, and Houjun Tang, "UniviStor: Integrated Hierarchical and Distributed Storage for HPC", IEEE Cluster 2018, September 1, 2018,
Houjun Tang, Suren Byna, Francois Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu, and Richard Warren, "Toward Scalable and Asynchronous Object-centric Data Management for HPC", 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2018, May 1, 2018,
Kesheng Wu, Bin Dong, Surendra Byna, "Scientific Data Services Framework for Plasma Physics", APS, 2018, 2018:BM10--006,
Xin Xing, Bin Dong, Jonathan Ajo-Franklin, Kesheng Wu, "Automated Parallel Data Processing Engine with Application to Large-Scale Feature Extraction", 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC), January 1, 2018, 37--46,
- Download File: arrayudf-das.pdf (pdf: 2.7 MB)
Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211--220,
- Download File: DataElevator-ARCHIE.pdf (pdf: 613 KB)
Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna YQ Ho, Peter Nugent, "Distributed caching for processing raw arrays", Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018, 1--12,
Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,
Tzuhsien Wu, Jerry Chou, Shyng Hao, Bin Dong, Scott Klasky, Kesheng Wu, "Optimizing the query performance of block index through data analysis and I/O modeling", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, January 1, 2017, 1--10,
Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-defined scientific data analysis on arrays", Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2017, 53--64,
- Download File: hpdc02.pdf (pdf: 921 KB)
Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent, "Incremental view maintenance over array data", Proceedings of the 2017 ACM International Conference on Management of Data, January 1, 2017, 139--154,
Bin Dong, Surendra Byna, Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", HPDC 16, New York, NY, USA, ACM, 2016, 57--68, doi: 10.1145/2907294.2907300
Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "Amrzone: A runtime amr data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116--125,
Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359--1366,
Houjun Tang, Suren Byna, Steve Harenberg, Wenzhao Zhang, Xiaocheng Zou, Daniel F Martin, Bin Dong, Dharshi Devendran, Kesheng Wu, David Trebotich, others, "In situ storage layout optimization for amr spatio-temporal read accesses", 2016 45th International Conference on Parallel Processing (ICPP), January 1, 2016, 406--415,
Bin Dong, Suren Byna, Kesheng Wu, Hans Johansen, Jeffrey N Johnson, Noel Keen, others, "Data elevator: Low-contention data movement in hierarchical storage system", 2016 IEEE 23rd international conference on high performance computing (HiPC), January 1, 2016, 152--161,
- Download File: 201612-DataElevator-HiPC2016-Bin-Byna.pdf (pdf: 765 KB)
Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, "Similarity Join over Array Data", SIGMOD, January 1, 2016, 2007--2022,
Tzuhsien Wu, Hao Shyng, Jerry Chou, Bin Dong, Kesheng Wu, "Indexing blocks to reduce space and time requirements for searching large data files", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 398--402,
Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356--365,
Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,
Bin Dong, Surendra Byna, Kesheng Wu, "Sds-sort: Scalable dynamic skew-aware parallel sorting", Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2016, 57--68,
- Download File: SDS-Sort.pdf (pdf: 450 KB)