
SDM publications

Deb Agarwal

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019,

2016

Deborah A Agarwal, Boris Faybishenko, Vicky L Freedman, Harinarayan Krishnan, Gary Kushner, Carina Lansing, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, others, "A science data gateway for environmental management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004,

2012

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, and Chandrika Sivaramakrishnan, "Akuna: Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 2012,

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, Chandrika Sivaramakrishnan, "Akuna-Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 17-22, 2012,

Brian Austin

2015

Suren Byna, Brian Austin, "Evaluation of Parallel I/O Performance and Energy Consumption with Frequency Scaling on Cray XC30", Cray User Group (CUG) meeting 2015, 2015,

2011

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

Zhaojun Bai

2010

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

David H. Bailey

2015

David H Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, "Statistical overfitting and backtest performance", Risk-Based and Factor Investing, 2015,

http://ssrn.com/abstract=2507040

2014

David H. Bailey, Jonathan M. Borwein, Marcos Lopez de Prado, Qiji Jim Zhu, "Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance", Notices of the American Mathematical Society, May 1, 2014, 458-471,

Recent computational advances allow investment managers to search for profitable investment strategies. In many instances, that search involves a pseudo-mathematical argument, which is spuriously validated through a simulation of its historical performance (also called backtest).

We prove that high performance is easily achievable after backtesting a relatively small number of alternative strategy configurations, a practice we denote “backtest overfitting”. The higher the number of configurations tried, the greater is the probability that the backtest is overfit. Because financial analysts rarely report the number of configurations tried for a given backtest, investors cannot evaluate the degree of overfitting in most investment proposals.

The implication is that investors can be easily misled into allocating capital to strategies that appear to be mathematically sound and empirically supported by an outstanding backtest. This practice is particularly pernicious, because due to the nature of financial time series, backtest overfitting has a detrimental effect on the future strategy’s performance.
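The effect described in this abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' proof or code: the function names, the use of Gaussian noise as "strategy returns", and the default of 60 periods are all assumptions made for illustration. Every simulated configuration has zero true skill, yet the best in-sample Sharpe ratio tends to rise as more configurations are tried.

```python
import random
import statistics


def sharpe_ratio(returns):
    """Mean return divided by sample standard deviation (not annualized)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def best_backtest_sharpe(num_configs, num_periods, rng):
    """Best in-sample Sharpe ratio among configurations with no real skill."""
    best = float("-inf")
    for _ in range(num_configs):
        # Hypothetical strategy: returns are pure noise with zero true mean.
        returns = [rng.gauss(0.0, 1.0) for _ in range(num_periods)]
        best = max(best, sharpe_ratio(returns))
    return best


def average_best_sharpe(num_configs, num_periods=60, trials=100, seed=0):
    """Average the best backtest Sharpe ratio over repeated experiments."""
    rng = random.Random(seed)
    results = [
        best_backtest_sharpe(num_configs, num_periods, rng)
        for _ in range(trials)
    ]
    return statistics.mean(results)


if __name__ == "__main__":
    # Trying more configurations inflates the best observed Sharpe ratio.
    for n in (1, 5, 50):
        print(n, round(average_best_sharpe(n), 3))
```

Because the selected configuration is chosen for its in-sample result alone, its apparent performance says little about out-of-sample behavior, which is the concern the abstract raises.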

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

Mehmet Balman

2013

Mehmet Balman, "Advance Resource Provisioning in Bulk Data Scheduling", 27th IEEE International Conference on Advanced Information Networking and Applications (AINA), 2013, LBNL 6364E, doi: http://dx.doi.org/10.1109/AINA.2013.5

Today's scientific and business applications generate massive data sets that need to be transferred to remote sites for sharing, processing, and long term storage. Because of increasing data volumes and enhancements in current network technology that provide on-demand high-speed data access between collaborating institutions, data handling and scheduling problems have reached a new scale. In this paper, we present a new data scheduling model with advance resource provisioning, in which data movement operations are defined with earliest start and latest completion times. We analyze the time-dependent resource assignment problem, and propose a new methodology to improve the current systems by allowing researchers and higher-level meta-schedulers to use data-placement as-a-service, so they can plan ahead and submit transfer requests in advance. In general, scheduling with time and resource conflicts is NP-hard. We introduce an efficient algorithm to organize multiple requests on the fly, while satisfying users' time and resource constraints. We successfully tested our algorithm in a simple benchmark simulator that we have developed, and demonstrated its performance with initial test results.

Keywords: scheduling with constraints, bulk data movement, time-dependent graphs, network reservation, Gale-Shapley algorithm

2012

Mehmet Balman, "MemzNet: Memory-Mapped Zero-copy Network Channel for Moving Large Datasets over 100Gbps Networks", technical poster in ACM/IEEE international Conference For High Performance Computing, Networking, Storage and Analysis (SC'12), LBNL 6175E, November 13, 2012, doi: http://doi.ieeecomputersociety.org/10.1109/SC.Companion.2012.294

High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications' perspective. We have experimented with current state-of-the-art data movement tools, and realized that file-centric data transfer protocols do not perform well with managing the transfer of many small files in high-bandwidth networks, even when using parallel streams or concurrent transfers. We require enhancements in current middleware tools to take advantage of future networking frameworks. To improve performance and efficiency, we develop an experimental prototype, called MemzNet: Memory-mapped Zero-copy Network Channel, which uses a block-based data movement method in moving large scientific datasets. We have implemented MemzNet that takes the approach of aggregating files into blocks and providing dynamic data channel management. In this work, we present our initial results in 100Gbps networks.
http://dx.doi.org/10.1109/SC.Companion.2012.294               
http://dx.doi.org/10.1109/SC.Companion.2012.295
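The block-based idea in the abstract above, aggregating many small files into fixed-size blocks instead of transferring each file separately, can be sketched roughly as follows. This is not MemzNet code: the function names, the `(name, offset, chunk)` segment layout, and the in-memory representation are assumptions chosen only to show the packing step, and the network channel itself is omitted.

```python
def pack_into_blocks(files, block_size):
    """Pack file contents into blocks holding at most block_size payload bytes.

    files maps a file name to its bytes. Each block is a list of
    (name, offset, chunk) segments; a large file may span several blocks
    and several small files may share one block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks, current, free = [], [], block_size
    for name, data in files.items():
        offset = 0
        while True:
            take = min(free, len(data) - offset)
            current.append((name, offset, data[offset:offset + take]))
            offset += take
            free -= take
            if free == 0:
                # Block is full: close it and start a new one.
                blocks.append(current)
                current, free = [], block_size
            if offset >= len(data):
                break
    if current:
        blocks.append(current)
    return blocks


def unpack_blocks(blocks):
    """Rebuild the files, assuming blocks arrive in the order they were packed."""
    buffers = {}
    for block in blocks:
        for name, _offset, chunk in block:
            buffers.setdefault(name, bytearray()).extend(chunk)
    return {name: bytes(buf) for name, buf in buffers.items()}


if __name__ == "__main__":
    small_files = {"a.nc": b"abc", "b.nc": b"defgh", "empty.nc": b""}
    packed = pack_into_blocks(small_files, block_size=4)
    print(len(packed), unpack_blocks(packed) == small_files)
```

With this layout the sender deals with a steady stream of equally sized blocks regardless of how many small files the dataset contains, which is the property the abstract contrasts with file-centric protocols.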

Mehmet Balman, "Streaming Exascale Data over 100Gbps Networks", IEEE Computing Now, November 8, 2012, LBNL 6173E,

Mehmet Balman, "Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks", LBNL Tech Report, 2012, LBNL 6177E,

High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications’ perspective. We have investigated data transfer requirements of climate applications as a typical scientific example and evaluated how the scientific community can benefit from next generation high-bandwidth networks.  We develop a new block-based data movement method (in contrast to the current file-based methods) to improve data movement performance and efficiency in moving large scientific datasets that contain many small files. We implemented the new block-based data movement tool, which takes the approach of aggregating files into blocks and providing dynamic data channel management. One of the major obstacles in use of high-bandwidth networks is the limitation in host system resources. We have conducted a large number of experiments with our new block-based method and with current available file-based data movement tools.  In this white paper, we describe future research problems and challenges for efficient use of next-generation science networks, based on the lessons learnt and the experiences gained with 100Gbps network applications.

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

M. Balman, A. Sim, "Scaling the Earth System Grid to 100Gbps Networks", 2012, LBNL 5794E,

2011

Mehmet Balman, Surendra Byna, "Open Problems in network-aware data management in exa-scale computing and terabit networking era", In Proceedings of the First international Workshop on Network-Aware Data Management, in conjunction with ACM/IEEE international Conference For High Performance Computing, Networking, Storage and Analysis, 2011, Seattle, WA, November 11, 2011, LBNL 6176E, doi: http://dx.doi.org/10.1145/2110217.2110229

Accessing and managing large amounts of data is a great challenge in collaborative computing environments where resources and users are geographically distributed. Recent advances in network technology led to next-generation high-performance networks, allowing high-bandwidth connectivity. Efficient use of the network infrastructure is necessary in order to address the increasing data and compute requirements of large-scale applications. We discuss several open problems, evaluate emerging trends, and articulate our perspectives in network-aware data management.

T. Kosar, M. Balman, E. Yildirim, S. Kulasekaran, B. Ross, "Stork Data Scheduler: Mitigating the Data Bottleneck in e-Science", Philosophical Transactions of the Royal Society A, Vol.369 (2011), pp. 3254-3267, July 18, 2011, doi: 10.1098/rsta.2011.0148

In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

T. Kosar, I. Akturk, M. Balman, X. Wang, "PetaShare: A Reliable, Efficient, and Transparent Distributed Storage Management System", Scientific Programming, Volume 19, Issue 1, January 2011, pp. 27-43,

Modern collaborative science has placed increasing burden on data management infrastructure to handle the increasingly large data archives generated. Besides functionality, reliability and availability are also key factors in delivering a data management system that can efficiently and effectively meet the challenges posed and compounded by the unbounded increase in the size of data generated by scientific applications. We have developed a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides light-weight clients that enable easy, transparent and scalable access. In PetaShare, we have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability, and an advanced buffering system for improved data transfer performance. In this paper, we present the details of our design and implementation, show performance results, and describe our experience in developing a reliable and efficient distributed data management system for data-intensive science.

2010

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. A tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community. It has been managing massive dataset transfers efficiently in the environment where the network bandwidth is limited. Adaptive transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environments as well as to control the data transfers for the desired transfer performance. We describe the results from our hands-on data transfer management experience in the climate research community. We study a practical transfer estimation model and state our initial results from the adaptive transfer adjustment methodology. 

Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim, "A Flexible Reservation Algorithm for Advance Network Provisioning", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), New Orleans, LA, IEEE Computer Society, Washington, DC, USA, ISBN: 978-1-4244-7559-, November 14, 2010, LBNL 4017E, doi: http://dx.doi.org/10.1109/SC.2010.4

Many scientific applications need support from a communication infrastructure that provides predictable performance, which requires effective algorithms for bandwidth reservations. Network reservation systems such as ESnet’s OSCARS, establish guaranteed bandwidth of secure virtual circuits for a certain bandwidth and length of time. However, users currently cannot inquire about bandwidth availability, nor have alternative suggestions when reservation requests fail. In general, the number of reservation options is exponential with the number of nodes n, and current reservation commitments. We present a novel approach for path finding in time-dependent networks taking advantage of user-provided parameters of total volume and time constraints, which produces options for earliest completion and shortest duration. The theoretical complexity is only O(n²r²) in the worst-case, where r is the number of reservations in the desired time interval. We have implemented our algorithm and developed efficient methodologies for incorporation into network reservation frameworks. Performance measurements confirm the theoretical predictions.
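The notion of an "earliest completion" option can be illustrated on a single link. The sketch below is a deliberate simplification and not the paper's path-finding algorithm: it ignores network topology entirely, and the function name, the `(start, end, bandwidth)` reservation tuples, and the units are assumptions made for illustration. It walks the time intervals between reservation boundaries, where free bandwidth is constant, and accumulates transferable volume until the request is satisfied.

```python
def earliest_completion(capacity, reservations, volume, start, deadline):
    """Earliest time by which `volume` can be sent on one link, or None.

    reservations is a list of (start, end, bandwidth) commitments that
    already occupy part of the link capacity during [start, end).
    """
    if deadline <= start or volume <= 0:
        raise ValueError("need start < deadline and a positive volume")
    # Free bandwidth only changes at reservation boundaries.
    inner = {t for s, e, _ in reservations for t in (s, e) if start < t < deadline}
    points = sorted({start, deadline} | inner)
    remaining = volume
    for left, right in zip(points, points[1:]):
        used = sum(bw for s, e, bw in reservations if s <= left < e)
        free = capacity - used
        if free <= 0:
            continue  # link fully committed in this interval
        if free * (right - left) >= remaining:
            return left + remaining / free
        remaining -= free * (right - left)
    return None  # request cannot finish before the deadline


if __name__ == "__main__":
    committed = [(0, 5, 10), (5, 10, 4)]
    print(earliest_completion(10, committed, volume=40, start=0, deadline=20))
```

A `None` result corresponds to the failed-request case the abstract mentions; a fuller treatment would return alternative options across paths rather than a single time on one link.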

M. Balman, E. Chaniotakis, A. Shoshani, A. Sim, "A New Approach in Advance Network Reservation and Provisioning for High-Performance Scientific Data Transfers", 2010, LBNL 4091E,

Mehmet Balman, Tevfik Kosar, "Error Detection and Error Classification: Failure Awareness in Data Transfer Scheduling", International Journal of Autonomic Computing, Vol. 1, No. 4, pp. 425-446, 2010, doi: http://dx.doi.org/10.1504/IJAC.2010.037516

Data transfer in distributed environment is prone to frequent failures resulting from back-end system level problems, like connectivity failure which is technically untraceable by users. Error messages are not logged efficiently, and sometimes are not relevant/useful from users' point-of-view. Our study explores the possibility of efficient error detection and reporting system for such environments. Prior knowledge about the environment and awareness of the actual reason behind a failure would enable higher level planners to make better and accurate decisions. It is necessary to have well defined error detection and error reporting methods to increase the usability and serviceability of existing data transfer protocols and data management systems. We investigate the applicability of early error detection and error classification techniques and propose an error reporting framework and a failure-aware data transfer life cycle to improve arrangement of data transfer operations and to enhance decision making of data transfer schedulers.

John B. Bell

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

E. Wes Bethel

2016

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

2013

Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

Kesheng Wu, E Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A big data approach to analyzing market volatility", Algorithmic Finance, 2013, 2:241--267, LBNL 6382E,

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that the HPC resource and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real-time -- an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.

E Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K John Wu, Michael Wehner, "Why high performance visual data analytics is both relevant and difficult", Visualization and Data Analysis 2013, January 2013, 8654:86540B, LBNL 6063E,

2012

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

E. W. Bethel, D. Leinweber, O. Rubel, K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

E. W. Bethel, Surendra Byna, Jerry Chou, Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Rübel, Rob D. Ryne, Michael Wehner, Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

Surendra Byna, Jerry Chou, Oliver Rubel, Homa Karimabadi, William S Daughton, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner, Wes Bethel, others, "TECA: A parallel toolkit for extreme climate analysis", Procedia Computer Science, Elsevier, January 2012, 9:866--876, LBNL 5352E,

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing", The Journal of Trading, 2012, 9--25, doi: 10.3905/jot.2012.7.2.009

2011

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,

J. Chou, K. Wu, O. Rübel, M. Howison, Qiang, Prabhat, B. Austin, E. W. Bethel, D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data", SC11, 2011, doi: 10.1145/2063384.2063424

Prabhat, Quincey Koziol, Karen Schuchardt, E. Wes Bethel, Jerry Chou, Mark Howison, Mike, Bruce Palmer, Oliver Ruebel, Kesheng, "ExaHDF5: An I/O Platform for Exascale Data Analysis and Performance", SciDAC 2011, 2011,

2010

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Gunther Weber, "Recent advances in VisIt: AMR streamlines and query-driven visualization", 2010,

Oliver Rübel, Sean Ahern, E. Wes Bethel, D. Biggin, Hank Childs, Estelle, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hagen, Bernd Hamann, Min-Yu Huang, Soile E. Keränen, David W. Knowles, Cris L. Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, H. Weber, Kesheng Wu, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia Computer Science, 2010, 1:1751--1758, doi: 10.1016/j.procs.2010.04.197

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Luke Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Data Parallel Bin-based Indexing for Answering Queries on Multi-core Architecture", Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM), June 2009, 5566:110-129, LBNL 2211E,

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

Luke J Gosink, Kesheng Wu, E Wes Bethel, John D Owens, Kenneth I Joy, "Data parallel bin-based indexing for answering queries on multi-core architectures", International Conference on Scientific and Statistical Database Management, 2009, 110--129,

Oliver Rübel, Cameron GR Geddes, Estelle Cormier-Michel, Kesheng Wu, Gunther H Weber, Daniela M Ushizima, Peter Messmer, Hans Hagen, Bernd Hamann, Wes Bethel, others, "Automatic beam path analysis of laser wakefield particle acceleration data", Computational Science & Discovery, January 2009, 2:015005, LBNL 2734E,

E Bethel, "Modern Scientific Visualization is More than Just Pretty Pictures", January 2009, LBNL 1450E,

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, "Large Fields for Smaller Facility Sources", SciDAC Review, Pages: 13-21, 2009,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Luke J Gosink, "Bin-hash indexing: A parallel method for fast query processing", 2008, LBNL 729E,

E. Wes Bethel, Oliver Rübel, Prabhat, Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy, Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: (Astronomical Society of the Pacific Series), St. Thomas, USVI, 2008, 301--317,

2006

K. Wu, K. Stockinger, A. Shoshani, Wes, "FastBit--Helps Finding the Proverbial Needle in a", 2006, LBNL LBNL-PUB/963,

Luke Gosink, John Shalf, Kurt Stockinger, Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press., 2006, 149--158,

2005

E. Wes Bethel, Scott Campbell, Eli Dart, Lee, Steven A. Smith, Kurt Stockinger, Tierney, Kesheng Wu, "Interactive Analysis of Large Network Data Collections Query-Driven Visualization", 2005,

Kurt Stockinger, John Shalf, Kesheng Wu, E Wes Bethel, "Query-driven visualization of large data sets", VIS 05. IEEE Visualization, 2005., 2005, 167--174,

Surendra Byna

2020

D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, The Superfacility project: automated pipelines for experiments and HPC, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), State of the Practice (SOP), 2020,

B. Enders, D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, "Cross-facility science with the Superfacility Project at LBNL", 2nd Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP 2020), in conjunction with the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 20), 2020,

Bin Dong, Verónica Rodríguez Tribaldos, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection", 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 14, 2020, 254--263,

Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, Hyeonsang Eom, "Towards hpc i/o performance prediction through large-scale log analysis", Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 77--88, doi: 10.1145/3369583.3392678

Jiwoo Bang, Chungyong Kim, Kesheng Wu, Alex Sim, Suren Byna, Sunggon Kim, Hyeonsang Eom, "HPC Workload Characterization Using Feature Selection and Clustering", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 33--40, doi: 10.1145/3391812.3396270

Suren Byna, M. Scot Breitenfeld, Bin Dong, Quincey Koziol, Elena Pourmal, Dana Robinson, Jerome Soumagne, Houjun Tang, Venkatram Vishwanath, and Richard Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems", Journal of Computer Science and Technology 2020, 35(1): 145-160, February 2, 2020, doi: 10.1007/s11390-020-9822-9

2019

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Wei Zhang, Suren Byna, Chenxu Niu, Yong Chen, "Exploring Metadata Search Essentials for Scientific Data Management", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 17, 2019,

Tirthak Patel, Suren Byna, Glenn K. Lockwood, Devesh Tiwari, "Revisiting I/O Behavior in Large-Scale Storage Systems: The Expected and the Unexpected", Supercomputing 2019 (SC19), November 24, 2019, doi: 10.1145/3295500.3356183

Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas, "Comparison of Array Management Library Performance - A Neuroscience Use Case", SC19 Poster, November 20, 2019,

Megha Agarwal, Divyansh Singhvi, Preeti Malakar, Suren Byna, "Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00007

Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright, "Understanding Data Motion in the Modern HPC Data Center", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00012

Houjun Tang, Quincey Koziol, Suren Byna, John Mainzer, Tonglin Li, "Enabling Transparent Asynchronous I/O using Background Threads", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW 2019), November 19, 2019, doi: 10.1109/PDSW49588.2019.00006

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019,

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

Teng Wang, Suren Byna, Glenn Lockwood, Philip Carns, Shane Snyder, Sunggon Kim, Nicholas Wright, "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks", IEEE/ACM CCGrid 2019, May 14, 2019,

Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, Suren Byna, "I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance", Cray User Group (CUG) 2019, May 10, 2019,

Jingqing Mu, Jerome Soumagne, Suren Byna, Quincey Koziol, Houjun Tang, Richard Warren, "Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage", Cray User Group (CUG) 2019, May 7, 2019,

Babak Behzad, Suren Byna, Prabhat, and Marc Snir, "Optimizing I/O Performance of HPC Applications with Autotuning", ACM Transactions on Parallel Computing (TOPC), February 28, 2019,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61--80,

Bin Dong, Patrick Kilian, Xiaocan Li, Fan Guo, Suren Byna, Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", Proceedings of the 31st International Conference on Scientific and Statistical Database Management, January 1, 2019, 202--205,

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 1, 2019, 31:e5157,

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.

2018

Suren Byna, Quincey Koziol, Venkatram Vishwanath, Jerome Soumagne, Houjun Tang, Kimmy Mu, Richard Warren, François Tessier, Bin Dong, Teng Wang, and Jialin Liu, Proactive Data Containers (PDC): An object-centric data store for large-scale computing systems, AGU Fall Meeting, December 13, 2018,

Glenn Lockwood, Shane Snyder, Teng Wang, Suren Byna, Phil Carns, and Nicholas Wright, "A Year in the Life of a Parallel File System", International Conference for High Performance Computing, Networking, and Storage (SC'18), IEEE / ACM, November 15, 2018,

Fahim Chowdhury, Jialin Liu, Quincey Koziol, Thorsten Kurth, Steven Farrell, Suren Byna, Prabhat, Weikuan Yu, Initial Characterization of I/O in Large-Scale Deep Learning Applications, 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS), November 13, 2018,

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems", Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, November 1, 2018, 24,

Teng Wang, Suren Byna, Glenn Lockwood, Nicholas Wright, Phil Carns, and Shane Snyder, "IOMiner: Large-scale Analytics Framework for Gaining Knowledge from I/O Logs", IEEE Cluster 2018, September 10, 2018,

Teng Wang, Suren Byna, Bin Dong, and Houjun Tang, "UniviStor: Integrated Hierarchical and Distributed Storage for HPC", IEEE Cluster 2018, September 1, 2018,

Houjun Tang, Suren Byna, Francois Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu, and Richard Warren, "Toward Scalable and Asynchronous Object-centric Data Management for HPC", 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2018, May 1, 2018,

Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, Prabhat, Kesheng Wu, and Paul Brown, "ArrayBridge: Interweaving declarative array processing with imperative high-performance computing", 34th IEEE International Conference on Data Engineering (ICDE) 2018, April 17, 2018,

Bharti Wadhwa, Suren Byna, Ali R. Butt, "Toward Transparent Data Management in Multi-layer Storage Hierarchy for HPC Systems", IEEE International Conference on Cloud Engineering 2018 (IC2E 2018), April 17, 2018,

Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, M Prabhat, Kesheng Wu, Paul Brown, "ArrayBridge: Interweaving declarative array processing in SciDB with imperative HDF5-based programs", 2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018, 977--988,

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211--220,

2017

Glenn Lockwood, Shane Snyder, Wucherl Yoo, Kevin Harms, Zachary Nault, Suren Byna, Philip Carns, Nicholas Wright, "UMAMI: A Recipe for Generating Meaningful Metrics through Holistic I/O Performance Analysis", 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS), 2017 (Held in conjunction with SC17), November 14, 2017,

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,

Suren Byna, Mohamad Chaarawi, Quincey Koziol, John Mainzer, and Frank Willmore, "Tuning HDF5 subfiling performance on parallel file systems", Cray User Group (CUG) meeting 2017, May 10, 2017,

Cong Xu, Shane Snyder, Omkar Kulkarni, Vishwanath Venkatesan, Philip Carns, Suren Byna, Robert Sisneros, and Kalyana Chadalavada, "DXT: Darshan eXtended Tracing", Cray User Group (CUG) meeting 2017, May 10, 2017,

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-defined scientific data analysis on arrays", Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2017, 53--64,

2016

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Bin Dong, Surendra Byna, Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel", HPDC 16, New York, NY, USA, ACM, 2016, 57--68, doi: 10.1145/2907294.2907300

Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, and Pradeep Dubey, "PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures", 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2016, Chicago, May 23, 2016,

Wahid Bhimji, Debbie Bard, Melissa Romanus, David Paul, Andrey Ovsyannikov, Brian Friesen, Matt Bryson, Joaquin Correa, Glenn K. Lockwood, Vakho Tsulaia, Suren Byna, Steve Farrell, Doga Gursoy, Chris Daley, Vince Beckner, Brian Van Straalen, Nicholas Wright, Katie Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group (CUG) 2016, May 10, 2016,

Cong Xu, Suren Byna, Vishwanath Venkatesan, Robert Sisneros, Omkar Kulkarni, Mohamad Chaarawi, and Kalyana Chadalavada, "LIOProf: Exposing Lustre File System Behavior for I/O Middleware", Cray User Group (CUG) 2016, May 10, 2016,

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

Harinarayan Krishnan, Burlen Loring, Suren Byna, Michael F. Wehner, Travis A. O'Brien, Prabhat, Chris Paciorek, and Daithi Stone, "Enabling End-to-End Climate Science Workflows in High Performance Computing Environments", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Burlen Loring, Suren Byna, Prabhat, Junmin Gu, Hari Krishnan, Michael Wehner, and Oliver Ruebel, "TECA an Extreme Event Detection and Climate Analysis Package for High Performance Computing", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Bin Dong, Surendra Byna, Kesheng Wu, "Sds-sort: Scalable dynamic skew-aware parallel sorting", Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2016, 57--68,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "Amrzone: A runtime amr data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116--125,

Bin Dong, Suren Byna, Kesheng Wu, Hans Johansen, Jeffrey N Johnson, Noel Keen, others, "Data elevator: Low-contention data movement in hierarchical storage system", 2016 IEEE 23rd international conference on high performance computing (HiPC), January 1, 2016, 152--161,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359--1366,

2015

Hari Krishnan, Suren Byna, Michael Wehner, Junmin Gu, Travis O'Brien, Burlen Loring, Daithi Stone, William Collins, Prabhat, Yunjie Liu, Jeffrey Johnson, and Christopher Paciorek, "Enabling Efficient Climate Science Workflows in High Performance Computing Environments", AGU Fall Meeting, 2015, December 13, 2015,

Soyoung Jeon, Prabhat, Suren Byna, Junmin Gu, William Collins, and Michael Wehner, "Characterization of extreme precipitation within atmospheric river events over California", Advances in Statistical Climatology, Meteorology and Oceanography (ASCMO), November 21, 2015, 1:45-57, doi: 10.5194/ascmo-1-45-2015

Md. Mostofa Ali Patwary, Suren Byna, Nadathur Rajagopalan Satish, Narayanan Sundaram, Zarija Lukic, Vadim Roytershteyn, Michael J. Anderson, Yushu Yao, Mr Prabhat, and Pradeep Dubey, "BD-CATS: Big Data Clustering at Trillion Particle Scale", Supercomputing 2015 (SC15), November 17, 2015,

Babak Behzad, Suren Byna, Prabhat and Marc Snir, "Pattern-driven Parallel I/O Tuning", 10th Parallel Data Storage Workshop (PDSW) 2015, held in conjunction with SC15, November 16, 2015,

Shane Snyder, Philip Carns, Robert Latham, Misbah Mubarak, Chris Carothers, Babak Behzad, Huong Vu Thanh Luu, Suren Byna, and Prabhat, "Techniques for Modeling Large-scale HPC I/O Workloads", the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS15), in conjunction with SC15, November 15, 2015,

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Prabhat, Suren Byna, Venkat Vishwanath, Eli Dart, Michael Wehner, and William Collins, "TECA: Petascale Pattern Recognition for Climate Science", 16th International Conference on Computer Analysis of Images and Patterns (CAIP) 2015, 2015,

Babak Behzad, Suren Byna, Stefan Wild, Prabhat and Marc Snir, "Dynamic Model-driven Parallel I/O Performance Tuning", IEEE Cluster 2015, 2015,

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015, 2015,

H. Luu, M. Winslett, W. Gropp, R. Ross, P. Carns, K. Harms, Prabhat, S. Byna, Y. Yao, "A Multi-platform Study of I/O Behavior on Petascale Supercomputers", The 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2015, 2015,

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

Suren Byna, Robert Sisneros, Kalyana Chadalavada, Quincey Koziol, "Tuning Parallel I/O on Blue Waters for Writing 10 Trillion Particles", Cray User Group (CUG) meeting 2015, 2015,

Suren Byna, Brian Austin, "Evaluation of Parallel I/O Performance and Energy Consumption with Frequency Scaling on Cray XC30", Cray User Group (CUG) meeting 2015, 2015,

Bin Dong, Surendra Byna, Kesheng Wu, "Heavy-tailed distribution of parallel I/O system response time", Proceedings of the 10th Parallel Data Storage Workshop, 2015, 37--42,

Bin Dong, Surendra Byna, Kesheng Wu, "Spatially clustered join on heterogeneous scientific data sets", 2015 IEEE International Conference on Big Data (Big Data), 2015, 371--380,

2014

Soyoung Jeon, Christopher Paciorek, Prabhat, Surendra Byna, William Collins, Michael Wehner, "Uncertainty Quantification for Characterizing Spatial Tail Dependence under Statistical Framework", AGU, Fall Meeting 2014, 2014,

Babak Behzad, Surendra Byna, Stefan M. Wild, Mr. Prabhat, Marc Snir, "Improving Parallel I/O Autotuning with Performance Modeling", ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2014), New York, NY, USA, ACM, 2014, 253--256, doi: 10.1145/2600212.2600708

M Scot Breitenfeld, Kalyana Chadalavada, Robert Sisneros, Surendra Byna, Quincey Koziol, Neil Fortner, Prabhat, Venkat Vishwanath, "Recent Progress in Tuning Performance of Large-scale I/O with Parallel HDF5", The 9th Parallel Data Storage Workshop (PDSW) held in conjunction with SC14, 2014,

Bin Dong, Surendra Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", 2014 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2014, 194--202,

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", SIGMOD 14, 2014, 385--396, doi: 10.1145/2588555.2612185

Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Surendra Byna, Kesheng Wu, "Simplifying index file structure to improve I/O performance of parallel indexing", 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2014, 576--583,

Ted Habermann, Andrew Collette, Steve Vincena, Jay Jay Billings, Matt Gerring, Konrad Hinsen, Werner Benger, Filipe RNC Maia, Suren Byna, Pierre de Buyl, "The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software", 2nd Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2), in conjunction with Supercomputing 2014 (SC14), 2014,

Jialin Liu, Surendra Byna, Bin Dong, Kesheng Wu, Yong Chen, "Model-driven data layout selection for improving read performance", 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014, 1708--1716,

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", Parallel Distributed Processing Symposium Workshops 2014 IEEE International, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

2013

Babak Behzad, Huong Vu Thanh Luu, Joseph Huchette, Surendra Byna, Prabhat, Ruth Aydt, Quincey Koziol, and Marc Snir, "Taming parallel I/O complexity with auto-tuning", In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '13), 2013,

Babak Behzad, Joseph Huchette, Huong Vu Thanh Luu, Ruth Aydt, Surendra Byna, Yushu Yao, Quincey Koziol, and Prabhat, "A framework for auto-tuning HDF5 applications", Proceedings of the 22nd international symposium on High-performance parallel and distributed computing (HPDC), 2013,

Bin Dong, Surendra Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2013, 1--8,

E Wes Bethel, Prabhat Prabhat, Suren Byna, Oliver Rübel, K John Wu, Michael Wehner, "Why high performance visual data analytics is both relevant and difficult", Visualization and Data Analysis 2013, January 2013, 8654:86540B, LBNL LBNL-6063E,

Bin Dong, Surendra Byna, Kesheng Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage Workshop, January 1, 2013, 27--32,

Kuan-Wu Lin, Surendra Byna, Jerry Chou, Wu, "Optimizing FastQuery performance on Lustre file", Proceedings of the 25th International Conference on and Statistical Database Management, 2013, 29,

2012

Babak Behzad, Joey Huchette, Huong Luu, Ruth Aydt, Quincey Koziol, Prabhat, Suren Byna, Mohamad Chaarawi, Yushu Yao, "Auto-Tuning of Parallel IO Parameters for HDF5 Applications", Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 2012,

Y. Yin, S. Byna, H. Song, X.-H. Sun, and R. Thakur, "Boosting Application-Specific Parallel I/O Optimization Using IOSIG", IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, May 13, 2012,

E. W. Bethel, Surendra Byna, Jerry Chou, Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Rübel, Rob D. Ryne, Michael Wehner, Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

Surendra Byna, Jerry Chou, Oliver Rubel, Homa Karimabadi, William S Daughter, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner, Wes Bethel, others, "Teca: A parallel toolkit for extreme climate analysis", Procedia Computer Science, Elsevier, January 2012, 9:866--876, LBNL 5352E,

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

2011

Mehmet Balman, Surendra Byna, "Open Problems in network-aware data management in exa-scale computing and terabit networking era", In Proceedings of the First international Workshop on Network-Aware Data Management, in conjunction with ACM/IEEE international Conference For High Performance Computing, Networking, Storage and Analysis, 2011, Seattle, WA, November 11, 2011, LBNL 6176E, doi: http://dx.doi.org/10.1145/2110217.2110229

Accessing and managing large amounts of data is a great challenge in collaborative computing environments where resources and users are geographically distributed. Recent advances in network technology led to next-generation high-performance networks, allowing high-bandwidth connectivity. Efficient use of the network infrastructure is necessary in order to address the increasing data and compute requirements of large-scale applications. We discuss several open problems, evaluate emerging trends, and articulate our perspectives in network-aware data management.

Surendra Byna, Michael F Wehner, Kesheng John Wu, "Detecting atmospheric rivers in large climate datasets", Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities, 2011, 7--14,

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,

Kesheng Wu, Surendra Byna, Doron Rotem, Arie, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

Henry Childs

2012

Allen R Sanderson, Brad Whitlock, H Childs, GH Weber, K Wu, others, "A system for query based analysis and visualization", January 2012, LBNL 5507E,

2010

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Gunther Weber, "Recent advances in visit: Amr streamlines and query-driven visualization", 2010,

Oliver Rübel, Sean Ahern, E. Wes Bethel, D. Biggin, Hank Childs, Estelle, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hagen, Bernd Hamann, Min-Yu Huang, Soile E. Keränen, David W. Knowles, Cris L. Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, H. Weber, Kesheng Wu, "Coupling visualization and data analysis for knowledge from multi-dimensional scientific data", Procedia Computer Science, 2010, 1:1751--1758, doi: 10.1016/j.procs.2010.04.197

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

E Bethel, "Modern Scientific Visualization is More than Just Pretty Pictures", January 2009, LBNL 1450E,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Shreyas Cholia

2020

D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, The Superfacility project: automated pipelines for experiments and HPC, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), State of the Practice (SOP), 2020,

B. Enders, D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, "Cross-facility science with the Superfacility Project at LBNL", 2nd Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP 2020), in conjunction with the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 20), 2020,

Marcus S. Day

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Dharshi Devendran

2016

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Bin Dong

2020

Jonathan Blair Ajo-Franklin, Verónica Rodríguez Tribaldos, Avinash Nayak, Nathaniel J Lindsey, Feng Cheng, Benxin Chi, Bin Dong, Kesheng Wu, Inder Monga, Distributed Acoustic Sensing (DAS) at the Plot to Basin Scale: Connecting Near-Surface Sensing and Seismology with a Common Observational Tool, AGU Fall Meeting 2020, 2020,

Bin Dong, Verónica Rodríguez Tribaldos, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection", 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 14, 2020, 254--263,

Suren Byna, M. Scot Breitenfeld, Bin Dong, Quincey Koziol, Elena Pourmal, Dana Robinson, Jerome Soumagne, Houjun Tang, Venkatram Vishwanath, and Richard Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems", Journal of Computer Science and Technology 2020, 35(1): 145-160, February 2, 2020, doi: 10.1007/s11390-020-9822-9

2019

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61--80,

Bin Dong, Patrick Kilian, Xiaocan Li, Fan Guo, Suren Byna, Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", Proceedings of the 31st International Conference on Scientific and Statistical Database Management, January 1, 2019, 202--205,

2018

Suren Byna, Quincey Koziol, Venkatram Vishwanath, Jerome Soumagne, Houjun Tang, Kimmy Mu, Richard Warren, François Tessier, Bin Dong, Teng Wang, and Jialin Liu, Proactive Data Containers (PDC): An object-centric data store for large-scale computing systems, AGU Fall Meeting, December 13, 2018,

Teng Wang, Suren Byna, Bin Dong, and Houjun Tang, "UniviStor: Integrated Hierarchical and Distributed Storage for HPC", IEEE Cluster 2018, September 1, 2018,

Houjun Tang, Suren Byna, Francois Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu, and Richard Warren, "Toward Scalable and Asynchronous Object-centric Data Management for HPC", 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2018, May 1, 2018,

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211--220,

Xin Xing, Bin Dong, Jonathan Ajo-Franklin, Kesheng Wu, "Automated Parallel Data Processing Engine with Application to Large-Scale Feature Extraction", 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC), January 1, 2018, 37--46,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna YQ Ho, Peter Nugent, "Distributed caching for processing raw arrays", Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018, 1--12,

2017

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-defined scientific data analysis on arrays", Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2017, 53--64,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent, "Incremental view maintenance over array data", Proceedings of the 2017 ACM International Conference on Management of Data, January 1, 2017, 139--154,

Tzuhsien Wu, Jerry Chou, Shyng Hao, Bin Dong, Scott Klasky, Kesheng Wu, "Optimizing the query performance of block index through data analysis and I/O modeling", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, January 1, 2017, 1--10,

2016

Bin Dong, Surendra Byna, Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", HPDC 16, New York, NY, USA, ACM, 2016, 57--68, doi: 10.1145/2907294.2907300

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Bin Dong, Surendra Byna, Kesheng Wu, "Sds-sort: Scalable dynamic skew-aware parallel sorting", Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2016, 57--68,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "Amrzone: A runtime amr data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116--125,

Bin Dong, Suren Byna, Kesheng Wu, Hans Johansen, Jeffrey N Johnson, Noel Keen, others, "Data elevator: Low-contention data movement in hierarchical storage system", 2016 IEEE 23rd international conference on high performance computing (HiPC), January 1, 2016, 152--161,

Tzuhsien Wu, Hao Shyng, Jerry Chou, Bin Dong, Kesheng Wu, "Indexing blocks to reduce space and time requirements for searching large data files", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 398--402,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359--1366,

2015

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Bin Dong, Surendra Byna, Kesheng Wu, "Heavy-tailed distribution of parallel I/O system response time", Proceedings of the 10th Parallel Data Storage Workshop, 2015, 37--42,

Bin Dong, Surendra Byna, Kesheng Wu, "Spatially clustered join on heterogeneous scientific data sets", 2015 IEEE International Conference on Big Data (Big Data), 2015, 371--380,

2014

Bin Dong, Xiuqiao Li, Limin Xiao, Li Ruan, "Towards minimizing disk I/O contention: A partitioned file assignment approach", Future Generation Computer Systems, Volume 37, July 2014, Pages 178-190, 2014,

Bin Dong, Surendra Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", 2014 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2014, 194--202,

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", Parallel Distributed Processing Symposium Workshops 2014 IEEE International, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

2013

Bin Dong, Surendra Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2013, 1--8,

Bin Dong, Surendra Byna, Kesheng Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage Workshop, January 1, 2013, 27--32,

2012

Bin Dong, Xiuqiao Li, Qimeng Wu, Limin Xiao, Li Ruan, "A dynamic and adaptive load balancing strategy for parallel file system with large-scale I/O servers", Journal of Parallel and Distributed Computing (JPDC), Volume 72, Issue 10, October 2012, Pages 1254-1268, 2012,

Bin Dong, Xiuqiao Li, Limin Xiao, Li Ruan, "A New File-Specific Stripe Size Selection Method for Highly Concurrent Data Access", The 13th ACM/IEEE International Conference on Grid Computing (Grid 2012), 2012, 2012,

Vincent A. Dumont

2020

Veronica Rodríguez Tribaldos, Nathaniel J Lindsey, Shan Dou, Craig Ulrich, Michelle Robertson, Bin Dong, Vincent Dumont, Kesheng Wu, Inder Monga, Chris Tracy, others, Combining Ambient Noise and Distributed Acoustic Sensing (DAS) Deployed on Dark Fiber Networks for High-resolution Imaging at the Basin Scale, AGU Fall Meeting 2020, 2020,

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning for Surface Wave Identification in Distributed Acoustic Sensing Data", IEEE BigData 2020, December 8, 2020,

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning on Real Geophysical Data: A Case Study for Distributed Acoustic Sensing Research", NeurIPS "Machine Learning and the Physical Sciences" workshop, 2020,

H. Masia-Roig, J. A. Smiga, D. Budker, V. Dumont, Z. Grujic, D. Kim, D. F. Jackson Kimball, V. Lebedev, M. Monroy, S. Pustelny, T. Scholtes, P. C. Segura, Y. K. Semertzidis, Y. Chang Shin, J. E. Stalnaker, I. Sulai, A. Weis, A. Wickenbrock, "Analysis method for detecting topological defect dark matter with a global magnetometer network", Physics of the Dark Universe, Volume 28, 100494, May 2020, doi: 10.1016/j.dark.2020.100494

M. R. Wilczynska, J. K. Webb, M. Bainbridge, S. E. I. Bosman, J. D. Barrow, R. F. Carswell, M. P. Dabrowski, V. Dumont, A. C. Leite, C. Lee, K. Leszczynska, J. Liske, K. Marosek, C. J. A. P. Martins, D. Milakovic, P. Molaro, L. Pasquini, "Four direct measurements of the fine-structure constant 13 billion years ago", Science Advances, Volume 6, No. 17, eaay9672, April 24, 2020, doi: 10.1126/sciadv.aay9672

Devarshi Ghoshal

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019,

Junmin Gu

2018

Junmin Gu, Scott Klasky, Norbert Podhorszki, Ji Qiang, Kesheng Wu, "Querying large scientific data sets with adaptable IO system ADIOS", Asian Conference on Supercomputing Frontiers, 2018, 51--69,

2016

Burlen Loring, Suren Byna, Prabhat, Junmin Gu, Hari Krishnan, Michael Wehner, and Oliver Ruebel, "TECA an Extreme Event Detection and Climate Analysis Package for High Performance Computing", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

David Pugmire, James Kress, Jong Choi, Scott Klasky, Tahsin Kurc, Randy Michael Churchill, Matthew Wolf, Greg Eisenhauer, Hank Childs, Kesheng Wu, others, "Visualization and analysis for near-real-time decision making in distributed workflows", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, 1007--1013,

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

D. Pugmire, J. Kress, J. Choi, S. Klasky, T. Kurc, R. M. Churchill, M. Wolf, G. Eisenhauer, H. Childs, K. Wu, A. Sim, J. Gu, J. Low, "Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, 1007--1013, doi: 10.1109/IPDPSW.2016.175

2014

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Adaptation and Policy-Based Resource Allocation for Efficient Bulk Data Transfers in High Performance Computing Environments", 4th International Workshop on Network-aware Data Management (NDM'14), 2014,

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Efficient Data Staging Using Performance-Based Adaptation and Policy-Based Resource Allocation", 22nd Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2014,

2012

Junmin Gu, David Smith, Ann L. Chervenak, Alex Sim, "Adaptive Data Transfers that Utilize Policies for Resource Sharing", The 2nd International Workshop on Network-Aware Data Management Workshop (NDM2012), 2012,

D. Yu, D. Katramatos, A. Shoshani, A. Sim, J. Gu, V. Natarajan, "StorNet: Integrating Storage Resource Management with Dynamic Network Provisioning for Automated Data Transfer", International Committee for Future Accelerators (ICFA) Standing Committee on Inter-Regional Connectivity (SCIC) 2012 Report: Networking for High Energy Physics, 2012,

2011

J. Gu, D. Katramatos, X. Liu, V. Natarajan, A. Shoshani, A. Sim, D. Yu, S. Bradley, S. McKee, "StorNet: Integrated Dynamic Storage and Network Resource Provisioning and Management for Automated Data Transfers", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/1/012002

G. Garzoglio, J. Bester, K. Chadwick, D. Dykstra, D. Groep, J. Gu, T. Hesselroth, O. Koeroo, T. Levshina, S. Martin, M. Salle, N. Sharma, A. Sim, S. Timm, A. Verstegen, "Adoption of a SAML-XACML Profile for Authorization Interoperability across Grid Middleware in OSG and EGEE", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/6/062011

Junmin Gu, Dimitrios Katramatos, Xin Liu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Dantong Yu, Scott Bradley, Shawn McKee, "StorNet: Co-Scheduling of End-to-End Bandwidth Reservation on Storage and Network Systems for High Performance Data Transfers", IEEE INFOCOM HSN 2011, 2011,

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2009

M. Riedel, E. Laure, Th. Soddemann, L. Field, J. P. Navarro, J. Casey, M. Litmaath, J. Ph. Baud, B. Koblitz, C. Catlett, D. Skow, C. Zheng, P. M. Papadopoulos, M. Katz, N. Sharma, O. Smirnova, B. Kónya, P. Arzberger, F. Würthwein, A. S. Rana, T. Martin, M. Wan, V. Welch, T. Rimovsky, S. Newhouse, A. Vanni, Y. Tanaka, Y. Tanimura, T. Ikegami, D. Abramson, C. Enticott, G. Jenkins, R. Pordes, N. Sharma, S. Timm, N. Sharma, G. Moont, M. Aggarwal, D. Colling, O. van der Aa, A. Sim, V. Natarajan, A. Shoshani, J. Gu, S. Chen, G. Galang, R. Zappi, L. Magnoni, V. Ciaschini, M. Pace, V. Venturi, M. Marzolla, P. Andreetto, B. Cowles, S. Wang, Y. Saeki, H. Sato, S. Matsuoka, P. Uthayopas, S. Sriprayoonsakul, O. Koeroo, M. Viljoen, L. Pearlman, S. Pickles, David Wallom, G. Moloney, J. Lauret, J. Marsteller, P. Sheldon, S. Pathak, S. De Witt, J. Mencák, J. Jensen, M. Hodges, D. Ross, S. Phatanapherom, G. Netzer, A. R. Gregersen, M. Jones, S. Chen, P. Kacsuk, A. Streit, D. Mallmann, F. Wolf, T. Lippert, Th. Delaitre, E. Huedo, N. Geddes, "Interoperation of world-wide production e-Science infrastructures", Concurrency and Computation: Practice and Experience, 2009, 21(8):961-990,

Arie Shoshani, Flavia Donno, Junmin Gu, Jason Hick, Maarten Litmaath, Alex Sim, "Dynamic Storage Management", Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani, Doron Rotem, (Chapman & Hall/CRC Computational Science: 2009)

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

2008

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, J. Gu, "Grid data access on widely distributed worker nodes using scalla and SRM", Journal of Physics: Conf. Ser., 2008, 119, doi: 10.1088/1742-6596/119/7/072019

Alex Sim, Arie Shoshani (Editors), Paolo Badino, Olof Barring, Jean‐Philippe Baud, Ezio Corso, Shaun De Witt, Flavia Donno, Junmin Gu, Michael Haddox‐Schatz, Bryan Hess, Jens Jensen, Andy Kowalski, Maarten Litmaath, Luca Magnoni, Timur Perelmutov, Don Petravick, Chip Watson, The Storage Resource Manager Interface Specification Version 2.2, Open Grid Forum, Document in Full Recommendation, GFD.129, 2008,

2007

L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, F. Donno, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations", the 24th IEEE Conference on Mass Storage Systems and Technologies, 2007,

F. Donno, L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Manager version 2.2: design, implementation, and testing experience", Journal of Physics: Conf. Ser., 2007, 119, doi: 10.1088/1742-6596/119/6/062028

2005

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Wei-Ming Zhang, "Grid Collector: Facilitating Efficient Selective Access from Data Grids", International Supercomputer Conference 2005, 2005,

2004

Alex Sim, Junmin Gu, Arie Shoshani, Vijaya Natarajan, "DataMover: Robust Terabytes-Scale Multi-file Replication over Wide-Area Networks", the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), 2004,

2003

Arie Shoshani, Alexander Sim, Junmin Gu, "Storage Resource Managers: Essential Components for the Grid", Grid Resource Management: State of the Art and Future Trends, edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, (Kluwer Academic Publishers: 2003)

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

Arie Shoshani, Alex Sim, Junmin Gu, Storage Resource Managers: Essential Components for Grid Applications, Globus World, 2003,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid collector: An event catalog with automated file management", 2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No. 03CH37515), 2003, LBNL 55563,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2002

A. Shoshani, A. Sim, J. Gu, "Storage Resource Managers: Middleware components for Grid Storage", the 19th IEEE Symposium on Mass Storage Systems, 2002,

Ming Gu

2013

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, Testing VPIN on Big Data, Available at SSRN 2318259, 2013,

UC Berkeley, William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", January 1, 2013, LBNL LBNL-6388E,

Kesheng Wu, E Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A big data approach to analyzing market volatility", Algorithmic Finance, 2013, 2:241--267, LBNL LBNL-6382E,

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that the HPC resource and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140GB as text files. By (1) using a more efficient file format for storing the trading records, (2) using more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real-time -- an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.

Daniel Gunter

2010

A. Sim, D. Gunter, V. Natarajan, A. Shoshani, D. Williams, J. Long, J. Hick, J. Lee, E. Dart, "Efficient Bulk Data Replication for the Earth System Grid", Data Driven E-science: Use Cases and Successful Applications of Distributed Computing Infrastructures (ISGC 2010), (Springer-Verlag New York Inc: 2010) Pages: 435

Raj Kettimuthu, Alex Sim, Dan Gunter, Bill Allcock, Peer T. Bremer, John Bresnahan, Andrew Cherry, Lisa Childers, Eli Dart, Ian Foster, Kevin Harms, Jason Hick, Jason Lee, Michael Link, Jeff Long, Keith Miller, Vijaya Natarajan, Valerio Pascucci, Ken Raffenetti, David Ressman, Dean Williams, Loren Wilson, Linda Winkler, "Lessons learned from moving earth system grid data sets over a 20 Gbps wide-area network", HPDC 10, New York, NY, USA, ACM, 2010, 316--319, doi: 10.1145/1851476.1851519

Mark Howison

2012

E. W. Bethel, Surendra Byna, Jerry Chou, E. Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

Surendra Byna, Jerry Chou, Oliver Rubel, Homa Karimabadi, William S Daughton, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

2011

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

Hans Johansen

2016

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Bin Dong, Suren Byna, Kesheng Wu, Hans Johansen, Jeffrey N Johnson, Noel Keen, others, "Data elevator: Low-contention data movement in hierarchical storage system", 2016 IEEE 23rd international conference on high performance computing (HiPC), January 1, 2016, 152--161,

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

Mariam Kiran

2020

Bashir Mohammed, Mariam Kiran, Dan Wang, Qiang Du, Russell Wilcox, "Deep Reinforcement Learning based Control for two-dimensional Coherent Combining", Laser Applications Conference, JTu5A-7, Optical Society of America, December 1, 2020,

Dan Wang, Qiang Du, Tong Zhou, Bashir Mohammed, Mariam Kiran, Derun Li, Russell Wilcox, "Artificial Neural Networks Applied to Stabilization of 81-beam Coherent Combining", Advanced Solid State Lasers, Optical Society of America, December 1, 2020,

D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, The Superfacility project: automated pipelines for experiments and HPC, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), State of the Practice (SOP), 2020,

B. Enders, D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, "Cross-facility science with the Superfacility Project at LBNL", 2nd Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP 2020), in conjunction with the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 20), 2020,

B Mohammed, M Kiran, N Krishnaswamy, Kesheng Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2020,

N Krishnaswamy, M Kiran, B Mohammed, Singh Kunal, "Data-driven Learning to Predict WAN Network Traffic", SNTA '20: Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, November 3, 2020, 11-18, doi: 10.1145/3391812.3396268

T Mallick, M Kiran, B Mohammed, Prasanna Balaprakash, "Dynamic Graph Neural Network for Traffic Forecasting in Wide Area Networks", Machine Learning Big Data 2020, November 2, 2020,

M Hocine, M Kiran, A Mercian, and B Mohammed, "Using Machine Learning for Intent-based provisioning in High-Speed Science Networks", SNTA '20: Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, November 2, 2020, 27-30, doi: 10.1145/3391812.3396269

2019

M Kiran, B Mohammed and N. Krishnaswamy, "DeepRoute: Herding Elephant and Mice Flows with Reinforcement Learning", 2nd IFIP International Conference on Machine Learning for Networking (MLN'2019), December 2, 2019, doi: 10.1007/978-3-030-45778-5_20

B Mohammed, M Kiran, N Krishnaswamy, "DeepRoute on Chameleon: Experimenting with Large-scale Reinforcement Learning and SDN on Chameleon Testbed", IEEE 27th International Conference on Network Protocols (ICNP), IEEE, November 14, 2019, 1-2, doi: 10.1109/ICNP.2019.8888090

George Papadimitriou, Mariam Kiran, Cong Wang, Anirban Mandal, Ewa Deelman, "Training Classifiers to Identify TCP Signatures in Scientific Workflows", INDIS, SC19, November 14, 2019,

B Mohammed, N Krishnaswamy, M Kiran, "Multivariate Time-Series Prediction for Traffic in Large WAN Topology", ACM/IEEE Symposium on Architectures for Networking and Communications, August 1, 2019, doi: 10.1109/ANCS.2019.8901870

M Kiran, A Chhabra, "Understanding flows in high-speed scientific networks: A Netflow data study", Future Generation Computer Systems, February 1, 2019, 94:72-79,

2017

B Mohammed, M Kiran, KM Maiyama, MM Kamala, IU Awan, "Failover strategy for fault tolerance in cloud computing environment", Journal of Software - Practice and Experience, 2017, 47:1243--1274, doi: 10.1002/spe.2491

Harinarayan Krishnan

2016

Harinarayan Krishnan, Burlen Loring, Suren Byna, Michael F. Wehner, Travis A. O'Brien, Prabhat, Chris Paciorek, and Daithi Stone, "Enabling End-to-End Climate Science Workflows in High Performance Computing Environments", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Burlen Loring, Suren Byna, Prabhat, Junmin Gu, Hari Krishnan, Michael Wehner, and Oliver Ruebel, "TECA an Extreme Event Detection and Climate Analysis Package for High Performance Computing", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Deborah A Agarwal, Boris Faybishenko, Vicky L Freedman, Harinarayan Krishnan, Gary Kushner, Carina Lansing, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, others, "A science data gateway for environmental management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004,

2015

Hari Krishnan, Suren Byna, Michael Wehner, Junmin Gu, Travis O'Brien, Burlen Loring, Daithi Stone, William Collins, Prabhat, Yunjie Liu, Jeffrey Johnson, and Christopher Paciorek, "Enabling Efficient Climate Science Workflows in High Performance Computing Environments", AGU Fall Meeting, 2015, December 13, 2015,

David Leinweber

2013

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, Testing VPIN on Big Data, Available at SSRN 2318259, 2013,

Kesheng Wu, E Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A big data approach to analyzing market volatility", Algorithmic Finance, 2013, 2:241--267, LBNL LBNL-6382E,

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that the HPC resource and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140GB as text files. By (1) using a more efficient file format for storing the trading records, (2) using more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real-time -- an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.
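The false-positive measure described in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: the function name, the use of an empirical CDF over the observed VPIN series, and the choice to compare against the mean of the supplied volatilities are all assumptions made for illustration only.

```python
from bisect import bisect_right


def false_positive_rate(vpin, next_volatility, cdf_threshold=0.99):
    """Hypothetical helper: fraction of VPIN events that are NOT followed
    by above-average volatility.

    vpin[i] is a VPIN value and next_volatility[i] is the volatility
    observed in the time window that follows it. An "event" is any VPIN
    value whose empirical CDF exceeds cdf_threshold (an assumption; the
    paper may use a different CDF estimate).
    """
    if not vpin or len(vpin) != len(next_volatility):
        raise ValueError("inputs must be non-empty and of equal length")

    ranked = sorted(vpin)
    n = len(ranked)
    mean_vol = sum(next_volatility) / n

    events = 0
    hits = 0
    for value, vol in zip(vpin, next_volatility):
        cdf = bisect_right(ranked, value) / n  # empirical CDF at this value
        if cdf > cdf_threshold:
            events += 1
            if vol > mean_vol:
                hits += 1

    if events == 0:
        return None  # no VPIN event crossed the threshold
    return 1.0 - hits / events
```

Under this definition, a return value near 0.07 would correspond to the roughly 7% false positive rate quoted above, with the remaining 93% of events followed by above-average volatility.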

2012

E. W. Bethel, D. Leinweber, O. Rübel, K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing", The Journal of Trading, Pages: 9--25, 2012, doi: 10.3905/jot.2012.7.2.009

2011

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

Xiaoye Li

2018

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Kesheng Wu, "Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction", arXiv preprint arXiv:1811.00620, 2018,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Alex Sim, Kesheng Wu, "Consensus ensemble system for traffic flow prediction", IEEE Transactions on Intelligent Transportation Systems, 2018, 19:3903--3914,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Kesheng Wu, "Efficient online hyperparameter learning for traffic flow prediction", 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, 164--169,

2011

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Terry J. Ligocki

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

Burlen Loring

2016

Harinarayan Krishnan, Burlen Loring, Suren Byna, Michael F. Wehner, Travis A. O'Brien, Prabhat, Chris Paciorek, and Daithi Stone, "Enabling End-to-End Climate Science Workflows in High Performance Computing Environments", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Burlen Loring, Suren Byna, Prabhat, Junmin Gu, Hari Krishnan, Michael Wehner, and Oliver Ruebel, "TECA an Extreme Event Detection and Climate Analysis Package for High Performance Computing", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

2015

Hari Krishnan, Suren Byna, Michael Wehner, Junmin Gu, Travis O'Brien, Burlen Loring, Daithi Stone, William Collins, Prabhat, Yunjie Liu, Jeffrey Johnson, and Christopher Paciorek, "Enabling Efficient Climate Science Workflows in High Performance Computing Environments", AGU Fall Meeting, 2015, December 13, 2015,

2012

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

Zarija Lukic

2016

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

Kamesh Madduri

2009

K. Madduri and D.A. Bader, "Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis", The 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2009), Rome, Italy, 2009,

Victor M. Markowitz

2013

Alex Romosan, Arie Shoshani, Kesheng Wu, Victor Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 1--12, LBNL 6397E,

Daniel F. Martin

2016

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

2010

Gunther Weber, "Recent advances in VisIt: AMR streamlines and query-driven visualization", 2010,

Joerg Meyer

2012

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, and Chandrika Sivaramakrishnan, "Akuna-Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 2012,

Bashir Mohammed

2020

Bashir Mohammed, Mariam Kiran, Dan Wang, Qiang Du, Russell Wilcox, "Deep Reinforcement Learning based Control for two-dimensional Coherent Combining", Laser Applications Conference, Optical Society of America, pp. JTu5A-7, OSA Publishing, December 1, 2020,

Dan Wang, Qiang Du, Tong Zhou, Bashir Mohammed, Mariam Kiran, Derun Li, Russell Wilcox, "Artificial Neural Networks Applied to Stabilization of 81-beam Coherent Combining", Advanced Solid State Lasers, Optical Society of America, December 1, 2020,

B Mohammed, M Kiran, N Krishnaswamy, Kesheng Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2020,

N Krishnaswamy, M Kiran, B Mohammed, Singh Kunal, "Data-driven Learning to Predict WAN Network Traffic", SNTA '20: Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, November 3, 2020, 11-18, doi: 10.1145/3391812.3396268

T Mallick, M Kiran, B Mohammed, Prasanna Balaprakash, "Dynamic Graph Neural Network for Traffic Forecasting in Wide Area Networks", Machine Learning Big Data 2020, November 2, 2020,

M Hocine, M Kiran, A Mercian, and B Mohammed, "Using Machine Learning for Intent-based provisioning in High-Speed Science Networks", SNTA '20: Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, November 2, 2020, 27-30, doi: 10.1145/3391812.3396269

S Abimbola, B Mohammed, M Sibusiso, IU Awan, and Jules Pagna Disso, "A Framework for Distributed Denial of Service Attack Detection and Reactive Countermeasure in Software Defined Network", 2019 7th IEEE International Conference on Future Internet of Things and Cloud (FiCloud), January 30, 2020, doi: 10.1109/FiCloud.2019.00019

A Sangodoyin, B Mohammed, IU Awan, "Data driven Machine Learning approach to detect DDoS attack in Software Defined Network", Journal of Concurrency and Computation: Practice and Experience, January 1, 2020,

2019

M Kiran, B Mohammed and N. Krishnaswamy, "DeepRoute: Herding Elephant and Mice Flows with Reinforcement Learning", 2nd IFIP International Conference on Machine Learning for Networking (MLN'2019), December 2, 2019, doi: 10.1007/978-3-030-45778-5_20

B Mohammed, M Kiran, N Krishnaswamy, "DeepRoute on Chameleon: Experimenting with Large-scale Reinforcement Learning and SDN on Chameleon Testbed", IEEE 27th International Conference on Network Protocols (ICNP), IEEE, November 14, 2019, 1-2, doi: 10.1109/ICNP.2019.8888090

B Mohammed, N Krishnaswamy, M Kiran, "Multivariate Time-Series Prediction for Traffic in Large WAN Topology", ACM/IEEE Symposium on Architectures for Networking and Communications, August 1, 2019, doi: 10.1109/ANCS.2019.8901870

2017

B Mohammed, M Kiran, KM Maiyama, MM Kamala, IU Awan, "Failover strategy for fault tolerance in cloud computing environment", Journal of Software - Practice and Experience, 2017, 47:1243--1274, doi: 10.1002/spe.2491

Dmitriy Morozov

2016

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

Vijaya Natarajan

2011

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2010

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. A tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community. It has been managing massive dataset transfers efficiently in environments where network bandwidth is limited. Adaptive transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in dynamic network environments as well as to control the data transfers for the desired transfer performance. We describe the results from our hands-on data transfer management experience in the climate research community. We study a practical transfer estimation model and state our initial results from the adaptive transfer adjustment methodology.

Peter Nugent

2020

Qiao Kang, Alex Sim, Peter Nugent, Sunwoo Lee, Wei-keng Liao, Ankit Agrawal, Alok Choudhary, Kesheng Wu, "Predicting Resource Requirement in Intermediate Palomar Transient Factory Workflow", 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), 2020, 619--628, doi: 10.1109/CCGrid49817.2020.00-31

2018

Weijie Zhao, Florin Rusu, Kesheng Wu, Peter Nugent, "Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE", International Journal of Computational Science and Engineering, 2018, 16:337--349,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna YQ Ho, Peter Nugent, "Distributed caching for processing raw arrays", Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018, 1--12,

2017

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, Kesheng Wu, "Parallel variable selection for effective performance prediction", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017, 208--217,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent, "Incremental view maintenance over array data", Proceedings of the 2017 ACM International Conference on Management of Data, January 1, 2017, 139--154,

2016

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, (Springer, Cham: 2016) Pages: 139--161

2015

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Patha: Performance analysis tool for hpc applications", 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), 2015, 1--8,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "PATHA: Performance Analysis Tool for HPC Applications", 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), Pages: 1--8, 2015, doi: 10.1109/PCCC.2015.7410313

2014

F. Rusu, P. Nugent, K. Wu, "Implementing the Palomar Transient Factory Real-Time Pipeline in GLADE: Results and", Lecture Notes in Computer Science, (2014) Pages: 53--66

Douglas Olson

2008

W. Betts, L. Didenko, T. Freeman, P. Jakl, L. Hajdu, E. Hjort, K. Keahey, J. Lauret, D. Olson, A. Rose, I. Sakrejda, A. Sim, "STAR Grid Activities, OSG and Beyond", International Symposium on Grid Computing (ISGC), 2008,

2006

E. Hjort, L. Hajdu, J. Lauret, D. Olson, A. Sim, A. Shoshani, "Data and Computational Grid Coupling in RHIC/STAR – An Analysis Scenario using SRM Technology", Computing in High Energy Physics (CHEP), 2006,

2004

Eric Hjort, Doug Olson, Jerome Lauret, Arie Shoshani, Alex Sim, "Production mode Data-Replication framework in STAR using the HRM Grid middleware", Computing in High Energy Physics, 2004,

2003

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

D. Yu, J. Lauret, A. Shoshani, D. Olson, E. Hjort, A. Sim, "The Design of High Performance Data Replication in the Grid Environment for the STAR Collaboration", Computing in High Energy Physics, 2003,

2001

E. Hjort, D. Olson, A. Sim, J. Yang, J. Lauret, M. Messer, "Data Grid Services in STAR, Initial Deployment: Site-to-Site File Replication", Computing in High Energy Physics, 2001,

D. Olson, E. Hjort, J. Lauret, M. Messer, A. Shoshani, A. Sim, "Non-shared Disk Cluster - A Fault Tolerant, Commodity Approach to Hi-Bandwidth Data Analysis", Computing in High Energy Physics, 2001,

John Owens

2008

Luke J Gosink, "Bin-hash indexing: A parallel method for fast query processing", 2008, LBNL 729E,

Gilberto Pastorello

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019,

Prabhat

2019

Babak Behzad, Suren Byna, Prabhat, and Marc Snir, "Optimizing I/O Performance of HPC Applications with Autotuning", ACM Transactions on Parallel Computing (TOPC), February 28, 2019,

2018

Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, Prabhat, Kesheng Wu, and Paul Brown, "ArrayBridge: Interweaving declarative array processing with imperative high-performance computing", 34th IEEE International Conference on Data Engineering (ICDE) 2018, April 17, 2018,

2016

Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, and Pradeep Dubey, "PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures", 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2016, Chicago, May 23, 2016,

Wahid Bhimji, Debbie Bard, Melissa Romanus, David Paul, Andrey Ovsyannikov, Brian Friesen, Matt Bryson, Joaquin Correa, Glenn K. Lockwood, Vakho Tsulaia, Suren Byna, Steve Farrell, Doga Gursoy, Chris Daley, Vince Beckner, Brian Van Straalen, Nicholas Wright, Katie Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group (CUG) 2016, May 10, 2016,

2015

Soyoung Jeon, Prabhat, Suren Byna, Junmin Gu, William Collins, and Michael Wehner, "Characterization of extreme precipitation within atmospheric river events over California", Advances in Statistical Climatology, Meteorology and Oceanography (ASCMO), November 21, 2015, 1:45-57, doi: 10.5194/ascmo-1-45-2015

Md. Mostofa Ali Patwary, Suren Byna, Nadathur Rajagopalan Satish, Narayanan Sundaram, Zarija Lukic, Vadim Roytershteyn, Michael J. Anderson, Yushu Yao, Prabhat, and Pradeep Dubey, "BD-CATS: Big Data Clustering at Trillion Particle Scale", Supercomputing 2015 (SC15), November 17, 2015,

Babak Behzad, Suren Byna, Prabhat and Marc Snir, "Pattern-driven Parallel I/O Tuning", 10th Parallel Data Storage Workshop (PDSW) 2015, held in conjunction with SC15, November 16, 2015,

Shane Snyder, Philip Carns, Robert Latham, Misbah Mubarak, Chris Carothers, Babak Behzad, Huong Vu Thanh Luu, Suren Byna, and Prabhat, "Techniques for Modeling Large-scale HPC I/O Workloads", the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS15), in conjunction with SC15, November 15, 2015,

Babak Behzad, Suren Byna, Stefan Wild, Prabhat and Marc Snir, "Dynamic Model-driven Parallel I/O Performance Tuning", IEEE Cluster 2015, 2015,

H. Luu, M. Winslett, W. Gropp, R. Ross, P. Carns, K. Harms, Prabhat, S. Byna, Y. Yao, "A Multi-platform Study of I/O Behavior on Petascale Supercomputers", The 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2015, 2015,

2014

Soyoung Jeon, Christopher Paciorek, Prabhat, Surendra Byna, William Collins, Michael Wehner, "Uncertainty Quantification for Characterizing Spatial Tail Dependence under Statistical Framework", AGU, Fall Meeting 2014, 2014,

Babak Behzad, Surendra Byna, Stefan M. Wild, Prabhat, Marc Snir, "Improving Parallel I/O Autotuning with Performance Modeling", ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2014), New York, NY, USA, ACM, 2014, 253--256, doi: 10.1145/2600212.2600708

M Scot Breitenfeld, Kalyana Chadalavada, Robert Sisneros, Surendra Byna, Quincey Koziol, Neil Fortner, Prabhat, Venkat Vishwanath, "Recent Progress in Tuning Performance of Large-scale I/O with Parallel HDF5", The 9th Parallel Data Storage Workshop (PDSW) held in conjunction with SC14, 2014,

2013

Babak Behzad, Huong Vu Thanh Luu, Joseph Huchette, Surendra Byna, Prabhat, Ruth Aydt, Quincey Koziol, and Marc Snir, "Taming parallel I/O complexity with auto-tuning", In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '13), 2013,

E Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K John Wu, Michael Wehner, "Why high performance visual data analytics is both relevant and difficult", Visualization and Data Analysis 2013, January 2013, 8654:86540B, LBNL 6063E,

2012

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

Surendra Byna, Jerry Chou, Oliver Rübel, Homa Karimabadi, William S Daughton, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner, Wes Bethel, others, "TECA: A parallel toolkit for extreme climate analysis", Procedia Computer Science, Elsevier, January 2012, 9:866--876, LBNL 5352E,

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

Allen R Sanderson, Brad Whitlock, H Childs, GH Weber, K Wu, others, "A system for query based analysis and visualization", January 2012, LBNL 5507E,

2011

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Surendra Byna, Michael F Wehner, Kesheng John Wu, "Detecting atmospheric rivers in large climate datasets", Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities, 2011, 7--14,

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
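The two core steps named in the abstract above, thresholding followed by connected component labeling, can be sketched serially as follows. This is only an illustration, not the authors' parallel implementation: the function name, the 4-neighbor connectivity, the plain list-of-lists grid, and the threshold value used in the usage note are assumptions, and the sketch omits any further shape or size criteria an actual atmospheric river detector would apply.

```python
from collections import deque


def label_components(field, threshold):
    """Hypothetical helper: label 4-connected regions of grid cells whose
    value exceeds `threshold`.

    `field` is a rectangular list of lists (for example, integrated water
    vapor on a latitude/longitude mesh). Returns (labels, count), where
    labels[r][c] is 0 for cells at or below the threshold and a positive
    region id otherwise.
    """
    rows = len(field)
    cols = len(field[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0

    for r in range(rows):
        for c in range(cols):
            if field[r][c] <= threshold or labels[r][c]:
                continue  # below threshold, or already assigned to a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over neighboring above-threshold cells.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and field[ny][nx] > threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

For example, calling `label_components(grid, 2.0)` on a small grid returns one label per connected patch of cells above 2.0; each patch would then be a candidate region for further filtering.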

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,

J. Chou, K. Wu, O. Rübel, M. Howison, J. Qiang, Prabhat, B. Austin, E. W. Bethel, R. D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data Analysis", SC11, 2011, doi: 10.1145/2063384.2063424

Jerry Chou, Kesheng Wu, others, "Fastquery: A parallel indexing system for scientific data", 2011 IEEE International Conference on Cluster Computing, 2011, 455--464,

2010

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Gunther Weber, "Recent advances in VisIt: AMR streamlines and query-driven visualization", 2010,

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile E. Keränen, David W. Knowles, Cris L. Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, Kesheng Wu, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia Computer Science, 2010, 1:1751--1758, doi: 10.1016/j.procs.2010.04.197

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

Oliver Rübel, Cameron GR Geddes, Estelle Cormier-Michel, Kesheng Wu, Gunther H Weber, Daniela M Ushizima, Peter Messmer, Hans Hagen, Bernd Hamann, Wes Bethel, others, "Automatic beam path analysis of laser wakefield particle acceleration data", Computational Science & Discovery, January 2009, 2:015005, LBNL 2734E,

E Bethel, "Modern Scientific Visualization is More than Just Pretty Pictures", January 2009, LBNL 1450E,

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: (Astronomical Society of the Pacific Series), St. Thomas, USVI, 2008, 301--317,

Lavanya Ramakrishnan

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019,

Alex Romosan

2016

Deborah A Agarwal, Boris Faybishenko, Vicky L Freedman, Harinarayan Krishnan, Gary Kushner, Carina Lansing, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, others, "A science data gateway for environmental management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004,

2014

DP Schissel, Gheni Abla, SM Flanagan, M Greenwald, X Lee, A Romosan, A Shoshani, J Stillerman, J Wright, "Automated metadata, provenance cataloging and navigable interfaces: Ensuring the usefulness of extreme-scale data", Fusion Engineering and Design, North-Holland, 2014,

John C Wright, Martin Greenwald, Joshua Stillerman, Gheni Abla, Bobby Chanthavong, Sean Flanagan, David Schissel, Xia Lee, Alex Romosan, Arie Shoshani, "The MPO API: A tool for recording scientific workflows", Fusion Engineering and Design, 2014,

2013

Alex Romosan, Arie Shoshani, Kesheng Wu, Victor Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 1--12, LBNL 6397E,

Doron Rotem

2011

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Kesheng Wu, Surendra Byna, Doron Rotem, Arie Shoshani, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

2009

Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani and Doron Rotem, (Chapman & Hall/CRC Computational Science: December 2009)

Ekow J. Otoo, Doron Rotem, and Shih-Chiang Tsao, "Energy smart management of scientific data", 21st Int'l. Conf. on Sc. and Stat. Database Management (SSDBM’2009), New Orleans, Louisiana, USA, June 2009, LBNL 2185E,

Ekow Otoo, Doron Rotem and Shih-Chiang Tsao, "Analysis of Trade-Off Between Power Saving and Response Time in Disk Storage Systems", Fifth Workshop on High-Performance, Power-Aware Computing, Rome, Italy, May 2, 2009,

Ekow J. Otoo, Doron Rotem, and Shih-Chiang Tsao, "Workload-adaptive management of energy-smart disk storage systems", IASDS09: Workshop on Interfaces and Architecture, 2009,

2008

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Doron Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined Numerical and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1--23

2001

A. Sim, H. Nordberg, L.M. Bernardo, A. Shoshani, D. Rotem, "Experience with using CORBA to implement a file caching coordination system", Concurrency and Computation: Practice and Experience, 2001, 13:1-15,

1999

A. Sim, H. Nordberg, L. M. Bernardo, A. Shoshani, D. Rotem, "Storage Access Coordination Using CORBA", Distributed Objects and Application, 1999, 168-175,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem and A. Sim, "Multidimensional Indexing and Query Coordination for Tertiary Storage Management", International Conference on Scientific and Statistical Database Management, 1999, 214-225,

1998

L.M. Bernardo, D. Rotem, A. Shoshani, H. Nordberg, A. Sim, "Using Access Patterns to Partition Large Datasets on Tertiary Storage in Order to Minimize Retrieval Costs", 1998, LBNL 41504,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem, A. Sim, "Storage Management for High Energy Physics Applications", Computing in High Energy Physics, 1998,

Florin Rusu

2018

Weijie Zhao, Florin Rusu, Kesheng Wu, Peter Nugent, "Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE", International Journal of Computational Science and Engineering, 2018, 16:337--349,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna YQ Ho, Peter Nugent, "Distributed caching for processing raw arrays", Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018, 1--12,

2017

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-defined scientific data analysis on arrays", Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2017, 53--64,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent, "Incremental view maintenance over array data", Proceedings of the 2017 ACM International Conference on Management of Data, January 1, 2017, 139--154,

2014

F. Rusu, P. Nugent, K. Wu, "Implementing the Palomar Transient Factory Real-Time Pipeline in GLADE: Results and", Lecture Notes in Computer Science, (2014) Pages: 53--66

Oliver Rübel

2019

Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas, "Comparison of Array Management Library Performance - A Neuroscience Use Case", SC19 Poster, November 20, 2019,

2016

Burlen Loring, Suren Byna, Prabhat, Junmin Gu, Hari Krishnan, Michael Wehner, and Oliver Ruebel, "TECA an Extreme Event Detection and Climate Analysis Package for High Performance Computing", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

2013

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

Kesheng Wu, E Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A big data approach to analyzing market volatility", Algorithmic Finance, 2013, 2:241--267, LBNL 6382E,

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that the HPC resource and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed Trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real-time -- an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.

E Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K John Wu, Michael Wehner, "Why high performance visual data analytics is both relevant and difficult", Visualization and Data Analysis 2013, January 2013, 8654:86540B, LBNL 6063E,

2012

E. W. Bethel, D. Leinweber, O. Rübel, K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

Surendra Byna, Jerry Chou, Oliver Rübel, Homa Karimabadi, William S Daughton, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner, Wes Bethel, others, "TECA: A parallel toolkit for extreme climate analysis", Procedia Computer Science, Elsevier, January 2012, 9:866--876, LBNL 5352E,

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

Allen R Sanderson, Brad Whitlock, H Childs, GH Weber, K Wu, others, "A system for query based analysis and visualization", January 2012, LBNL 5507E,

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing", The Journal of Trading, Pages: 9--25, 2012, doi: 10.3905/jot.2012.7.2.009

2011

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

J. Chou, K. Wu, O. Rübel, M. Howison, J. Qiang, Prabhat, B. Austin, E. W. Bethel, R. D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data Analysis", SC11, 2011, doi: 10.1145/2063384.2063424

2010

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Gunther Weber, "Recent advances in VisIt: AMR streamlines and query-driven visualization", 2010,

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile E. Keränen, David W. Knowles, Cris L. Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, Kesheng Wu, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia Computer Science, 2010, 1:1751--1758, doi: 10.1016/j.procs.2010.04.197

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

Oliver Rübel, Cameron GR Geddes, Estelle Cormier-Michel, Kesheng Wu, Gunther H Weber, Daniela M Ushizima, Peter Messmer, Hans Hagen, Bernd Hamann, Wes Bethel, others, "Automatic beam path analysis of laser wakefield particle acceleration data", Computational Science & Discovery, January 2009, 2:015005, LBNL 2734E,

E Bethel, "Modern Scientific Visualization is More than Just Pretty Pictures", January 2009, LBNL 1450E,

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, "Large Fields for Smaller Facility Sources", SciDAC Review, Pages: 13-21, 2009,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: (Astronomical Society of the Pacific Series), St. Thomas, USVI, 2008, 301--317,

John M. Shalf

2012

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

2006

Luke Gosink, John Shalf, Kurt Stockinger, Kesheng Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press, 2006, 149--158,

2005

Kurt Stockinger, John Shalf, Kesheng Wu, E Wes Bethel, "Query-driven visualization of large data sets", VIS 05, IEEE Visualization, 2005, 167--174,

Arie Shoshani

2019

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 1, 2019, 31:e5157,

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.

2016

Deborah A Agarwal, Boris Faybishenko, Vicky L Freedman, Harinarayan Krishnan, Gary Kushner, Carina Lansing, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, others, "A science data gateway for environmental management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004,

2015

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015, 2015,

Elaheh Pourabbas, Arie Shoshani, "The Composite Data Model: A Unified Approach for Combining and Querying Multiple Data Models", IEEE Trans. Knowl. Data Eng, 2015, 27(5):1424-1437,

2014

US Patent 8,705,342 B2. “Co-scheduling of network resource provisioning and host-to-host bandwidth reservation on high-performance network and storage systems”, D. Yu, D. Katramatos, A. Sim, and A. Shoshani, Apr. 22, 2014, LBNL IB-3152, BNL BSA 11-02.

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", SIGMOD 14, 2014, 385--396, doi: 10.1145/2588555.2612185

DP Schissel, Gheni Abla, SM Flanagan, M Greenwald, X Lee, A Romosan, A Shoshani, J Stillerman, J Wright, "Automated metadata, provenance cataloging and navigable interfaces: Ensuring the usefulness of extreme-scale data", Fusion Engineering and Design, North-Holland, 2014,

John C Wright, Martin Greenwald, Joshua Stillerman, Gheni Abla, Bobby Chanthavong, Sean Flanagan, David Schissel, Xia Lee, Alex Romosan, Arie Shoshani, "The MPO API: A tool for recording scientific workflows", Fusion Engineering and Design, 2014,

Qian Sun, Fan Zhang, Tong Jin, Hoang Bui, Kesheng Wu, Arie Shoshani, Hemanth Kolla, Scott Klasky, Jacqueline Chen, Manish Parashar, "Scalable run-time data indexing and querying for scientific simulations", Big Data Analytics: Challenges and Opportunities (BDAC-14) Workshop at Supercomputing Conference, 2014,

2013

Alex Romosan, Arie Shoshani, Kesheng Wu, Victor Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 1--12, LBNL 6397E,

2012

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, and Chandrika Sivaramakrishnan, Akuna, "Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 2012,

D. Yu, D. Katramatos, A. Shoshani, A. Sim, J. Gu, V. Natarajan, "StorNet: Integrating Storage Resource Management with Dynamic Network Provisioning for Automated Data Transfer", International Committee for Future Accelerators (ICFA) Standing Committee on Inter-Regional Connectivity (SCIC) 2012 Report: Networking for High Energy Physics, 2012,

Benson Ma, Arie Shoshani, Alex Sim, Kesheng Wu, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Management (NDM2012), 2012, 562--571,

Surendra Byna, Jerry Chou, Oliver Rübel, Homa Karimabadi, William S Daughton, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

Elaheh Pourabbas, Arie Shoshani, Kesheng Wu, "Minimizing index size by reordering rows and columns", International Conference on Scientific and Statistical Database Management, January 2012, 467--484,

G. F. Lofstead, Q. Liu, J. Logan, Y. Tian, H. Abbasi, N. Podhorszki, J. Y. Choi, S., R. Tchoua, R. A. Oldfield, others, "Hello ADIOS: The Challenges and Lessons of Leadership Class I/O Frameworks", 2012,

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, Chandrika Sivaramakrishnan, "Akuna-Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 17-22, 2012, 2012,

2011

J. Gu, D. Katramatos, X. Liu, V. Natarajan, A. Shoshani, A. Sim, D. Yu, S. Bradley, S. McKee, "StorNet: Integrated Dynamic Storage and Network Resource Provisioning and Management for Automated Data Transfers", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/1/012002

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Junmin Gu, Dimitrios Katramatos, Xin Liu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Dantong Yu, Scott Bradley, Shawn McKee, "StorNet: Co-Scheduling of End-to-End Bandwidth Reservation on Storage and Network Systems for High Performance Data Transfers", IEEE INFOCOM HSN 2011, 2011,

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

Kesheng Wu, Rishi R Sinha, Chad Jones, Stephane Ethier, Scott Klasky, Kwan-Liu Ma, Arie Shoshani, Marianne Winslett, "Finding regions of interest on toroidal meshes", Computational Science & Discovery, 2011, 4:015003,

Kesheng Wu, Surendra Byna, Doron Rotem, Arie Shoshani, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

J. Chou, K. Wu, O. Rübel, M. Howison, J. Qiang, Prabhat, B. Austin, E. W. Bethel, R. D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data Analysis", SC11, 2011, doi: 10.1145/2063384.2063424

Jinoh Kim, Hasan Abbasi, Luis Chacón, Ciprian Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive", LDAV, 2011, 65--72, doi: 10.1109/LDAV.2011.6092319

2010

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and System, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. A tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community. It has been managing massive dataset transfers efficiently in the environment where the network bandwidth is limited. Adaptive transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environments as well as to control the data transfers for the desired transfer performance. We describe the results from our hands-on data transfer management experience in the climate research community. We study a practical transfer estimation model and state our initial results from the adaptive transfer adjustment methodology. 

Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim, "A Flexible Reservation Algorithm for Advance Network Provisioning", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), New Orleans, LA, IEEE Computer Society Washington, DC, USA ISBN: 978-1-4244-7559-, November 14, 2010, LBNL 4017E, doi: 10.1109/SC.2010.4

Many scientific applications need support from a communication infrastructure that provides predictable performance, which requires effective algorithms for bandwidth reservations. Network reservation systems such as ESnet’s OSCARS establish guaranteed bandwidth of secure virtual circuits for a certain bandwidth and length of time. However, users currently cannot inquire about bandwidth availability, nor have alternative suggestions when reservation requests fail. In general, the number of reservation options is exponential with the number of nodes n, and current reservation commitments. We present a novel approach for path finding in time-dependent networks taking advantage of user-provided parameters of total volume and time constraints, which produces options for earliest completion and shortest duration. The theoretical complexity is only O(n^2 r^2) in the worst-case, where r is the number of reservations in the desired time interval. We have implemented our algorithm and developed efficient methodologies for incorporation into network reservation frameworks. Performance measurements confirm the theoretical predictions.

M. Balman, E. Chaniotakis, A. Shoshani, A. Sim, "A New Approach in Advance Network Reservation and Provisioning for High-Performance Scientific Data Transfers", 2010, LBNL 4091E,

Julian Cummings, Jay Lofstead, Karsten Schwan, Alexander Sim, Arie Shoshani, Ciprian Docan, Manish Parashar, Scott Klasky, Norbert Podhorszki, Roselyne Barreto, "EFFIS: An End-to-end Framework for Fusion Integrated Simulation", 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010,

A. Sim, D. Gunter, V. Natarajan, A. Shoshani, D. Williams, J. Long, J. Hick, J. Lee, E. Dart, "Efficient Bulk Data Replication for the Earth System Grid", Data Driven E-science: Use Cases and Successful Applications of Distributed Computing Infrastructures (ISGC 2010), (Springer-Verlag New York Inc: 2010) Pages: 435

Kesheng Wu, Arie Shoshani, Kurt Stockinger, "Analyses of multi-level and multi-component compressed indexes", ACM Transactions on Database Systems, ACM, 2010, 35:1--52, doi: 10.1145/1670243.1670245

E. Pourabbas, A. Shoshani, "Improving Estimation Accuracy of Aggregate Queries on Data Cubes", Data & Knowledge Engineering 69 (2010), January 1, 2010, 69:50-72,

2009

Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani and Doron Rotem, (Chapman & Hall/CRC Computational Science: December 2009)

A. Sim, A. Shoshani, F. Donno, J. Jensen, "Storage Resource Manager Interface Specification V2.2 Implementations Experience Report", Open Grid Forum, GFD.154, 2009,

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, D. Fraser, J. Garcia, S. Hankin, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A. Shoshani, F. Siebenlist, A. Sim, W. G. Strand, M. Su, N. Wilhelmi, "The Earth System Grid: Enabling Access to Multimodel Climate Simulation Data", American Meteorological Society, 2009, 90(2):195-205,

M. Riedel, E. Laure, Th. Soddemann, L. Field, J. P. Navarro, J. Casey, M. Litmaath, J. Ph. Baud, B. Koblitz, C. Catlett, D. Skow, C. Zheng, P. M. Papadopoulos, M. Katz, N. Sharma, O. Smirnova, B. Kónya, P. Arzberger, F. Würthwein, A. S. Rana, T. Martin, M. Wan, V. Welch, T. Rimovsky, S. Newhouse, A. Vanni, Y. Tanaka, Y. Tanimura, T. Ikegami, D. Abramson, C. Enticott, G. Jenkins, R. Pordes, N. Sharma, S. Timm, N. Sharma, G. Moont, M. Aggarwal, D. Colling, O. van der Aa, A. Sim, V. Natarajan, A. Shoshani, J. Gu, S. Chen, G. Galang, R. Zappi, L. Magnoni, V. Ciaschini, M. Pace, V. Venturi, M. Marzolla, P. Andreetto, B. Cowles, S. Wang, Y. Saeki, H. Sato, S. Matsuoka, P. Uthayopas, S. Sriprayoonsakul, O. Koeroo, M. Viljoen, L. Pearlman, S. Pickles, David Wallom, G. Moloney, J. Lauret, J. Marsteller, P. Sheldon, S. Pathak, S. De Witt, J. Mencák, J. Jensen, M. Hodges, D. Ross, S. Phatanapherom, G. Netzer, A. R. Gregersen, M. Jones, S. Chen, P. Kacsuk, A. Streit, D. Mallmann, F. Wolf, T. Lippert, Th. Delaitre, E. Huedo, N. Geddes, "Interoperation of world-wide production e-Science infrastructures", Concurrency and Computation: Practice and Experience, 2009, 21(8):961-990,

Arie Shoshani, Flavia Donno, Junmin Gu, Jason Hick, Maarten Litmaath, Alex Sim, "Dynamic Storage Management", Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani, Doron Rotem, (Chapman & Hall/CRC Computational Science: 2009)

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

2008

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, J. Gu, "Grid data access on widely distributed worker nodes using scalla and SRM", Journal of Physics: Conf. Ser., 2008, 119, doi: 10.1088/1742-6596/119/7/072019

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, S. Hankin, V. E. Henson, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A Shoshani, F. Siebenlist, A. Sim, W. G. Strand, N. Wilhelmi, M. Su, "Data Management and Analysis for the Earth System Grid", SciDAC Conference, 2008,

Alex Sim, Arie Shoshani (Editors), Paolo Badino, Olof Barring, Jean‐Philippe Baud, Ezio Corso, Shaun De Witt, Flavia Donno, Junmin Gu, Michael Haddox‐Schatz, Bryan Hess, Jens Jensen, Andy Kowalski, Maarten Litmaath, Luca Magnoni, Timur Perelmutov, Don Petravick, Chip Watson, The Storage Resource Manager Interface Specification Version 2.2, Open Grid Forum, Document in Full Recommendation, GFD.129, 2008,

C S Chang, S Klasky, J Cummings, R. Samtaney, A Shoshani, L Sugiyama, D Keyes, S Ku, G Park, S Parker, N Podhorszki, H. Strauss, H Abbasi, M Adams, R Barreto, G Bateman, K Bennett, Y Chen, E D’Azevedo, C Docan, S Ethier, E Feibush, L Greengard, T Hahm, F Hinton, C Jin, A. Khan, A Kritz, P Krsti, T Lao, W Lee, Z Lin, J Lofstead, P Mouallem, M Nagappan, A Pankin, M Parashar, M Pindzola, C Reinhold, D Schultz, K Schwan, D. Silver, A Sim, D Stotler, M Vouk, M Wolf, H Weitzner, P Worley, Y Xiao, E Yoon, D Zorin, "Toward a first-principles integrated simulation of tokamak edge plasmas", Journal of Physics: Conf. Ser., 2008, 125, doi: 10.1088/1742-6596/125/1/012042

R Ananthakrishnan, D E Bernholdt, S Bharathi, D Brown, M Chen, A L Chervenak, L Cinquini, R Drach, I T Foster, P Fox, D Fraser, K Halliday, S Hankin, P Jones, C Kesselman, D E Middleton, J Schwidder, R Schweitzer, R Schuler, A Shoshani, F Siebenlist, A Sim, W G Strand, N Wilhelmi, M Su, D N Williams, "Building a global federation system for climate change research: the earth system grid center for enabling technologies (ESG-CET)", Journal of Physics: Conf. Ser., 2008, 78, doi: 10.1088/1742-6596/78/1/012050

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Doron Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined Numerical and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1--23

Rishi Rakesh Sinha, Marianne Winslett, Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Adaptive Bitmap Indexes for Space-Constrained Systems", ICDE 2008, 2008, 1418--1420,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Breaking the curse of cardinality on bitmap indexes", International Conference on Scientific and Statistical Database Management, 2008, 348--365,

2007

L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, F. Donno, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations", the 24th IEEE Conference on Mass Storage Systems and Technologies, 2007,

F. Donno, L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Manager version 2.2: design, implementation, and testing experience", Journal of Physics: Conf. Ser., 2007, 119, doi: 10.1088/1742-6596/119/6/062028

Elaheh Pourabbas, Arie Shoshani, "Efficient Estimation of Joint Queries from Multiple OLAP Databases", ACM Transactions on Database Systems (TODS), March 1, 2007, Volume 3,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, Performance of Multi-Level and Multi-Component Bitmap Indexes, 2007, doi: 10.1145/1670243.1670245

Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein, "Enabling Real-Time Querying of Live and Historical Data", SSDBM 2007, 2007,

2006

Elaheh Pourabbas, Arie Shoshani, "The Composite OLAP-Object Data Model: Removing an Unnecessary Barrier", International Conference on Scientific and Statistical Database Management (SSDBM) 2006, July 3, 2006, 291-300,

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", Lecture Notes in Computer Science, edited by Jean-Marc Pierson, (Springer-Verlag GmbH Publisher: 2006) Pages: 100-112

D. E. Middleton, D. E. Bernholdt, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, P. Fox, P. Jones, C. Kesselman, I. T. Foster, V. Nefedova, A. Shoshani, A. Sim, W. G. Strand, D. Williams, "Enabling worldwide access to climate simulation data: the earth system grid (ESG)", SciDAC Conference, 2006,

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, "From rootd to Xrootd, from physical to logical files: experience on accessing and managing distributed data", Computing in High Energy Physics (CHEP), 2006,

E. Hjort, L. Hajdu, J. Lauret, D. Olson, A. Sim, A. Shoshani, "Data and Computational Grid Coupling in RHIC/STAR – An Analysis Scenario using SRM Technology", Computing in High Energy Physics (CHEP), 2006,

Kesheng Wu, Ekow J Otoo, Arie Shoshani, "Optimizing bitmap indices with efficient compression", ACM Transactions on Database Systems (TODS), 2006, 31:1--38,

K. Wu, K. Stockinger, A. Shoshani, Wes Bethel, "FastBit--Helps Finding the Proverbial Needle in a Haystack", 2006, LBNL LBNL-PUB/963,

F. Reiss, K. Stockinger, K. Wu, A. Shoshani, J. M. Hellerstein, "Efficient analysis of live and historical streaming data and its application to cybersecurity", 2006,

2005

D. Bernholdt, S. Bharathi, D. Brown, K. Chanchio, M. Chen, A. Chervenak, L. Cinquini, B. Drach, I. Foster, P. Fox, J. Garcia, C. Kesselman, R. Markel, D. Middleton, V. Nefedova, L. Pouchard, A. Shoshani, A. Sim, G. Strand, D. Williams, "The Earth System Grid: Supporting the Next Generation of Climate Modeling Research", Proceedings of the IEEE, 2005, 93(3):485-495,

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", International Workshop on Data Management in Grids, 2005,

Arie Shoshani, Alex Sim, Kurt Stockinger, "Replica Registration Service Functional Interface Specification 1.0", 2005, LBNL 57520,

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Wei-Ming Zhang, "Grid Collector: Facilitating Efficient Selective Access from Data Grids", International Supercomputer Conference 2005, 2005,

2004

Eric Hjort, Doug Olson, Jerome Lauret, Arie Shoshani, Alex Sim, "Production mode Data-Replication framework in STAR using the HRM Grid middleware", Computing in High Energy Physics, 2004,

Alex Sim, Junmin Gu, Arie Shoshani, Vijaya Natarajan, "DataMover: Robust Terabytes-Scale Multi-file Replication over Wide-Area Networks", the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), 2004,

Kesheng Wu, Ekow J Otoo, Arie Shoshani, "An efficient compression scheme for bitmap indices", 2004,

Kesheng Wu, Wei-Ming Zhang, Victor, Jerome Lauret, Arie Shoshani, "The Grid Collector: Using an Event Catalog to Speed up Analysis in Distributed Environment", Proceedings of Computing in High Energy and Nuclear Physics (CHEP) 2004, 2004,

K. Wu, A. Shoshani, E. J. Otoo, Word aligned bitmap compression method, data structure, and apparatus, US Patent 6,831,575, 2004,

2003

Elaheh Pourabbas, Arie Shoshani, "Answering Joint Queries from Multiple Aggregate OLAP Databases", Data Warehousing and Knowledge Discovery, 5th International Conference, DaWaK 2003, September 3, 2003, 24-34,

Arie Shoshani, Alexander Sim, Junmin Gu, "Storage Resource Managers: Essential Components for the Grid", Grid Resource Management: State of the Art and Future Trends, edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, (Kluwer Academic Publishers: 2003)

Ann L. Chervenak, Ewa Deelman, Carl Kesselman, William E. Allcock, Ian T. Foster, Veronika Nefedova, Jason Lee, Alex Sim, Arie Shoshani, Bob Drach, Dean Williams, Don Middleton, "High-performance remote access to climate simulation data: a challenge problem for data grid technologies", Parallel Computing, 2003, 29(10):1335-1356,

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

D. Yu, J. Lauret, A. Shoshani, D. Olson, E. Hjort, A. Sim, "The Design of High Performance Data Replication in the Grid Environment for the STAR Collaboration", Computing in High Energy Physics, 2003,

L. Pouchard, L. Cinquini, B. Drach, D. Middleton, D. Bernholdt, K. Chanchio, I. Foster, V. Nefedova, D. Brown, P. Fox, J. Garcia, G. Strand, D. Williams, A. Chervenak, C. Kesselman, A. Shoshani, A. Sim, "An Ontology for Scientific Information in a Grid Environment: the Earth System Grid", the Symposium on Cluster Computing and the Grid (CCGrid), 2003,

Arie Shoshani, Alex Sim, Junmin Gu, Storage Resource Managers: Essential Components for Grid Applications, Globus World, 2003,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid collector: An event catalog with automated file management", 2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No. 03CH37515), 2003, LBNL 55563,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2002

Elaheh Pourabbas, Arie Shoshani, "Joint Queries Estimation from Multiple OLAP Databases", International Conference on Scientific and Statistical Database Management, 2002 (SSDBM’02), July 24, 2002,

A. Shoshani, A. Sim, J. Gu, "Storage Resource Managers: Middleware components for Grid Storage", the 19th IEEE Symposium on Mass Storage Systems, 2002,

2001

B. Allcock, I. Foster, V. Nefedova, A. Chervenak, E. Deelman, C. Kesselman, J. Lee, A. Sim, A. Shoshani, B. Drach, D. Williams, "High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies", Super Computing 2001, 2001,

A. Sim, H. Nordberg, L.M. Bernardo, A. Shoshani, D. Rotem, "Experience with using CORBA to implement a file caching coordination system", Concurrency and Computation: Practice and Experience, 2001, 13:1-15,

D. Olson, E. Hjort, J. Lauret, M. Messer, A. Shoshani, A. Sim, "Non-shared Disk Cluster - A Fault Tolerant, Commodity Approach to Hi-Bandwidth Data Analysis", Computing in High Energy Physics, 2001,

L Bernardo, H Nordberg, D Olson, A Shoshani, A Sim, A Vaniachine, D Zimmerman, B Gibbard, R Porter, T Wenaus, others, "New capabilities in the HENP grand challenge storage access system and its application at RHIC", Computer physics communications, 2001, 140:179--188,

2000

A. Shoshani, A. Sim, L.M. Bernardo, H. Nordberg, "Coordinating Simultaneous Caching of File Bundles from Tertiary Storage", International Conference on Scientific and Statistical Database Management (SSDBM), 2000,

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

L. M. Bernardo, A. Shoshani, A. Sim, H. Nordberg, "Access Coordination Of Tertiary Storage For High Energy Physics Applications", the 17th IEEE Symposium on Mass Storage Systems, 2000,

A. Sim, A. Shoshani, HRM: Hierarchical Resource Manager, Globus World, 2000,

A. Sim, A. Shoshani, L. M. Bernardo, H. Nordberg, A Storage Access Coordination System for Petabyte Scale Scientific Data, IONA World, 2000,

1999

A. Sim, H. Nordberg, L. M. Bernardo, A. Shoshani, D. Rotem, "Storage Access Coordination Using CORBA", Distributed Objects and Application, 1999, 168-175,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem, A. Sim, "Multidimensional Indexing and Query Coordination for Tertiary Storage Management", International Conference on Scientific and Statistical Database Management, 1999, 214-225,

1998

L.M. Bernardo, D. Rotem, A. Shoshani, H. Nordberg, A. Sim, "Using Access Patterns to Partition Large Datasets on Tertiary Storage in Order to Minimize Retrieval Costs", 1998, LBNL 41504,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem, A. Sim, "Storage Management for High Energy Physics Applications", Computing in High Energy Physics, 1998,

Alex Sim

2021

M. Nakashima, A. Sim, Y. Kim, J. Kim, J. Kim, "Automated Variable Selection for Network Anomaly Detection", ACM Transactions on Management Information Systems (TMIS), 2021,

A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu, "Network Traffic Performance Analysis and Anomaly Detection using Supervised Machine Learning", International Journal of Big Data Intelligence, Special Issue on Systems and Network Telemetry and Analytics, 2021,

2020

Ling Jin, Alina Lazar, James Sears, Annika Todd, Alex Sim, Kesheng Wu, Hung-Chai Yang, C. Anna Spurlock, "Clustering Life Course to Understand the Heterogeneous Effects of Life Events, Gender and Generation on Habitual Travel Modes", IEEE Access, 2020, 1-17, doi: 10.1109/ACCESS.2020.3032328

B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", The 16th IEEE International Conference on Mobility, Sensing and Networking (IEEE MSN 2020), 2020,

B. Cho, T. Dayrit, Y. Gao, Z. Wang, T. Hong, A. Sim, K. Wu, "Effective Missing Value Imputation Methods for Building Monitoring Data", The 2nd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2020) in conjunction with IEEE International Conference on Big Data (IEEE BigData 2020), 2020,

J. Kim, A. Sim, J. Kim, K. Wu, "Botnets Detection Using Recurrent Variational Autoencoder", IEEE Global Communications Conference (Globecom 2020), 2020,

I. Monga, C. Guok, J. MacAuley, A. Sim, H. Newman, J. Balcas, P. DeMar, L. Winkler, T. Lehman, X. Yang, "SDN for End-to-end Networked Science at the Exascale", Future Generation Computer Systems, 2020, doi: 10.1016/j.future.2020.04.018

Brett Weinger, Alex Sim (Advisor), John Wu (Advisor), Jinoh Kim (Advisor), "Enhancing IoT Anomaly Detection Performance for Federated Learning", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’20), ACM Student Research Competition (SRC), 2020,

D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, The Superfacility project: automated pipelines for experiments and HPC, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), State of the Practice (SOP), 2020,

B. Enders, D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, "Cross-facility science with the Superfacility Project at LBNL", 2nd Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP 2020), in conjunction with the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 20), 2020,

A. Sim, Statistical Pattern Detection with Locally Exchangeable Measures, International Conference on Advanced Communications and Computation (INFOCOMP 2020), 2020,

C. A. Spurlock, A. Gopal, J. Auld, P. Leiby, C. Sheppard, T. Wenzel, S. Belal, A. Duvall, A. Enam, S. Fujita, A. Henao, L. Jin, E. Kontou, A. Lazar, Z. Needell, C. Rames, T. Rashidi, J. Sears, A. Sim, M. Stinson, M. Taylor, A. Todd-Blick, O. Verbas, V. Walker, J. Ward, G. Wong-Parodi, K. Wu, H.-C. Yang, "SMART Mobility, Mobility Decision Science Capstone Report", Vehicle Technologies Office (VTO), Office of Energy Efficiency and Renewable Energy (EERE), US Department of Energy, 2020,

Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, Hyeonsang Eom, "Towards hpc i/o performance prediction through large-scale log analysis", Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 77--88, doi: 10.1145/3369583.3392678

Gaurav R Ghosal, Dipak Ghosal, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Deep Deterministic Policy Gradient Based Network Scheduler For Deadline-Driven Data Transfers", Proceedings of International Federation for Information Processing (IFIP) Networking Conference (NETWORKING 2020), 2020, 253--261,

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Jaegyoon Hahm, "Transfer Learning Approach for Botnet Detection Based on Recurrent Variational Autoencoder", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 41--47, doi: 10.1145/3391812.3396273

Jiwoo Bang, Chungyong Kim, Kesheng Wu, Alex Sim, Suren Byna, Sunggon Kim, Hyeonsang Eom, "HPC Workload Characterization Using Feature Selection and Clustering", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 33--40, doi: 10.1145/3391812.3396270

M. Nakashima, A. Sim, J. Kim, "Evaluation of Deep Learning Models for Network Performance Prediction for Scientific Facilities", the 3rd ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2020, in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2020, doi: 10.1145/3391812.3396272

S. Bhandari, A. K. Kukreja, A. Lazar, A. Sim, K. Wu, "Feature Selection and Tree-based Classification for Wireless Intrusion Detection", the 3rd ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2020, in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2020, doi: 10.1145/3391812.3396274

Qiao Kang, Alex Sim, Peter Nugent, Sunwoo Lee, Wei-keng Liao, Ankit Agrawal, Alok Choudhary, Kesheng Wu, "Predicting Resource Requirement in Intermediate Palomar Transient Factory Workflow", 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), 2020, 619--628, doi: 10.1109/CCGrid49817.2020.00-31

H. Sung, J. Bang, C. Kim, H. Kim, A. Sim, G. K. Lockwood, H. Eom, "BBOS: Efficient HPC Storage Management via Burst Buffer Over-Subscription", the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2020), 2020, doi: 10.1109/CCGrid49817.2020.00-79

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, "Life Course as a Contextual System to Investigate the Effects of Life Events, Gender, and Generation on Travel Mode Use", Transportation Research Board (TRB) 99th Annual Meeting, 2020,

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Botnet Detection Using Recurrent Variational Autoencoder, arXiv preprint arXiv:2004.00234, 2020,

2019

A. Lazar, A. Ballow, L. Jin, C. A. Spurlock, A. Sim, K. Wu, "Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019, doi: 10.1109/BigData47090.2019.9006411

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, Life course as a contextual system to investigate the effects of life events, gender and generation on travel mode usage, The Behavior, Energy & Climate Change Conference (BECC), 2019,

J. Balcas, H. Newman, M. Spiropulu, X. Yang, T. Lehman, I. Monga, C. Guok, J. MacAuley, A. Sim, P. Demar, "SDN for End-to-End Networking at Exascale", the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP2019), 2019,

Alexandra Ballow, Alina Lazar (Advisor), Alex Sim (Advisor), Kesheng Wu (Advisor), "Handling Missing Values in Joint Sequence Analysis", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2019), ACM Student Research Competition (SRC), First place winner, 2019,

J. Choi, A. Sim, Data reduction methods, systems and devices, U.S. Patent No. 10,366,078, 2019,

U.S. Patent No. 10,366,078, “DATA REDUCTION METHODS, SYSTEMS, AND DEVICES”, LBNL IB2013-133.

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

J. Kim, A. Sim, B. Tierney, S. Suh, I. Kim, "Multivariate Network Traffic Analysis using Clustered Patterns", Journal of Computing, April 2019, 101(4):339-361, doi: 10.1007/s00607-018-0619-4

J. Kim, A. Sim, "A new approach to multivariate network traffic analysis", Journal of Computer Science and Technology, 2019, 34(2):388–402, doi: 10.1007/s11390-019-1915-y

Alexandra Ballow, Alina Lazar, Alex Sim, Kesheng Wu, "Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Tyler Leibengood, Alina Lazar, Alex Sim, Kesheng Wu, "Network Traffic Performance Prediction with Multivariate Clusters in Time Windows", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Multidimensional Compression with Pattern Matching", 2019 Data Compression Conference (DCC), Pages: 567--567 2019,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, Annika Todd, "Evaluating the effects of missing values and mixed data types on social sequence clustering using t-SNE visualization", Journal of Data and Information Quality (JDIQ), 2019, 11:1--22,

Sambit Shukla, Dipak Ghosal, Kesheng Wu, Alex Sim, Matthew Farrens, "Co-optimizing Latency and Energy for IoT services using HMP servers in Fog Clusters", 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), 2019, 121--128,

Hanul Sung, Jiwoo Bang, Alexander Sim, Kesheng Wu, Hyeonsang Eom, "Understanding Parallel I/O Performance Trends Under Various HPC Configurations", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 29--36,

Mengtian Jin, Youkow Homma, Alex Sim, Wilko Kroeger, Kesheng Wu, "Performance prediction for data transfers in LCLS workflow", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 37--44,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Similarity-based Compression with Multidimensional Pattern Matching", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 19--24,

Astha Syal, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Automatic detection of network traffic anomalies and changes", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 3--10,

Dipak Ghosal, Sambit Shukla, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Reinforcement Learning Based Network Scheduler For Deadline-Driven Data Transfers", 2019 IEEE Global Communications Conference (GLOBECOM), 2019, 1--6,

Qiao Kang, Ankit Agrawal, Alok Choudhary, Alex Sim, Kesheng Wu, Rajkumar Kettimuthu, Peter H Beckman, Zhengchun Liu, Wei-keng Liao, "Spatiotemporal Real-Time Anomaly Detection for Supercomputing Systems", 2019 IEEE International Conference on Big Data (Big Data), 2019, 4381--4389,

Burak Cetin, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Federated Wireless Network Intrusion Detection", 2019 IEEE International Conference on Big Data (Big Data), Pages: 6004--6006 2019,

Kesheng Wu, Alex Sim, Jonathan Wang, Seongwook Hwangbo, Methods, systems, and devices for accurate signal timing of power component events, 2019,

US Patent app no. 20190138371, “Methods, systems, and devices for accurate signal timing of power component events”

2018

Kade Gibson, Dongeun Lee, Jaesik Choi, Alex Sim, "Dynamic Online Performance Optimization in Streaming Data Compression", IEEE International Conference on Big Data (Big Data 2018), 2018, doi: 10.1109/bigdata.2018.8621867

Karen Tu, Alex Sim (Advisor), John Wu (Advisor), "Identification of Network Data Transfer Bottlenecks in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’18), ACM Student Research Competition (SRC), 2018,

I. Monga, C. Guok, J. MacAuley, A. Sim, H. Newman, J. Balcas, P. DeMar, L. Winkler, T. Lehman, X. Yang, "SDN for End-to-end Networked Science at the Exascale (SENSE)", Innovate the Network for Data-Intensive Science Workshop (INDIS 2018), in conjunction with the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18), 2018, doi: 10.1109/INDIS.2018.00007

J. Kim, J. Choi, A. Sim, "Spatio-temporal Analysis of HPC I/O and Connection Data", International Workshop on Scalable Network Traffic Analytics (SNTA 2018), 2018, in conjunction with the 38th IEEE International Conference on Distributed Computing Systems (ICDCS 2018), 2018, doi: 10.1109/icdcs.2018.00176

Taehoon Kim, Jaesik Choi, Dongeun Lee, Alex Sim, C Anna Spurlock, Annika Todd, Kesheng Wu, "Predicting baseline for analysis of electricity pricing", International Journal of Big Data Intelligence, 2018, 5:3--20,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Alex Sim, Kesheng Wu, "Consensus ensemble system for traffic flow prediction", IEEE Transactions on Intelligent Transportation Systems, 2018, 19:3903--3914,

Cecilia Dao, Xinyu Liu, Alex Sim, Craig Tull, Kesheng Wu, "Modeling data transfers: change point and anomaly detection", 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018, 1589--1594,

Rajkumar Kettimuthu, Zhengchun Liu, Ian Foster, Peter H Beckman, Alex Sim, Kesheng Wu, Wei-keng Liao, Qiao Kang, Ankit Agrawal, Alok Choudhary, "Towards autonomic science infrastructure: architecture, limitations, and open issues", Proceedings of the 1st International Workshop on Autonomous Infrastructure for Science, 2018, 1--9,

Mengying Yang, Xinyu Liu, Wilko Kroeger, Alex Sim, Kesheng Wu, "Identifying anomalous file transfer events in LCLS workflow", Proceedings of the 1st International Workshop on Autonomous Infrastructure for Science, 2018, 1--4,

Sowmya Balasubramanian, Dipak Ghosal, Kamala Narayanan Balasubramanian Sharath, Eric Pouyoul, Alex Sim, Kesheng Wu, Brian Tierney, "Auto-tuned publisher in a pub/sub system: Design and performance evaluation", 2018 IEEE International Conference on Autonomic Computing (ICAC), 2018, 21--30,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Feature Engineering and Classification Models for Partial Discharge in Power Transformers", Mij, 2018, 1001:60,

Tal Shachaf, Alexander Sim, Kesheng Wu, Wilko Kroeger, "Detecting Anomalies in the LCLS Workflow", 2018 IEEE International Conference on Big Data (Big Data), 2018, 3256--3260,

Alina Lazar, Kesheng Wu, Alex Sim, "Predicting Network Traffic Using TCP Anomalies", 2018 IEEE International Conference on Big Data (Big Data), Pages: 5369--5371 2018,

2017

Jinoh Kim, Alex Sim, "A New Approach to Online, Multivariate Network Traffic Analysis", 2nd Workshop on Network Security Analytics and Automation (NSAA), in conjunction with the 26th International Conference on Computer Communications and Networks (ICCCN 2017), 2017, doi: 10.1109/ICCCN.2017.8038520

J. Kim, A. Sim, S.C. Suh, I. Kim, "An Approach to Online Network Monitoring Using Clustered Patterns", International Conference on Computing, Networking and Communications (ICNC 2017), 2017, doi: 10.1109/ICCNC.2017.7876207

J. Kim, W. Yoo, A. Sim, S.C. Suh, I. Kim, "A Lightweight Network Anomaly Detection Technique", International Workshop on Computing, Networking and Communications (CNC 2017), 2017, doi: 10.1109/ICCNC.2017.7876251

Ling Jin, Doris Lee, Alex Sim, Sam Borgeson, Kesheng Wu, C Anna Spurlock, Annika Todd, "Comparison of clustering techniques for residential energy behavior using smart meter data", 2017,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Expanding statistical similarity based data reduction to capture diverse patterns", 2017 Data Compression Conference (DCC), Pages: 445--445 2017,

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, Kesheng Wu, "Parallel variable selection for effective performance prediction", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017, 208--217,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Improving statistical similarity based data reduction for non-stationary data", Proceedings of the 29th International Conference on Scientific and Statistical Database Management, 2017, 1--6,

Updated experiment version: https://sdm.lbl.gov/oapapers/ssdbm17-lee-upd.pdf
Original version: http://dl.acm.org/citation.cfm?doid=3085504.3085583

Kesheng Wu, Dongeun Lee, Alex Sim, Jaesik Choi, "Statistical data reduction for streaming data", 2017 New York Scientific Data Summit (NYSDS), 2017, 1--6,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Convolutional Filtering for Accurate Signal Timing from Noisy Streaming Data", 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2017, 941--948,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Feature Engineering and Classification Models for Partial Discharge Events in Power Transformers", Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Pages: 269--270 2017,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, "Data quality challenges with missing values and mixed types in joint sequence analysis", 2017 IEEE International Conference on Big Data (Big Data), 2017, 2620--2627,

Peter Harrington, Wucherl Yoo, Alexander Sim, Kesheng Wu, "Diagnosing parallel I/O bottlenecks in HPC applications", International Conference for High Performance Computing Networking Storage and Analysis (SC17), ACM Student Research Competition (SRC), 2017,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Accurate signal timing from high frequency streaming data", 2017 IEEE International Conference on Big Data (Big Data), Pages: 4852--4854 2017,

2016

Sam Fries, Sasha Ames, Alex Sim, Dean Williams, "HPSS Connections to ESGF: BASEJumper", 2016 Earth System Grid Federation (ESGF) Conference, 2016,

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016,

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, (Springer, Cham: 2016) Pages: 139--161

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Novel data reduction based on statistical similarity", Proceedings of the 28th International Conference on Scientific and Statistical Database Management, 2016, 1--12,

Wucherl Yoo, Alex Sim, Kesheng Wu, "Machine learning based job status prediction in scientific clusters", 2016 SAI Computing Conference (SAI), 2016, 44--53,

David Pugmire, James Kress, Jong Choi, Scott Klasky, Tahsin Kurc, Randy Michael Churchill, Matthew Wolf, Greg Eisenhower, Hank Childs, Kesheng Wu, others, "Visualization and analysis for near-real-time decision making in distributed workflows", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, 1007--1013,

Lingfei Wu, Kesheng John Wu, Alex Sim, Michael Churchill, Jong Y Choi, Andreas Stathopoulos, Choong-Seock Chang, Scott Klasky, "Towards real-time detection and tracking of spatio-temporal features: Blob-filaments in fusion plasma", IEEE Transactions on Big Data, 2016, 2:262--275,

D. Pugmire, J. Kress, J. Choi, S. Klasky, T. Kurc, R. M. Churchill, M. Wolf, G. Eisenhauer, H. Childs, K. Wu, A. Sim, J. Gu, J. Low, "Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, 1007--1013, doi: 10.1109/IPDPSW.2016.175

2015

S. Fries, A. Sim, "HPSS connections to ESGF", Earth System Grid Federation Conference, (ESGF 2015), 2015,

M. Koo, W. Yoo (advisor), A. Sim (advisor), "I/O Performance Analysis Framework on Measurement Data from Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15), ACM Student Research Competition (SRC), 2015,

J. Kim, A. Sim, "Peeking Network States with Clustered Patterns", 2015, LBNL 1003744,

K. Hu, J. Choi, A. Sim, J. Jiang, "Best Predictive Generalized Linear Mixed Model with Predictive Lasso for High-Speed Network Data Analysis", International Journal of Statistics and Probability, 2015,

S. Shannigrahi, A. J. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E. Yeh, "Named Data Networking in Climate Research and HEP Applications", 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), 2015,

W. Yoo, A. Sim, "Network Bandwidth Utilization Forecast Model on High Bandwidth Networks", IEEE International Conference on Computing, Networking and Communications (ICNC’15), 2015,

David H Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, "Statistical overfitting and backtest performance", Risk-Based and Factor Investing, 2015,

http://ssrn.com/abstract=2507040

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Patha: Performance analysis tool for hpc applications", 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), 2015, 1--8,

Taehoon Kim, Dongeun Lee, Jaesik Choi, Anna Spurlock, Alex Sim, Annika Todd, Kesheng Wu, "Extracting baseline electricity usage using gradient tree boosting", 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), 2015, 734--741,

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, C.S. Chang, S. Klasky, "Towards Real-Time Detection and Tracking of Blob-Filaments in Fusion Plasma Big Data", WM-CS-2015-01, Department of Computer Science, College of William and Mary, 2015,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, PATHA: Performance Analysis Tool for HPC Applications, 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), Pages: 1--8 2015, doi: 10.1109/PCCC.2015.7410313

Taehoon Kim, Dongeun Lee, Jaesik Choi, C. Anna Spurlock, Alex Sim, Annika Todd, Kesheng Wu, "Extracting Baseline Electricity Usage with Gradient Boosting", International Conference on Big Data Intelligence and Computing (DataCom 2015), 2015, doi: 10.1109/SmartCity.2015.156

2014

W. Yoo, A. Sim, "Efficient Changing Pattern Detection on High Bandwidth Network Measurements", 7th International Conference on Grid and Distributed Computing, 2014,

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Adaptation and Policy-Based Resource Allocation for Efficient Bulk Data Transfers in High Performance Computing Environments", 4th International Workshop on Network-aware Data Management (NDM'14), 2014,

John Wu, Alex Sim, Lingfei Wu, Abraham Frankl, Scott Klasky, Jong Y Choi, CS Chang, Michael Churchill, "Exercising ICEE Framework with Fusion Blob Detection", DOE/ASCR NGNS PI meeting, 2014,

US Patent 8,705,342 B2. “Co-scheduling of network resource provisioning and host-to-host bandwidth reservation on high-performance network and storage systems”, D. Yu, D. Katramatos, A. Sim, and A. Shoshani, Apr. 22, 2014, LBNL IB-3152, BNL BSA 11-02.

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Efficient Data Staging Using Performance-Based Adaptation and Policy-Based Resource Allocation", 22nd Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2014,

Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Y Choi, Andreas Stathopoulos, CS Chang, Scott Klasky, "High-performance outlier detection algorithm for finding blob-filaments in plasma", Proc. of 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC-14), held in conjunction with ACM/IEEE SC14, 2014,

Lingfei Wu, Kesheng Wu, Alex Sim, Andreas Stathopoulos, "Real-time outlier detection algorithm for finding blob-filaments in plasma", ACM/IEEE SC14 ACM SRC Poster, 2014,

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, CS Chang, S. Klasky, "High-Performance Outlier Detection Algorithm for Blob-Filaments in Plasma", 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC 14), 2014,

2013

J. Choi, K. Hu, A. Sim, "Relational Dynamic Bayesian Networks with Locally Exchangeable Measures", 2013, LBNL 6341E,

K. Hu, J. Choi, J. Jiang, A. Sim, "Best Predictive GLMM using LASSO with Application on High-Speed Network", 2013, LBNL 6327E,

K. Hu, A. Sim, D. Antoniades, C. Dovrolis, "Estimating and Forecasting Network Traffic Performance based on Statistical Patterns Observed in SNMP data", the 9th International Conference on Machine Learning and Data Mining (MLDM2013), 2013,

D. Antoniades, K. Hu, A. Sim, C. Dovrolis, "What SNMP data can tell us about Edge-to-Edge network performance", Passive and Active Measurement Conference (PAM2013), 2013,

K. Hu, A. Sim, D. Antoniades, C. Dovrolis, Statistical Prediction Models for Network Traffic Performance, the APAN 35 conference and the Winter 2013 ESCC/Internet2 Joint Techs meeting (TIP2013), 2013,

Jong Y Choi, Kesheng Wu, Jacky C Wu, Alex Sim, Qing G Liu, Matthew Wolf, C Chang, Scott Klasky, "Icee: Wide-area in transit data processing framework for near real-time scientific applications", 4th SC Workshop on Petascale (Big) Data Analytics: Challenges and Opportunities in conjunction with SC13, 2013, 11,

2012

Junmin Gu, David Smith, Ann L. Chervenak, Alex Sim, "Adaptive Data Transfers that Utilize Policies for Resource Sharing", The 2nd International Workshop on Network-Aware Data Management Workshop (NDM2012), 2012,

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

M. Balman, A. Sim, "Scaling the Earth System Grid to 100Gbps Networks", 2012, LBNL 5794E,

D. Yu, D. Katramatos, A. Shoshani, A. Sim, J. Gu, V. Natarajan, "StorNet: Integrating Storage Resource Management with Dynamic Network Provisioning for Automated Data Transfer", International Committee for Future Accelerators (ICFA) Standing Committee on Inter-Regional Connectivity (SCIC) 2012 Report: Networking for High Energy Physics, 2012,

Benson Ma, Arie Shoshani, Alex Sim, Kesheng Wu, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Management Workshop (NDM2012), 2012, 562--571,

2011

J. Gu, D. Katramatos, X. Liu, V. Natarajan, A. Shoshani, A. Sim, D. Yu, S. Bradley, S. McKee, "StorNet: Integrated Dynamic Storage and Network Resource Provisioning and Management for Automated Data Transfers", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/1/012002

G. Garzoglio, J. Bester, K. Chadwick, D. Dykstra, D. Groep, J. Gu, T. Hesselroth, O. Koeroo, T. Levshina, S. Martin, M. Salle, N. Sharma, A. Sim, S. Timm, A. Verstegen, "Adoption of a SAML-XACML Profile for Authorization Interoperability across Grid Middleware in OSG and EGEE", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/6/062011

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Junmin Gu, Dimitrios Katramatos, Xin Liu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Dantong Yu, Scott Bradley, Shawn McKee, "StorNet: Co-Scheduling of End-to-End Bandwidth Reservation on Storage and Network Systems for High Performance Data Transfers", IEEE INFOCOM HSN 2011, 2011,

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2010

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. A tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community. It has been managing massive dataset transfers efficiently in the environment where the network bandwidth is limited. Adaptive transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environments as well as to control the data transfers for the desired transfer performance. We describe the results from our hands-on data transfer management experience in the climate research community. We study a practical transfer estimation model and state our initial results from the adaptive transfer adjustment methodology. 

Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim, "A Flexible Reservation Algorithm for Advance Network Provisioning", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, November 2010 (SC'10), New Orleans, LA, IEEE Computer Society Washington, DC, USA ISBN: 978-1-4244-7559-, November 14, 2010, LBNL 4017E, doi: http://dx.doi.org/10.1109/SC.2010.4

Many scientific applications need support from a communication infrastructure that provides predictable performance, which requires effective algorithms for bandwidth reservations. Network reservation systems, such as ESnet’s OSCARS, establish guaranteed bandwidth of secure virtual circuits for a certain bandwidth and length of time. However, users currently cannot inquire about bandwidth availability, nor have alternative suggestions when reservation requests fail. In general, the number of reservation options is exponential with the number of nodes n, and current reservation commitments. We present a novel approach for path finding in time-dependent networks taking advantage of user-provided parameters of total volume and time constraints, which produces options for earliest completion and shortest duration. The theoretical complexity is only O(n^2 r^2) in the worst-case, where r is the number of reservations in the desired time interval. We have implemented our algorithm and developed efficient methodologies for incorporation into network reservation frameworks. Performance measurements confirm the theoretical predictions.

D. Hasenkamp, A. Sim, M. Wehner, K. Wu, "Finding Tropical Cyclones on Clouds", Supercomputing 2010, ACM SRC 3rd place, 2010,

M. Balman, E. Chaniotakis, A. Shoshani, A. Sim, "A New Approach in Advance Network Reservation and Provisioning for High-Performance Scientific Data Transfers", 2010, LBNL 4091E,

G. Attebury, A. Baranovski, K. Bloom, B. Bockelman, D. Kcira, J. Letts, T. Levshina, C. Lundestedt, T. Martin, W. Maier, H. Pi, A. Rana, I. Sfiligoi, A. Sim, M. Thomas, F. Wuerthwein, "Roadmap for Applying Hadoop Distributed File System in Scientific Grid Computing", International Symposium on Grid Computing (ISGC), 2010,

Julian Cummings, Jay Lofstead, Karsten Schwan, Alexander Sim, Arie Shoshani, Ciprian Docan, Manish Parashar, Scott Klasky, Norbert Podhorszki, Roselyne Barreto, "EFFIS: An End-to-end Framework for Fusion Integrated Simulation", 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010,

Daren Hasenkamp, Alexander Sim, Michael Wehner, Kesheng Wu, "Finding tropical cyclones on a cloud computing cluster: Using parallel virtualization for large-scale climate simulation analysis", 2010 IEEE Second International Conference on Cloud Computing Technology and Science, 2010, 201--208, LBNL 4218E,

A. Sim, D. Gunter, V. Natarajan, A. Shoshani, D. Williams, J. Long, J. Hick, J. Lee, E. Dart, "Efficient Bulk Data Replication for the Earth System Grid", Data Driven E-science: Use Cases and Successful Applications of Distributed Computing Infrastructures (ISGC 2010), (Springer-Verlag New York Inc: 2010) Pages: 435

Raj Kettimuthu, Alex Sim, Dan Gunter, Bill Allcock, Peer T. Bremer, John Bresnahan, Andrew Cherry, Lisa Childers, Eli Dart, Ian Foster, Kevin Harms, Jason Hick, Jason Lee, Michael Link, Jeff Long, Keith Miller, Vijaya Natarajan, Valerio Pascucci, Ken Raffenetti, David Ressman, Dean Williams, Loren Wilson, Linda Winkler, "Lessons learned from moving earth system grid data sets over a 20 Gbps wide-area network", HPDC 10, New York, NY, USA, ACM, 2010, 316--319, doi: 10.1145/1851476.1851519

2009

A. Sim, A. Shoshani, F. Donno, J. Jensen, Storage Resource Manager Interface Specification V2.2 Implementations Experience Report, Open Grid Forum, GFD.154, 2009,

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, D. Fraser, J. Garcia, S. Hankin, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A. Shoshani, F. Siebenlist, A. Sim, W. G. Strand, M. Su, N. Wilhelmi, "The Earth System Grid: Enabling Access to Multimodel Climate Simulation Data", American Meteorological Society, 2009, 90(2):195-205,

G. Attebury, A. Baranovski, K. Bloom, B. Bockelman, D. Kcira, J. Letts, T. Levshina, C. Lundestedt, T. Martin, W. Maier, H. Pi, A. Rana, I. Sfiligoi, A. Sim, M. Thomas, F. Wuerthwein, "Hadoop Distributed File System for the Grid", IEEE Nuclear Science Symposium, 2009,

J. Jensen, R. Downing, D. Ross, A. Sim, "Practical Grid Storage Interoperation", Journal of Grid Computing, 2009, 7:3, doi: 10.1007/s10723-009-9127-2

M. Riedel, E. Laure, Th. Soddemann, L. Field, J. P. Navarro, J. Casey, M. Litmaath, J. Ph. Baud, B. Koblitz, C. Catlett, D. Skow, C. Zheng, P. M. Papadopoulos, M. Katz, N. Sharma, O. Smirnova, B. Kónya, P. Arzberger, F. Würthwein, A. S. Rana, T. Martin, M. Wan, V. Welch, T. Rimovsky, S. Newhouse, A. Vanni, Y. Tanaka, Y. Tanimura, T. Ikegami, D. Abramson, C. Enticott, G. Jenkins, R. Pordes, N. Sharma, S. Timm, N. Sharma, G. Moont, M. Aggarwal, D. Colling, O. van der Aa, A. Sim, V. Natarajan, A. Shoshani, J. Gu, S. Chen, G. Galang, R. Zappi, L. Magnoni, V. Ciaschini, M. Pace, V. Venturi, M. Marzolla, P. Andreetto, B. Cowles, S. Wang, Y. Saeki, H. Sato, S. Matsuoka, P. Uthayopas, S. Sriprayoonsakul, O. Koeroo, M. Viljoen, L. Pearlman, S. Pickles, David Wallom, G. Moloney, J. Lauret, J. Marsteller, P. Sheldon, S. Pathak, S. De Witt, J. Mencák, J. Jensen, M. Hodges, D. Ross, S. Phatanapherom, G. Netzer, A. R. Gregersen, M. Jones, S. Chen, P. Kacsuk, A. Streit, D. Mallmann, F. Wolf, T. Lippert, Th. Delaitre, E. Huedo, N. Geddes, "Interoperation of world-wide production e-Science infrastructures", Concurrency and Computation: Practice and Experience, 2009, 21(8):961-990,

Arie Shoshani, Flavia Donno, Junmin Gu, Jason Hick, Maarten Litmaath, Alex Sim, "Dynamic Storage Management", Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani, Doron Rotem, (Chapman & Hall/CRC Computational Science: 2009)

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

2008

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, J. Gu, "Grid data access on widely distributed worker nodes using scalla and SRM", Journal of Physics: Conf. Ser., 2008, 119, doi: 10.1088/1742-6596/119/7/072019

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, S. Hankin, V. E. Henson, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A Shoshani, F. Siebenlist, A. Sim, W. G. Strand, N. Wilhelmi, M. Su, "Data Management and Analysis for the Earth System Grid", SciDAC Conference, 2008,

Alex Sim, Arie Shoshani (Editors), Paolo Badino, Olof Barring, Jean‐Philippe Baud, Ezio Corso, Shaun De Witt, Flavia Donno, Junmin Gu, Michael Haddox‐Schatz, Bryan Hess, Jens Jensen, Andy Kowalski, Maarten Litmaath, Luca Magnoni, Timur Perelmutov, Don Petravick, Chip Watson, The Storage Resource Manager Interface Specification Version 2.2, Open Grid Forum, Document in Full Recommendation, GFD.129, 2008,

C S Chang, S Klasky, J Cummings, R. Samtaney, A Shoshani, L Sugiyama, D Keyes, S Ku, G Park, S Parker, N Podhorszki, H. Strauss, H Abbasi, M Adams, R Barreto, G Bateman, K Bennett, Y Chen, E D’Azevedo, C Docan, S Ethier, E Feibush, L Greengard, T Hahm, F Hinton, C Jin, A. Khan, A Kritz, P Krsti, T Lao, W Lee, Z Lin, J Lofstead, P Mouallem, M Nagappan, A Pankin, M Parashar, M Pindzola, C Reinhold, D Schultz, K Schwan, D. Silver, A Sim, D Stotler, M Vouk, M Wolf, H Weitzner, P Worley, Y Xiao, E Yoon, D Zorin, "Toward a first-principles integrated simulation of tokamak edge plasmas", Journal of Physics: Conf. Ser., 2008, 125, doi: 10.1088/1742-6596/125/1/012042

R Ananthakrishnan, D E Bernholdt, S Bharathi, D Brown, M Chen, A L Chervenak, L Cinquini, R Drach, I T Foster, P Fox, D Fraser, K Halliday, S Hankin, P Jones, C Kesselman, D E Middleton, J Schwidder, R Schweitzer, R Schuler, A Shoshani, F Siebenlist, A Sim, W G Strand, N Wilhelmi, M Su, D N Williams, "Building a global federation system for climate change research: the earth system grid center for enabling technologies (ESG-CET)", Journal of Physics: Conf. Ser., 2008, 78, doi: 10.1088/1742-6596/78/1/012050

W. Betts, L. Didenko, T. Freeman, P. Jakl, L. Hajdu, E. Hjort, K. Keahey, J. Lauret, D. Olson, A. Rose, I. Sakrejda, A. Sim, "STAR Grid Activities, OSG and Beyond", International Symposium on Grid Computing (ISGC), 2008,

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

2007

L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, F. Donno, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations", the 24th IEEE Conference on Mass Storage Systems and Technologies, 2007,

F. Donno, L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Manager version 2.2: design, implementation, and testing experience", Journal of Physics: Conf. Ser., 2007, 119, doi: 10.1088/1742-6596/119/6/062028

2006

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", Lecture Notes in Computer Science, edited by Jean-Marc Pierson, (Springer-Verlag GmbH Publisher: 2006) Pages: 100-112

D. E. Middleton, D. E. Bernholdt, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, P. Fox, P. Jones, C. Kesselman, I. T. Foster, V. Nefedova, A. Shoshani, A. Sim, W. G. Strand, D. Williams, "Enabling worldwide access to climate simulation data: the earth system grid (ESG)", SciDAC Conference, 2006,

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, "From rootd to Xrootd, from physical to logical files: experience on accessing and managing distributed data", Computing in High Energy Physics (CHEP), 2006,

E. Hjort, L. Hajdu, J. Lauret, D. Olson, A. Sim, A. Shoshani, "Data and Computational Grid Coupling in RHIC/STAR – An Analysis Scenario using SRM Technology", Computing in High Energy Physics (CHEP), 2006,

2005

D. Bernholdt, S. Bharathi, D. Brown, K. Chanchio, M. Chen, A. Chervenak, L. Cinquini, B. Drach, I. Foster, P. Fox, J. Garcia, C. Kesselman, R. Markel, D. Middleton, V. Nefedova, L. Pouchard, A. Shoshani, A. Sim, G. Strand, D. Williams, "The Earth System Grid: Supporting the Next Generation of Climate Modeling Research", IEEE, 2005, 93(3):485-495,

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", International Workshop on Data Management in Grids, 2005,

Arie Shoshani, Alex Sim, Kurt Stockinger, "Replica Registration Service Functional Interface Specification 1.0", 2005, LBNL 57520,

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Wei-Ming Zhang, "Grid Collector: Facilitating Efficient Selective Access from Data Grids", International Supercomputer Conference 2005, 2005,

2004

Eric Hjort, Doug Olson, Jerome Lauret, Arie Shoshani, Alex Sim, "Production mode Data-Replication framework in STAR using the HRM Grid middleware", Computing in High Energy Physics, 2004,

Alex Sim, Junmin Gu, Arie Shoshani, Vijaya Natarajan, "DataMover: Robust Terabytes-Scale Multi-file Replication over Wide-Area Networks", the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), 2004,

2003

Arie Shoshani, Alexander Sim, Junmin Gu, "Storage Resource Managers: Essential Components for the Grid", Grid Resource Management: State of the Art and Future Trends, edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, (Kluwer Academic Publishers: 2003)

Ann L. Chervenak, Ewa Deelman, Carl Kesselman, William E. Allcock, Ian T. Foster, Veronika Nefedova, Jason Lee, Alex Sim, Arie Shoshani, Bob Drach, Dean Williams, Don Middleton, "High-performance remote access to climate simulation data: a challenge problem for data grid technologies", Parallel Computing, 2003, 29(10):1335-1356,

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

D. Yu, J. Lauret, A. Shoshani, D. Olson, E. Hjort, A. Sim, "The Design of High Performance Data Replication in the Grid Environment for the STAR Collaboration", Computing in High Energy Physics, 2003,

L. Pouchard, L. Cinquini, B. Drach, D. Middleton, D. Bernholdt, K. Chanchio, I. Foster, V. Nefedova, D. Brown, P. Fox, J. Garcia, G. Strand, D. Williams, A. Chervenak, C. Kesselman, A. Shoshani, A. Sim, "An Ontology for Scientific Information in a Grid Environment: the Earth System Grid", the Symposium on Cluster Computing and the Grid (CCGrid), 2003,

Arie Shoshani, Alex Sim, Junmin Gu, Storage Resource Managers: Essential Components for Grid Applications, Globus World, 2003,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid collector: An event catalog with automated file management", 2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No. 03CH37515), 2003, LBNL 55563,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2002

A. Shoshani, A. Sim, J. Gu, "Storage Resource Managers: Middleware components for Grid Storage", the 19th IEEE Symposium on Mass Storage Systems, 2002,

2001

B. Allcock, I. Foster, V. Nefedova, A. Chervenak, E. Deelman, C. Kesselman, J. Lee, A. Sim, A. Shoshani, B. Drach, D. Williams, "High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies", Super Computing 2001, 2001,

A. Sim, H. Nordberg, L.M. Bernardo, A. Shoshani, D. Rotem, "Experience with using CORBA to implement a file caching coordination system", Concurrency and Computation: Practice and Experience, 2001, 13:1-15,

E. Hjort, D. Olson, A. Sim, J. Yang, J. Lauret, M. Messer, "Data Grid Services in STAR, Initial Deployment: Site-to-Site File Replication", Computing in High Energy Physics, 2001,

D. Olson, E. Hjort, J. Lauret, M. Messer, A. Shoshani, A. Sim, "Non-shared Disk Cluster - A Fault Tolerant, Commodity Approach to Hi-Bandwidth Data Analysis", Computing in High Energy Physics, 2001,

L Bernardo, H Nordberg, D Olson, A Shoshani, A Sim, A Vaniachine, D Zimmerman, B Gibbard, R Porter, T Wenaus, others, "New capabilities in the HENP grand challenge storage access system and its application at RHIC", Computer physics communications, 2001, 140:179--188,

L. Bernardo, H. Nordberg, D. Olson, A. Sim, A. Vaniachine, D. Zimmerman, B. Gibbard, R. Porter, T. Wenaus, D. Malon, "New capabilities in the HENP Grand Challenge Storage Access System and its application at RHIC", Computer Physics Communications, 2001, 140:179--188,

2000

A. Shoshani, A. Sim, L.M. Bernardo, H. Nordberg, "Coordinating Simultaneous Caching of File Bundles from Tertiary Storage", International Conference on Scientific and Statistical Database Management (SSDBM), 2000,

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

L. M. Bernardo, A. Shoshani, A. Sim, H. Nordberg, "Access Coordination Of Tertiary Storage For High Energy Physics Applications", the 17th IEEE Symposium on Mass Storage Systems, 2000,

A. Sim, A. Shoshani, HRM: Hierarchical Resource Manager, Globus World, 2000,

A. Sim, A. Shoshani, L. M. Bernardo, H. Nordberg, A Storage Access Coordination System for Petabyte Scale Scientific Data, IONA World, 2000,

1999

A. Sim, H. Nordberg, L. M. Bernardo, A. Shoshani, D. Rotem, "Storage Access Coordination Using CORBA", Distributed Objects and Applications, 1999, 168-175,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem and A. Sim, "Multidimensional Indexing and Query Coordination for Tertiary Storage Management", International Conference on Scientific and Statistical Database Management, 1999, 214-225,

1998

L.M. Bernardo, D. Rotem, A. Shoshani, H. Nordberg, A. Sim, "Using Access Patterns to Partition Large Datasets on Tertiary Storage in Order to Minimize Retrieval Costs", 1998, LBNL 41504,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem, A. Sim, "Storage Management for High Energy Physics Applications", Computing in High Energy Physics, 1998,

1996

A. Sim, B. Parvin, P. Keagy, "Invariant Representation and Classification of Fruits from X-ray Images", International Journal of Imaging Systems and Technology, 1996, 7:231-237,

1995

A. Sim, B. Parvin, P. Keagy, "Invariant Representation and Hierarchical Network for Inspection of Nuts from X-ray Images", IEEE International Conference on Neural Networks, 1995, II:738-743,

A. Sim, B. Parvin, P. Keagy, "Machine Vision Inspection of Insect Infested Pistachio Nuts from X-ray Images", Vision Interface, 1995, 17-22,

1969

Jonathan Wang, Wucherl Yoo, Alex Sim, K John Wu, "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", 1969,

Horst D. Simon

2019

Jung Heon Song, Marcos López de Prado, Horst D Simon, Kesheng Wu, Extracting Signals from High-Frequency Trading with Digital Signal Processing Tools, The Journal of Financial Data Science, Pages: 124--138 2019,

2018

Kesheng Wu, Horst D Simon, "High-Performance Computational Intelligence and Forecasting Technologies", 2018,

2014

Jung Heon Song, Marcos López de Prado, Horst Simon, Kesheng Wu, "Exploring Irregular Time Series Through Non-uniform Fourier Transform", WHPCF 14, Piscataway, NJ, USA, IEEE Press, 2014, 37--44, doi: 10.1109/WHPCF.2014.8

Jung Heon Song, Kesheng Wu, Horst D Simon, "Parameter Analysis of the VPIN (Volume synchronized Probability of Informed Trading) Metric", Quantitative Financial Risk Management: Theory and Practice, 2014,

2013

UC Berkeley, William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", January 1, 2013, LBNL 6388E,

2010

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

2008

I. Yamazaki, K. Wu, H. Simon, "nu-TRLan User Guide version 1.0", 2008, LBNL 1288E,

Erich Strohmaier

2019

Devarshi Ghoshal, Kesheng Wu, Eric Pouyoul, Erich Strohmaier, "Analysis and Prediction of Data Transfer Throughput for Data-Intensive Workloads", 2019 IEEE International Conference on Big Data (Big Data), 2019, 3648--3657,

Houjun Tang

2020

Suren Byna, M. Scot Breitenfeld, Bin Dong, Quincey Koziol, Elena Pourmal, Dana Robinson, Jerome Soumagne, Houjun Tang, Venkatram Vishwanath, and Richard Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems", Journal of Computer Science and Technology 2020, 35(1): 145-160, February 2, 2020, doi: 10.1007/s11390-020-9822-9

2019

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Quincey Koziol, Suren Byna, John Mainzer, Tonglin Li, "Enabling Transparent Asynchronous I/O using Background Threads", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW 2019), November 19, 2019, doi: 10.1109/PDSW49588.2019.00006

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019,

Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, Suren Byna, "I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance", Cray User Group (CUG) 2019, May 10, 2019,

Jingqing Mu, Jerome Soumagne, Suren Byna, Quincey Koziol, Houjun Tang, Richard Warren, "Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage", Cray User Group (CUG) 2019, May 7, 2019,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61--80,

2018

Suren Byna, Quincey Koziol, Venkatram Vishwanath, Jerome Soumagne, Houjun Tang, Kimmy Mu, Richard Warren, François Tessier, Bin Dong, Teng Wang, and Jialin Liu, Proactive Data Containers (PDC): An object-centric data store for large-scale computing systems, AGU Fall Meeting, December 13, 2018,

Wei Zhang, Houjun Tang, Suren Byna, Yong Chen, "DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems", Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, November 1, 2018, 24,

Teng Wang, Suren Byna, Bin Dong, and Houjun Tang, "UniviStor: Integrated Hierarchical and Distributed Storage for HPC", IEEE Cluster 2018, September 1, 2018,

Houjun Tang, Suren Byna, Francois Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu, and Richard Warren, "Toward Scalable and Asynchronous Object-centric Data Management for HPC", 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2018, May 1, 2018,

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211--220,

2017

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,

2016

Wenzhao Zhang, Houjun Tang, Xiaocheng Zou, Steven Harenberg, Qing Liu, Scott Klasky, Nagiza F Samatova, "Exploring Memory Hierarchy to Improve Scientific Data Read Performance", 2015 IEEE International Conference on Cluster Computing, 2016, 66--69,

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "Amrzone: A runtime amr data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116--125,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359--1366,

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

David A Boyuka II, Houjun Tang, Kushal Bansal, Xiaocheng Zou, Scott Klasky, Nagiza F Samatova, "The hyperdyadic index and generalized indexing and query with PIQUE", Proceedings of the 27th International Conference on Scientific and Statistical Database Management, 2015, 20,

2014

John Jenkins, Xiaocheng Zou, Houjun Tang, Dries Kimpe, Robert Ross, Nagiza F Samatova, "Radar: Runtime asymmetric data-access driven scientific data replication", International Supercomputing Conference, 2014, 296--313,

Houjun Tang, Xiaocheng Zou, John Jenkins, David A Boyuka II, Stephen Ranshous, Dries Kimpe, Scott Klasky, Nagiza F Samatova, "Improving read performance with online access pattern analysis and prefetching", European Conference on Parallel Processing, 2014, 246--257,

Xiaocheng Zou, Sriram Lakshminarasimhan, David A Boyuka II, Stephen Ranshous, Houjun Tang, Scott Klasky, Nagiza F Samatova, "Fast set intersection through run-time bitmap construction over pfordelta-compressed indexes", European Conference on Parallel Processing, 2014, 668--679,

2013

Eric R Schendel, Steve Harenberg, Houjun Tang, Venkatram Vishwanath, Michael E Papka, Nagiza F Samatova, "A generic high-performance method for deinterleaving scientific data", European Conference on Parallel Processing, 2013, 571--582,

Rollin Thomas

2020

D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, The Superfacility project: automated pipelines for experiments and HPC, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), State of the Practice (SOP), 2020,

B. Enders, D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, "Cross-facility science with the Superfacility Project at LBNL", 2nd Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP 2020), in conjunction with the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 20), 2020,

Craig Tull

2018

Cecilia Dao, Xinyu Liu, Alex Sim, Craig Tull, Kesheng Wu, "Modeling data transfers: change point and anomaly detection", 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018, 1589--1594,

Daniela Ushizima

2010

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile E. Keränen, David W. Knowles, Cris L. Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, Kesheng Wu, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia Computer Science, 2010, 1:1751--1758, doi: 10.1016/j.procs.2010.04.197

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Oliver Rübel, Cameron GR Geddes, Estelle Cormier-Michel, Kesheng Wu, Gunther H Weber, Daniela M Ushizima, Peter Messmer, Hans Hagen, Bernd Hamann, Wes Bethel, others, "Automatic beam path analysis of laser wakefield particle acceleration data", Computational Science & Discovery, January 2009, 2:015005, LBNL 2734E,

C. G. R. Geddes, E Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

Brian Van Straalen

2016

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

2010

Gunther Weber, "Recent advances in visit: Amr streamlines and query-driven visualization", 2010,

Teng Wang

2019

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

Teng Wang, Suren Byna, Glenn Lockwood, Philip Carns, Shane Snyder, Sunggon Kim, Nicholas Wright, "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks", IEEE/ACM CCGrid 2019, May 14, 2019,

2018

Glenn Lockwood, Shane Snyder, Teng Wang, Suren Byna, Phil Carns, and Nicholas Wright, "A Year in the Life of a Parallel File System", International Conference for High Performance Computing, Networking, and Storage (SC'18), IEEE / ACM, November 15, 2018,

Teng Wang, Suren Byna, Glenn Lockwood, Nicholas Wright, Phil Carns, and Shane Snyder, "IOMiner: Large-scale Analytics Framework for Gaining Knowledge from I/O Logs", IEEE Cluster 2018, September 10, 2018,

Teng Wang, Suren Byna, Bin Dong, and Houjun Tang, "UniviStor: Integrated Hierarchical and Distributed Storage for HPC", IEEE Cluster 2018, September 1, 2018,

Gunther H. Weber

2016

Wahid Bhimji, Debbie Bard, Melissa Romanus, David Paul, Andrey Ovsyannikov, Brian Friesen, Matt Bryson, Joaquin Correa, Glenn K. Lockwood, Vakho Tsulaia, Suren Byna, Steve Farrell, Doga Gursoy, Chris Daley, Vince Beckner, Brian Van Straalen, Nicholas Wright, Katie Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group (CUG) 2016, May 10, 2016,

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

2012

Allen R Sanderson, Brad Whitlock, H Childs, GH Weber, K Wu, others, "A system for query based analysis and visualization", January 2012, LBNL 5507E,

2011

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,

2010

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Gunther Weber, "Recent advances in visit: Amr streamlines and query-driven visualization", 2010,

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile E. Keränen, David W. Knowles, Cris L. Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, Kesheng Wu, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia Computer Science, 2010, 1:1751--1758, doi: 10.1016/j.procs.2010.04.197

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

Oliver Rübel, Cameron GR Geddes, Estelle Cormier-Michel, Kesheng Wu, Gunther H Weber, Daniela M Ushizima, Peter Messmer, Hans Hagen, Bernd Hamann, Wes Bethel, others, "Automatic beam path analysis of laser wakefield particle acceleration data", Computational Science & Discovery, January 2009, 2:015005, LBNL 2734E,

E Bethel, "Modern Scientific Visualization is More than Just Pretty Pictures", January 2009, LBNL 1450E,

C. G. R. Geddes, E Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: (Astronomical Society of the Pacific Series), St. Thomas, USVI, 2008, 301--317,

Michael Wehner

2013

E Wes Bethel, Prabhat Prabhat, Suren Byna, Oliver Rübel, K John Wu, Michael Wehner, "Why high performance visual data analytics is both relevant and difficult", Visualization and Data Analysis 2013, January 2013, 8654:86540B, LBNL LBNL-6063E,

2012

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner, Wes Bethel, others, "Teca: A parallel toolkit for extreme climate analysis", Procedia Computer Science, Elsevier, January 2012, 9:866--876, LBNL 5352E,

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

2011

Surendra Byna, Michael F Wehner, Kesheng John Wu, "Detecting atmospheric rivers in large climate datasets", Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities, 2011, 7--14,

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
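
The two stages named in the abstract above (thresholding, then connected component labeling) can be sketched on a small 2D grid. This is a minimal serial illustration, not the authors' parallel implementation: the function name `label_regions`, the 4-neighbor connectivity, the sample grid, and the threshold value of 2.0 are all assumptions chosen for the example.

```python
from collections import deque


def label_regions(field, threshold):
    """Label 4-connected regions of grid cells whose value exceeds threshold.

    Returns (labels, count): labels has the same shape as field, with 0 for
    cells at or below the threshold and 1..count for each connected region.
    """
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip cells that fail the threshold or are already labeled.
            if field[r][c] <= threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4-neighborhood.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and field[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical column-integrated water vapor values on a 3x4 mesh.
water_vapor = [
    [0.5, 2.5, 2.7, 0.4],
    [0.3, 2.1, 0.2, 0.1],
    [0.2, 0.1, 0.3, 2.4],
]
labels, count = label_regions(water_vapor, threshold=2.0)
print(count)  # 2 regions: one of three cells, one isolated cell
```

A real detector would apply further geometric criteria to each labeled region; the abstract does not list them, so none are shown here.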

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,

2010

Daren Hasenkamp, Alexander Sim, Michael Wehner, Kesheng Wu, "Finding tropical cyclones on a cloud computing cluster: Using parallel virtualization for large-scale climate simulation analysis", 2010 IEEE Second International Conference on Cloud Computing Technology and Science, 2010, 201--208, LBNL 4218E,

Nicholas J. Wright

2019

Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright, "Understanding Data Motion in the Modern HPC Data Center", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00012

Teng Wang, Suren Byna, Glenn Lockwood, Philip Carns, Shane Snyder, Sunggon Kim, Nicholas Wright, "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks", IEEE/ACM CCGrid 2019, May 14, 2019,

2018

Glenn Lockwood, Shane Snyder, Teng Wang, Suren Byna, Phil Carns, and Nicholas Wright, "A Year in the Life of a Parallel File System", International Conference for High Performance Computing, Networking, and Storage (SC'18), IEEE / ACM, November 15, 2018,

Teng Wang, Suren Byna, Glenn Lockwood, Nicholas Wright, Phil Carns, and Shane Snyder, "IOMiner: Large-scale Analytics Framework for Gaining Knowledge from I/O Logs", IEEE Cluster 2018, September 10, 2018,

2017

Glenn Lockwood, Shane Snyder, Wucherl Yoo, Kevin Harms, Zachary Nault, Suren Byna, Philip Carns, Nicholas Wright, "UMAMI: A Recipe for Generating Meaningful Metrics through Holistic I/O Performance Analysis", 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS), 2017 (Held in conjunction with SC17), November 14, 2017,

Kesheng Wu

2021

A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu, "Network Traffic Performance Analysis and Anomaly Detection using Supervised Machine Learning", International Journal of Big Data Intelligence, Special Issue on Systems and Network Telemetry and Analytics, 2021,

Donghun Koo, Jaehwan Lee, Jialin Liu, Eun-Kyu Byun, Jae-Hyuck Kwak, Glenn K Lockwood, Soonwook Hwang, Katie Antypas, Kesheng Wu, Hyeonsang Eom, "An empirical study of I/O separation for burst buffers in HPC systems", Journal of Parallel and Distributed Computing, 2021, 148:96-108, doi: 10.1016/j.jpdc.2020.10.007

2020

Ling Jin, Alina Lazar, James Sears, Annika Todd, Alex Sim, Kesheng Wu, Hung-Chai Yang, C. Anna Spurlock, "Clustering Life Course to Understand the Heterogeneous Effects of Life Events, Gender and Generation on Habitual Travel Modes", IEEE Access, 2020, 1-17, doi: 10.1109/ACCESS.2020.3032328

B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", The 16th IEEE International Conference on Mobility, Sensing and Networking (IEEE MSN 2020), 2020,

B. Cho, T. Dayrit, Y. Gao, Z. Wang, T. Hong, A. Sim, K. Wu, "Effective Missing Value Imputation Methods for Building Monitoring Data", The 2nd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2020) in conjunction with IEEE International Conference on Big Data (IEEE BigData 2020), 2020,

Verónica Rodríguez Tribaldos, Nathaniel J Lindsey, Shan Dou, Craig Ulrich, Michelle Robertson, Bin Dong, Vincent Dumont, Kesheng Wu, Inder Monga, Chris Tracy, others, Combining Ambient Noise and Distributed Acoustic Sensing (DAS) Deployed on Dark Fiber Networks for High-resolution Imaging at the Basin Scale, AGU Fall Meeting 2020, 2020,

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning for Surface Wave Identification in Distributed Acoustic Sensing Data", IEEE BigData 2020, December 8, 2020,

J. Kim, A. Sim, J. Kim, K. Wu, "Botnets Detection Using Recurrent Variational Autoencoder", IEEE Global Communications Conference (Globecom 2020), 2020,

Brett Weinger, Alex Sim (Advisor), John Wu (Advisor), Jinoh Kim (Advisor), "Enhancing IoT Anomaly Detection Performance for Federated Learning", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’20), ACM Student Research Competition (SRC), 2020,

B Mohammed, M Kiran, N Krishnaswamy, Kesheng Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2020,

Jonathan Blair Ajo-Franklin, Verónica Rodríguez Tribaldos, Avinash Nayak, Nathaniel J Lindsey, Feng Cheng, Benxin Chi, Bin Dong, Kesheng Wu, Inder Monga, Distributed Acoustic Sensing (DAS) at the Plot to Basin Scale: Connecting Near-Surface Sensing and Seismology with a Common Observational Tool, AGU Fall Meeting 2020, 2020,

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning on Real Geophysical Data: A Case Study for Distributed Acoustic Sensing Research", NeurIPS "Machine Learning and the Physical Sciences" workshop, 2020,

C. A. Spurlock, A. Gopal, J. Auld, P. Leiby, C. Sheppard, T. Wenzel, S. Belal, A. Duvall, A. Enam, S. Fujita, A. Henao, L. Jin, E. Kontou, A. Lazar, Z. Needell, C. Rames, T. Rashidi, J. Sears, A. Sim, M. Stinson, M. Taylor, A. Todd-Blick, O. Verbas, V. Walker, J. Ward, G. Wong-Parodi, K. Wu, H.-C. Yang, "SMART Mobility, Mobility Decision Science Capstone Report", Vehicle Technologies Office (VTO), Office of Energy Efficiency and Renewable Energy (EERE), US Department of Energy, 2020,

Bin Dong, Verónica Rodríguez Tribaldos, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection", 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 14, 2020, 254--263,

Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, Hyeonsang Eom, "Towards hpc i/o performance prediction through large-scale log analysis", Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 77--88, doi: 10.1145/3369583.3392678

Gaurav R Ghosal, Dipak Ghosal, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Deep Deterministic Policy Gradient Based Network Scheduler For Deadline-Driven Data Transfers", Proceedings of International Federation for Information Processing (IFIP) Networking Conference (NETWORKING 2020), 2020, 253--261,

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Jaegyoon Hahm, "Transfer Learning Approach for Botnet Detection Based on Recurrent Variational Autoencoder", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 41--47, doi: 10.1145/3391812.3396273

Jiwoo Bang, Chungyong Kim, Kesheng Wu, Alex Sim, Suren Byna, Sunggon Kim, Hyeonsang Eom, "HPC Workload Characterization Using Feature Selection and Clustering", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 33--40, doi: 10.1145/3391812.3396270

S. Bhandari, A. K. Kukreja, A. Lazar, A. Sim, K. Wu, "Feature Selection and Tree-based Classification for Wireless Intrusion Detection", the 3rd ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2020, in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2020, doi: 10.1145/3391812.3396274

Qiao Kang, Alex Sim, Peter Nugent, Sunwoo Lee, Wei-keng Liao, Ankit Agrawal, Alok Choudhary, Kesheng Wu, "Predicting Resource Requirement in Intermediate Palomar Transient Factory Workflow", 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), 2020, 619--628, doi: 10.1109/CCGrid49817.2020.00-31

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, "Life Course as a Contextual System to Investigate the Effects of Life Events, Gender, and Generation on Travel Mode Use", Transportation Research Board (TRB) 99th Annual Meeting, 2020,

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Botnet Detection Using Recurrent Variational Autoencoder, arXiv preprint arXiv:2004.00234, 2020,

2019

A. Lazar, A. Ballow, L. Jin, C. A. Spurlock, A. Sim, K. Wu, "Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019, doi: 10.1109/BigData47090.2019.9006411

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, Life course as a contextual system to investigate the effects of life events, gender and generation on travel mode usage, The Behavior, Energy & Climate Change Conference (BECC), 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Alexandra Ballow, Alina Lazar (Advisor), Alex Sim (Advisor), Kesheng Wu (Advisor), "Handling Missing Values in Joint Sequence Analysis", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2019), ACM Student Research Competition (SRC), First place winner, 2019,

Antoine Bambade, Kesheng Wu, "An Assessment of the Prediction Quality of VPIN", Advanced Analytics and Artificial Intelligence Applications, (IntechOpen: 2019)

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

Alexandra Ballow, Alina Lazar, Alex Sim, Kesheng Wu, "Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Tyler Leibengood, Alina Lazar, Alex Sim, Kesheng Wu, "Network Traffic Performance Prediction with Multivariate Clusters in Time Windows", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Jung Heon Song, Marcos López de Prado, Horst D Simon, Kesheng Wu, Extracting Signals from High-Frequency Trading with Digital Signal Processing Tools, The Journal of Financial Data Science, Pages: 124--138 2019,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Multidimensional Compression with Pattern Matching", 2019 Data Compression Conference (DCC), Pages: 567--567 2019,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61--80,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, Annika Todd, "Evaluating the effects of missing values and mixed data types on social sequence clustering using t-SNE visualization", Journal of Data and Information Quality (JDIQ), 2019, 11:1--22,

Sambit Shukla, Dipak Ghosal, Kesheng Wu, Alex Sim, Matthew Farrens, "Co-optimizing Latency and Energy for IoT services using HMP servers in Fog Clusters", 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), 2019, 121--128,

Hanul Sung, Jiwoo Bang, Alexander Sim, Kesheng Wu, Hyeonsang Eom, "Understanding Parallel I/O Performance Trends Under Various HPC Configurations", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 29--36,

Mengtian Jin, Youkow Homma, Alex Sim, Wilko Kroeger, Kesheng Wu, "Performance prediction for data transfers in LCLS workflow", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 37--44,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Similarity-based Compression with Multidimensional Pattern Matching", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 19--24,

Astha Syal, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Automatic detection of network traffic anomalies and changes", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 3--10,

Bin Dong, Patrick Kilian, Xiaocan Li, Fan Guo, Suren Byna, Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", Proceedings of the 31st International Conference on Scientific and Statistical Database Management, January 1, 2019, 202--205,

Dipak Ghosal, Sambit Shukla, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Reinforcement Learning Based Network Scheduler For Deadline-Driven Data Transfers", 2019 IEEE Global Communications Conference (GLOBECOM), 2019, 1--6,

Qiao Kang, Ankit Agrawal, Alok Choudhary, Alex Sim, Kesheng Wu, Rajkumar Kettimuthu, Peter H Beckman, Zhengchun Liu, Wei-keng Liao, "Spatiotemporal Real-Time Anomaly Detection for Supercomputing Systems", 2019 IEEE International Conference on Big Data (Big Data), 2019, 4381--4389,

Burak Cetin, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Federated Wireless Network Intrusion Detection", 2019 IEEE International Conference on Big Data (Big Data), Pages: 6004--6006 2019,

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 1, 2019, 31:e5157,

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
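As an illustration only, the membership query described in this abstract can be sketched with an uncompressed bitmap index in Python. This is not the authors' implementation: it omits Word-Aligned Hybrid compression and parallel execution, and the function names and the sample `particle_type` column are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index, wanted):
    """Return the sorted row ids whose attribute value is in `wanted`.

    The per-value bitmaps are OR-ed together, so no row is read individually.
    """
    hits = 0
    for value in wanted:
        hits |= index.get(value, 0)  # values absent from the index contribute nothing
    rows = []
    row = 0
    while hits:
        if hits & 1:
            rows.append(row)
        hits >>= 1
        row += 1
    return rows


# Hypothetical classification attribute, one entry per record.
particle_type = ["electron", "proton", "electron", "muon", "proton", "photon"]
index = build_bitmap_index(particle_type)
selected = membership_query(index, {"electron", "muon"})
```

Here `selected` holds the positions of the records whose type is either queried value. A production index would compress each bitmap and split the work across processes.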

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019,

Kesheng Wu, Alex Sim, Jonathan Wang, Seongwook Hwangbo, "Methods, systems, and devices for accurate signal timing of power component events", 2019,

US Patent app no. 20190138371, “Methods, systems, and devices for accurate signal timing of power component events”

Devarshi Ghoshal, Kesheng Wu, Eric Pouyoul, Erich Strohmaier, "Analysis and Prediction of Data Transfer Throughput for Data-Intensive Workloads", 2019 IEEE International Conference on Big Data (Big Data), 2019, 3648--3657,

Jongbeen Han, Heemin Kim, Hyeonsang Eom, Jonathan Coignard, Kesheng Wu, Yongseok Son, "Enabling SQL-Query Processing for Ethereum-based Blockchain Systems", Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, 2019, 1--7,

2018

Karen Tu, Alex Sim (Advisor), John Wu (Advisor), "Identification of Network Data Transfer Bottlenecks in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’18), ACM Student Research Competition (SRC), 2018,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Kesheng Wu, "Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction", arXiv preprint arXiv:1811.00620, 2018,

Weijie Zhao, Florin Rusu, Kesheng Wu, Peter Nugent, "Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE", International Journal of Computational Science and Engineering, 2018, 16:337--349,

Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, Prabhat, Kesheng Wu, Paul Brown, "ArrayBridge: Interweaving declarative array processing with imperative high-performance computing", 34th IEEE International Conference on Data Engineering (ICDE) 2018, April 17, 2018,

Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, M Prabhat, Kesheng Wu, Paul Brown, "ArrayBridge: Interweaving declarative array processing in SciDB with imperative HDF5-based programs", 2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018, 977--988,

Taehoon Kim, Jaesik Choi, Dongeun Lee, Alex Sim, C Anna Spurlock, Annika Todd, Kesheng Wu, "Predicting baseline for analysis of electricity pricing", International Journal of Big Data Intelligence, 2018, 5:3--20,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Alex Sim, Kesheng Wu, "Consensus ensemble system for traffic flow prediction", IEEE Transactions on Intelligent Transportation Systems, 2018, 19:3903--3914,

Cecilia Dao, Xinyu Liu, Alex Sim, Craig Tull, Kesheng Wu, "Modeling data transfers: change point and anomaly detection", 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018, 1589--1594,

Kesheng Wu, Horst D Simon, "High-Performance Computational Intelligence and Forecasting Technologies", 2018,

Junmin Gu, Scott Klasky, Norbert Podhorszki, Ji Qiang, Kesheng Wu, "Querying large scientific data sets with adaptable IO system ADIOS", Asian Conference on Supercomputing Frontiers, 2018, 51--69,

Rajkumar Kettimuthu, Zhengchun Liu, Ian Foster, Peter H Beckman, Alex Sim, Kesheng Wu, Wei-keng Liao, Qiao Kang, Ankit Agrawal, Alok Choudhary, "Towards autonomic science infrastructure: architecture, limitations, and open issues", Proceedings of the 1st International Workshop on Autonomous Infrastructure for Science, 2018, 1--9,

Mengying Yang, Xinyu Liu, Wilko Kroeger, Alex Sim, Kesheng Wu, "Identifying anomalous file transfer events in LCLS workflow", Proceedings of the 1st International Workshop on Autonomous Infrastructure for Science, 2018, 1--4,

Sowmya Balasubramanian, Dipak Ghosal, Kamala Narayanan Balasubramanian Sharath, Eric Pouyoul, Alex Sim, Kesheng Wu, Brian Tierney, "Auto-tuned publisher in a pub/sub system: Design and performance evaluation", 2018 IEEE International Conference on Autonomic Computing (ICAC), 2018, 21--30,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Feature Engineering and Classification Models for Partial Discharge in Power Transformers", Mij, 2018, 1001:60,

Tal Shachaf, Alexander Sim, Kesheng Wu, Wilko Kroeger, "Detecting Anomalies in the LCLS Workflow", 2018 IEEE International Conference on Big Data (Big Data), 2018, 3256--3260,

Alina Lazar, Kesheng Wu, Alex Sim, "Predicting Network Traffic Using TCP Anomalies", 2018 IEEE International Conference on Big Data (Big Data), Pages: 5369--5371 2018,

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211--220,

Xin Xing, Bin Dong, Jonathan Ajo-Franklin, Kesheng Wu, "Automated Parallel Data Processing Engine with Application to Large-Scale Feature Extraction", 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC), January 1, 2018, 37--46,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna YQ Ho, Peter Nugent, "Distributed caching for processing raw arrays", Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018, 1--12,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Kesheng Wu, "Efficient online hyperparameter learning for traffic flow prediction", 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, 164--169,

2017

Shashanka Ubaru, Kesheng Wu, Kristofer E. Bouchard, "UoI-NMF Cluster: A Robust Nonnegative Matrix Factorization Algorithm for Improved Parts-Based Decomposition and Reconstruction of Noisy Data", the 16th IEEE International Conference on Machine Learning and Applications (ICMLA 2017), 2017, 241-248, doi: 10.1109/ICMLA.2017.0-152

Ling Jin, Doris Lee, Alex Sim, Sam Borgeson, Kesheng Wu, C Anna Spurlock, Annika Todd, "Comparison of clustering techniques for residential energy behavior using smart meter data", 2017,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Expanding statistical similarity based data reduction to capture diverse patterns", 2017 Data Compression Conference (DCC), Pages: 445--445 2017,

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, Kesheng Wu, "Parallel variable selection for effective performance prediction", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017, 208--217,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Improving statistical similarity based data reduction for non-stationary data", Proceedings of the 29th International Conference on Scientific and Statistical Database Management, 2017, 1--6,

Updated experiment version: https://sdm.lbl.gov/oapapers/ssdbm17-lee-upd.pdf
Original version: http://dl.acm.org/citation.cfm?doid=3085504.3085583

Kesheng Wu, Dongeun Lee, Alex Sim, Jaesik Choi, "Statistical data reduction for streaming data", 2017 New York Scientific Data Summit (NYSDS), 2017, 1--6,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Convolutional Filtering for Accurate Signal Timing from Noisy Streaming Data", 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2017, 941--948,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Feature Engineering and Classification Models for Partial Discharge Events in Power Transformers", Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Pages: 269--270 2017,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, "Data quality challenges with missing values and mixed types in joint sequence analysis", 2017 IEEE International Conference on Big Data (Big Data), 2017, 2620--2627,

Peter Harrington, Wucherl Yoo, Alexander Sim, Kesheng Wu, "Diagnosing parallel I/O bottlenecks in HPC applications", International Conference for High Performance Computing Networking Storage and Analysis (SC17) ACM Student Research Competition (SRC), 2017,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Accurate signal timing from high frequency streaming data", 2017 IEEE International Conference on Big Data (Big Data), Pages: 4852--4854 2017,

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-defined scientific data analysis on arrays", Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2017, 53--64,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent, "Incremental view maintenance over array data", Proceedings of the 2017 ACM International Conference on Management of Data, January 1, 2017, 139--154,

Tzuhsien Wu, Jerry Chou, Shyng Hao, Bin Dong, Scott Klasky, Kesheng Wu, "Optimizing the query performance of block index through data analysis and I/O modeling", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, January 1, 2017, 1--10,

2016

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016,

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Bin Dong, Surendra Byna, Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", HPDC 16, New York, NY, USA, ACM, 2016, 57--68, doi: 10.1145/2907294.2907300

Deborah A Agarwal, Boris Faybishenko, Vicky L Freedman, Harinarayan Krishnan, Gary Kushner, Carina Lansing, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, others, "A science data gateway for environmental management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004,

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Bin Dong, Surendra Byna, Kesheng Wu, "Sds-sort: Scalable dynamic skew-aware parallel sorting", Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2016, 57--68,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "Amrzone: A runtime amr data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116--125,

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, (Springer, Cham: 2016) Pages: 139--161

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Novel data reduction based on statistical similarity", Proceedings of the 28th International Conference on Scientific and Statistical Database Management, 2016, 1--12,

Wucherl Yoo, Alex Sim, Kesheng Wu, "Machine learning based job status prediction in scientific clusters", 2016 SAI Computing Conference (SAI), 2016, 44--53,

David Pugmire, James Kress, Jong Choi, Scott Klasky, Tahsin Kurc, Randy Michael Churchill, Matthew Wolf, Greg Eisenhower, Hank Childs, Kesheng Wu, others, "Visualization and analysis for near-real-time decision making in distributed workflows", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, 1007--1013,

Lingfei Wu, Kesheng John Wu, Alex Sim, Michael Churchill, Jong Y Choi, Andreas Stathopoulos, Choong-Seock Chang, Scott Klasky, "Towards real-time detection and tracking of spatio-temporal features: Blob-filaments in fusion plasma", IEEE Transactions on Big Data, 2016, 2:262--275,

Bin Dong, Suren Byna, Kesheng Wu, Hans Johansen, Jeffrey N Johnson, Noel Keen, others, "Data elevator: Low-contention data movement in hierarchical storage system", 2016 IEEE 23rd international conference on high performance computing (HiPC), January 1, 2016, 152--161,

Tzuhsien Wu, Hao Shyng, Jerry Chou, Bin Dong, Kesheng Wu, "Indexing blocks to reduce space and time requirements for searching large data files", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 398--402,

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359--1366,

D. Pugmire, J. Kress, J. Choi, S. Klasky, T. Kurc, R. M. Churchill, M. Wolf, G. Eisenhauer, H. Childs, K. Wu, A. Sim, J. Gu, J. Low, "Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, 1007--1013, doi: 10.1109/IPDPSW.2016.175

2015

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015, 2015,

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

S. Shannigrahi, A. J. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E. Yeh, "Named Data Networking in Climate Research and HEP Applications", 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), 2015,

David H Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, "Statistical overfitting and backtest performance", Risk-Based and Factor Investing, 2015,

http://ssrn.com/abstract=2507040

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Patha: Performance analysis tool for hpc applications", 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), 2015, 1--8,

Taehoon Kim, Dongeun Lee, Jaesik Choi, Anna Spurlock, Alex Sim, Annika Todd, Kesheng Wu, "Extracting baseline electricity usage using gradient tree boosting", 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), 2015, 734--741,

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, C.S. Chang, S. Klasky, "Towards Real-Time Detection and Tracking of Blob-Filaments in Fusion Plasma Big Data", WM-CS-2015-01, Department of Computer Science, College of William and Mary, 2015,

Bin Dong, Surendra Byna, Kesheng Wu, "Heavy-tailed distribution of parallel I/O system response time", Proceedings of the 10th Parallel Data Storage Workshop, 2015, 37--42,

Bin Dong, Surendra Byna, Kesheng Wu, "Spatially clustered join on heterogeneous scientific data sets", 2015 IEEE International Conference on Big Data (Big Data), 2015, 371--380,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "PATHA: Performance Analysis Tool for HPC Applications", 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), Pages: 1--8 2015, doi: 10.1109/PCCC.2015.7410313

Taehoon Kim, Dongeun Lee, Jaesik Choi, C. Anna Spurlock, Alex Sim, Annika Todd, Kesheng Wu, "Extracting Baseline Electricity Usage with Gradient Boosting", International Conference on Big Data Intelligence and Computing (DataCom 2015), 2015, doi: 10.1109/SmartCity.2015.156

2014

John Wu, Alex Sim, Lingfei Wu, Abraham Frankl, Scott Klasky, Jong Y Choi, CS Chang, Michael Churchill, "Exercising ICEE Framework with Fusion Blob Detection", DOE/ASCR NGNS PI meeting, 2014,

Bin Dong, Surendra Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", 2014 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2014, 194--202,

Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Y Choi, Andreas Stathopoulos, CS Chang, Scott Klasky, "High-performance outlier detection algorithm for finding blob-filaments in plasma", Proc. of 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC-14), held in conjunction with ACM/IEEE SC14, 2014,

Lingfei Wu, Kesheng Wu, Alex Sim, Andreas Stathopoulos, "Real-time outlier detection algorithm for finding blob-filaments in plasma", ACM/IEEE SC14 ACM SRC Poster, 2014,

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", SIGMOD 14, 2014, 385--396, doi: 10.1145/2588555.2612185

Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Surendra Byna, Kesheng Wu, "Simplifying index file structure to improve I/O performance of parallel indexing", 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2014, 576--583,

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", Parallel Distributed Processing Symposium Workshops 2014 IEEE International, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

Jung Heon Song, Marcos López de Prado, Horst Simon, Kesheng Wu, "Exploring Irregular Time Series Through Non-uniform Fourier Transform", WHPCF 14, Piscataway, NJ, USA, IEEE Press, 2014, 37--44, doi: 10.1109/WHPCF.2014.8

F. Rusu, P. Nugent, K. Wu, "Implementing the Palomar Transient Factory Real-Time Pipeline in GLADE: Results and", Lecture Notes in Computer Science, (2014) Pages: 53--66

Jung Heon Song, Kesheng Wu, Horst D Simon, "Parameter Analysis of the VPIN (Volume synchronized Probability of Informed Trading) Metric", Quantitative Financial Risk Management: Theory and, 2014,

Qian Sun, Fan Zhang, Tong Jin, Hoang Bui, Kesheng Wu, Arie Shoshani, Hemanth Kolla, Scott Klasky, Jacqueline Chen, Manish Parashar, "Scalable run-time data indexing and querying for scientific simulations", Big Data Analytics: Challenges and Opportunities (BDAC-14) Workshop at Supercomputing Conference, 2014,

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, CS Chang, S. Klasky, "High-Performance Outlier Detection Algorithm for Blob-Filaments in Plasma", 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC 14), 2014,

2013

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

UC Berkeley, William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", January 1, 2013, LBNL LBNL-6388E,

Kesheng Wu, E Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A big data approach to analyzing market volatility", Algorithmic Finance, 2013, 2:241--267, LBNL LBNL-6382E,

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that HPC resources and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140 GB as text files. By (1) using a more efficient file format for storing the trading records, (2) using more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time -- an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.

Jong Y Choi, Kesheng Wu, Jacky C Wu, Alex Sim, Qing G Liu, Matthew Wolf, C Chang, Scott Klasky, "Icee: Wide-area in transit data processing framework for near real-time scientific applications", 4th SC Workshop on Petascale (Big) Data Analytics: Challenges and Opportunities in conjunction with SC13, 2013, 11,

Bin Dong, Surendra Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2013, 1--8,

E Wes Bethel, Prabhat Prabhat, Suren Byna, Oliver Rübel, K John Wu, Michael Wehner, "Why high performance visual data analytics is both relevant and difficult", Visualization and Data Analysis 2013, January 2013, 8654:86540B, LBNL LBNL-6063E,

Alex Romosan, Arie Shoshani, Kesheng Wu, Victor Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 1--12, LBNL 6397E,

Bin Dong, Surendra Byna, Kesheng Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage Workshop, January 1, 2013, 27--32,

Kuan-Wu Lin, Surendra Byna, Jerry Chou, Kesheng Wu, "Optimizing FastQuery performance on Lustre file", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 29,

2012

E. W. Bethel, D. Leinweber, O. Rubel, K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

Benson Ma, Arie Shoshani, Alex Sim, Kesheng Wu, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Management (NDM2012), 2012, 562--571,

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics Conference ICAP 2012, Germany, 2012,

Ichitaro Yamazaki, Kesheng Wu, "A Communication-Avoiding Thick-Restart Lanczos Method on a Distributed-Memory System", Lecture Notes in Computer Science, 2012, 7155:345--354, doi: 10.1007/978-3-642-29737-3_39

Surendra Byna, Jerry Chou, Oliver Rubel, Homa Karimabadi, William S Daughter, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner, Wes Bethel, others, "Teca: A parallel toolkit for extreme climate analysis", Procedia Computer Science, Elsevier, January 2012, 9:866--876, LBNL 5352E,

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

Allen R Sanderson, Brad Whitlock, H Childs, GH Weber, K Wu, others, "A system for query based analysis and visualization", January 2012, LBNL 5507E,

Elaheh Pourabbas, Arie Shoshani, Kesheng Wu, "Minimizing index size by reordering rows and columns", International Conference on Scientific and Statistical Database Management, January 2012, 467--484,

G. F. Lofstead, Q. Liu, J. Logan, Y. Tian, Abbasi, N. Podhorszki, J. Y. Choi, S., R. Tchoua, R. A. Oldfield, others, "Hello ADIOS: The Challenges and Lessons of Leadership Class I/O Frameworks", 2012,

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing", The Journal of Trading, Pages: 9--25 2012, doi: 10.3905/jot.2012.7.2.009

2011

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Surendra Byna, Michael F Wehner, Kesheng John Wu, "Detecting atmospheric rivers in large climate datasets", Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities, 2011, 7--14,

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
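The two-step detection described in this abstract (a threshold test followed by connected component labeling) can be sketched serially in Python as below. This is a hedged illustration, not the paper's parallel implementation: the grid values, the `threshold=2.0` setting, the 4-neighbor connectivity, and the function name are all assumptions made for the example.

```python
from collections import deque


def label_components(field, threshold):
    """Label 4-connected regions of grid cells whose value is >= threshold.

    Returns (labels, count): labels[r][c] is 0 for cells below the threshold
    and a region id in 1..count otherwise.
    """
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if field[r][c] < threshold or labels[r][c]:
                continue  # below threshold, or already part of a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and field[ny][nx] >= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical column-integrated water vapor values on a tiny mesh.
water_vapor = [
    [0.5, 2.4, 2.6, 0.3],
    [0.4, 0.2, 2.5, 0.1],
    [2.2, 0.3, 0.2, 0.6],
]
labels, count = label_components(water_vapor, threshold=2.0)
```

On this toy mesh the three adjacent high-vapor cells form one region and the isolated cell in the bottom-left corner forms a second. Any further criteria for deciding which regions count as atmospheric rivers are outside this sketch.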

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,

Kesheng Wu, Rishi R Sinha, Chad Jones, Stephane Ethier, Scott Klasky, Kwan-Liu Ma, Arie Shoshani, Marianne Winslett, "Finding regions of interest on toroidal meshes", Computational Science & Discovery, 2011, 4:015003,

Kesheng Wu, Surendra Byna, Doron Rotem, Arie Shoshani, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

J. Chou, K. Wu, O. Rübel, M. Howison, Qiang, Prabhat, B. Austin, E. W. Bethel, D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data", SC11, 2011, doi: 10.1145/2063384.2063424

Jinoh Kim, Hasan Abbasi, Luis Chacón, Docan, Scott Klasky, Qing Liu, Norbert, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive", LDAV, 2011, 65--72, doi: 10.1109/LDAV.2011.6092319

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A General Indexing and Querying System Scientific Data", SSDBM, 2011, 573--574, doi: 10.1007/978-3-642-22351-8_42

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A Parallel Indexing System for Data", IASDS, IEEE, 2011, doi: 10.1109/CLUSTER.2011.86

Jerry Chou, Kesheng Wu, others, "Fastquery: A parallel indexing system for scientific data", 2011 IEEE International Conference on Cluster Computing, 2011, 455--464,

Kamesh Madduri, Kesheng Wu, "Massive-Scale RDF Processing Using Compressed Bitmap", SSDBM, Springer, 2011, 470--479, doi: 10.1007/978-3-642-22351-8_30

2010

D. Hasenkamp, A. Sim, M. Wehner, K. Wu, "Finding Tropical Cyclones on Clouds", Supercomputing 2010, ACM SRC 3rd place, 2010,

Daren Hasenkamp, Alexander Sim, Michael Wehner, Kesheng Wu, "Finding tropical cyclones on a cloud computing cluster: Using parallel virtualization for large-scale climate simulation analysis", 2010 IEEE Second International Conference on Cloud Computing Technology and Science, 2010, 201--208, LBNL 4218E,

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Gunther Weber, "Recent advances in VisIt: AMR streamlines and query-driven visualization", 2010,

Kesheng Wu, Arie Shoshani, Kurt Stockinger, "Analyses of multi-level and multi-component compressed indexes", ACM Transactions on Database Systems, ACM, 2010, 35:1--52, doi: 10.1145/1670243.1670245

Kesheng Wu, Kamesh Madduri, Shane Canon, "Multi-level bitmap indexes for flash memory storage", Proceedings of the Fourteenth International Database Engineering & Applications Symposium, 2010, 114--116,

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Luke Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Data Parallel Bin-based Indexing for Answering Queries on Multi-core Architecture", Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM), June 2009, 5566:110-129, LBNL 2211E,

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

Luke J Gosink, Kesheng Wu, E Wes Bethel, John D Owens, Kenneth I Joy, "Data parallel bin-based indexing for answering queries on multi-core architectures", International Conference on Scientific and Statistical Database Management, 2009, 110--129,

Oliver Rübel, Cameron GR Geddes, Estelle Cormier-Michel, Kesheng Wu, Gunther H Weber, Daniela M Ushizima, Peter Messmer, Hans Hagen, Bernd Hamann, Wes Bethel, others, "Automatic beam path analysis of laser wakefield particle acceleration data", Computational Science & Discovery, January 2009, 2:015005, LBNL 2734E,

Meiyappan Nagappan, Kesheng Wu, Mladen A Vouk, "Efficiently extracting operational profiles from execution logs using suffix arrays", 2009 20th International Symposium on Software Reliability Engineering, January 1, 2009, 41--50,

Operational profiles are an important software reliability engineering tool. In this paper we propose a cost-effective, automated approach for creating second-generation operational profiles from the execution logs of a software product. Our algorithm parses the execution logs into sequences of events and produces an ordered list of all possible subsequences by constructing a suffix array of the events. The difficulty in using execution logs is that the amount of data to be analyzed is often extremely large (more than a million records per day in many applications). Our approach is very efficient: we show that it requires O(N) space and time to discover all possible patterns in N events. We discuss a practical implementation of the algorithm in the context of the logs from a large cloud computing system.
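
As a rough illustration of the idea in the abstract above, the sketch below builds a suffix array over a sequence of log events and reads the repeated subsequences off it. It is a sketch under stated assumptions, not the paper's algorithm: the construction here simply sorts the suffixes, so it does not reach the O(N) bound the abstract reports, and the names `suffix_array` and `pattern_counts` and the sample event names are hypothetical.

```python
from itertools import groupby

def suffix_array(events):
    """Return suffix start indices ordered by the suffix each one begins.

    Naive construction (sorts whole suffixes); chosen for clarity, not speed.
    """
    return sorted(range(len(events)), key=lambda i: events[i:])

def pattern_counts(events, length):
    """Count every contiguous subsequence of the given length, in sorted order.

    Suffixes sharing a prefix are adjacent in the suffix array, so equal
    subsequences can be grouped in a single pass.
    """
    sa = suffix_array(events)
    prefixes = (tuple(events[i:i + length]) for i in sa
                if i + length <= len(events))
    return [(pattern, sum(1 for _ in group))
            for pattern, group in groupby(prefixes)]

# Hypothetical event sequence parsed from an execution log.
events = ["open", "read", "read", "close", "open", "read"]
counts = pattern_counts(events, 2)
```

For this sample log, the pair `("open", "read")` occurs twice and every other pair occurs once.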

E Bethel, "Modern Scientific Visualization is More than Just Pretty Pictures", January 2009, LBNL 1450E,

C. G. R. Geddes, E Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

Lifeng He, Yuyan Chao, Kenji Suzuki, Kesheng Wu, "Fast connected-component labeling", Pattern recognition, 2009, 42:1977--1987,

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Optimizing two-pass connected-component labeling", Pattern Analysis & Applications, 2009, 12:117--135,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

Luke J Gosink, "Bin-hash indexing: A parallel method for fast query processing", 2008, LBNL 729E,

I. Yamazaki, K. Wu, H. Simon, "nu-TRLan User Guide version 1.0", 2008, LBNL 1288E,

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1--23

Rishi Rakesh Sinha, Marianne Winslett, Kesheng, Kurt Stockinger, Arie Shoshani, "Adaptive Bitmap Indexes for Space-Constrained", ICDE 2008, 2008, 1418--1420,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Breaking the curse of cardinality on bitmap indexes", International Conference on Scientific and Statistical Database Management, 2008, 348--365,

E. Wes Bethel, Oliver Rübel, Prabhat, Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy, Sean Ahern, "Modern Scientific Visualization is More than Just Pictures", Numerical Modeling of Space Plasma Flows: (Astronomical Society of the Pacific Series), St. Thomas, USVI, 2008, 301--317,

2007

Kesheng Wu, "Fastbit reference manual", 2007, LBNL PUB/3192,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, Performance of Multi-Level and Multi-Component Bitmap Indexes, 2007, doi: 10.1145/1670243.1670245

Frederick Reiss, Kurt Stockinger, Kesheng Wu, Shoshani, Joseph M. Hellerstein, "Enabling Real-Time Querying of Live and Historical Data", SSDBM 2007, 2007,

Kurt Stockinger, Kesheng Wu, "Bitmap indices for data warehouses", Data Warehouses and OLAP: Concepts, Architectures and Solutions, (IGI Global: 2007) Pages: 157--178

2006

Kesheng Wu, Ekow J Otoo, Arie Shoshani, "Optimizing bitmap indices with efficient compression", ACM Transactions on Database Systems (TODS), 2006, 31:1--38,

K. Wu, K. Stockinger, A. Shoshani, Wes, "FastBit--Helps Finding the Proverbial Needle in a", 2006, LBNL-PUB/963,

Kurt Stockinger, Kesheng Wu, Rene Brun, Canal, "Bitmap indices for fast end-user physics analysis in", Nuclear Instruments and Methods in Physics Research A: Accelerators, Spectrometers, Detectors and Equipment, 2006, 559:99--102,

Luke Gosink, John Shalf, Kurt Stockinger, Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press., 2006, 149--158,

F. Reiss, K. Stockinger, K. Wu, A. Shoshani, J. M. Hellerstein, "Efficient analysis of live and historical streaming and its application to cybersecurity", 2006,

2005

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Zhang, "Grid Collector: Facilitating Efficient Selective from Data Grids", International Supercomputer Conference 2005, 2005,

K. Wu, E. Otoo, "A simpler proof of the average case complexity of with path compression", 2005,

Kesheng Wu, "FastBit: an efficient indexing technology for data-intensive science", Journal of Physics: Conference Series, IOP Publishing, 2005, 16:556--560, LBNL 2164E, doi: 10.1088/1742-6596/16/1/077

Kesheng Wu, Ekow Otoo, Arie Shoshani, "Optimizing connected component labeling algorithms", Medical Imaging 2005: Image Processing, 2005, 5747:1965--1976,

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Two Strategies to Speed up Connected Component Algorithms", 2005,

E. Wes Bethel, Scott Campbell, Eli Dart, Lee, Steven A. Smith, Kurt Stockinger, Tierney, Kesheng Wu, "Interactive Analysis of Large Network Data Collections Query-Driven Visualization", 2005,

Kurt Stockinger, John Shalf, Kesheng Wu, E Wes Bethel, "Query-driven visualization of large data sets", VIS 05. IEEE Visualization, 2005., 2005, 167--174,

2004

Kesheng Wu, Ekow J Otoo, Arie Shoshani, "An efficient compression scheme for bitmap indices", 2004,

Kesheng Wu, Wei-Ming Zhang, Victor, Jerome Lauret, Arie Shoshani, "The Grid Collector: Using an Event Catalog to Speed up Analysis in Distributed Environment", Proceedings of Computing in High Energy and Nuclear (CHEP) 2004, 2004,

K. Wu, A. Shoshani, E. J. Otoo, Word aligned bitmap compression method, data and apparatus, US Patent 6,831,575, 2004,

2003

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid collector: An event catalog with automated file management", 2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No. 03CH37515), 2003, LBNL 55563,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2001

L Bernardo, H Nordberg, D Olson, A Shoshani, A Sim, A Vaniachine, D Zimmerman, B Gibbard, R Porter, T Wenaus, others, "New capabilities in the HENP grand challenge storage access system and its application at RHIC", Computer physics communications, 2001, 140:179--188,

2000

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

1993

Kesheng Wu, Robert Savit, William Brock, "Statistical tests for deterministic effects in broad time series", Physica D, 1993, 69:172--188, doi: 10.1016/0167-2789(93)90188-7

1969

Jonathan Wang, Wucherl Yoo, Alex Sim, K John Wu, "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", 1969,

Wucherl Yoo

2017

J. Kim, W. Yoo, A. Sim, S.C. Suh, I. Kim, "A Lightweight Network Anomaly Detection Technique", International Workshop on Computing, Networking and Communications (CNC 2017), 2017, doi: 10.1109/ICCNC.2017.7876251

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, Kesheng Wu, "Parallel variable selection for effective performance prediction", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017, 208--217,

2016

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016,

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, (Springer, Cham: 2016) Pages: 139--161

Wucherl Yoo, Alex Sim, Kesheng Wu, "Machine learning based job status prediction in scientific clusters", 2016 SAI Computing Conference (SAI), 2016, 44--53,

2015

M. Koo, W. Yoo (advisor), A. Sim (advisor), "I/O Performance Analysis Framework on Measurement Data from Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15), ACM Student Research Competition (SRC), 2015,

W. Yoo, A. Sim, "Network Bandwidth Utilization Forecast Model on High Bandwidth Networks", IEEE International Conference on Computing, Networking and Communications (ICNC’15), 2015,

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Patha: Performance analysis tool for hpc applications", 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), 2015, 1--8,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, PATHA: Performance Analysis Tool for HPC, 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), Pages: 1--8 2015, doi: 10.1109/PCCC.2015.7410313

2014

W. Yoo, A. Sim, "Efficient Changing Pattern Detection on High Bandwidth Network Measurements", 7th International Conference on Grid and Distributed Computing, 2014,

2013

M. Montanari, E. Chan, K. Larson, W. Yoo, R. H. Campbell, "Distributed security policy conformance", Computers & Security, March 31, 2013,

2012

W. Yoo, K. Larson, L. Baugh, S. Kim, R. H. Campbell, "ADP: automated diagnosis of performance pathologies using hardware events", SIGMETRICS '12: Proc. of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems, 2012,

2011

W. Yoo, K. Larson, L. Baugh, S. Kim, W. Ahn, R. H. Campbell, "Automated Fingerprinting of Performance Pathologies Using Performance Monitoring Units (PMUs)", HotPar'11: Proc. of USENIX Workshop on Hot Topics in Parallelism, May 26, 2011,

M. Montanari, E. Chan, K. Larson, W. Yoo, R. H. Campbell, "Distributed security policy conformance", Future Challenges in Security and Privacy for Academia and Industry, January 1, 2011,

2010

W. Yoo, S. Shi, W. J. Jeon, K. Nahrstedt, R. H. Campbell, "Real-time parallel remote rendering for mobile devices using graphics processing units", ICME '10: IEEE International Conference on Multimedia and Expo, July 19, 2010,

1969

Jonathan Wang, Wucherl Yoo, Alex Sim, K John Wu, "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", 1969,

Thomas Rodman Yopes

2011

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,