SDM publications

Deb Agarwal

2016

Deborah A. Agarwal, Boris Faybishenko, Vicky L, Harinarayan Krishnan, Carina Lansing, Gary Kushner, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, Arthur, Kesheng Wu, "A Science Data Gateway for Environmental Management", Concurrency and Computation: Practice and Experience, 2016, 28:1994-2004, doi: 10.1002/cpe.3697

2012

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, Chandrika Sivaramakrishnan, "Akuna: Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 17-22, 2012,

Brian Austin

2015

Suren Byna, Brian Austin, "Evaluation of Parallel I/O Performance and Energy Consumption with Frequency Scaling on Cray XC30", Cray User Group (CUG) meeting 2015, 2015,

2011

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Zhaojun Bai

2010

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

David H. Bailey

2015

David H. Bailey, Stephanie Ger, Marcos Lopez de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", Quantitative Finance, 2015,

http://ssrn.com/abstract=2507040

2014

David H. Bailey, Jonathan M. Borwein, Marcos Lopez de Prado, Qiji Jim Zhu, "Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance", Notices of the American Mathematical Society, May 1, 2014, 458-471,

Recent computational advances allow investment managers to search for profitable investment strategies. In many instances, that search involves a pseudo-mathematical argument, which is spuriously validated through a simulation of its historical performance (also called a backtest).

We prove that high performance is easily achievable after backtesting a relatively small number of alternative strategy configurations, a practice we denote “backtest overfitting”. The higher the number of configurations tried, the greater the probability that the backtest is overfit. Because financial analysts rarely report the number of configurations tried for a given backtest, investors cannot evaluate the degree of overfitting in most investment proposals.

The implication is that investors can be easily misled into allocating capital to strategies that appear to be mathematically sound and empirically supported by an outstanding backtest. This practice is particularly pernicious because, due to the nature of financial time series, backtest overfitting has a detrimental effect on the strategy’s future performance.
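The selection effect described above can be illustrated with a small simulation. This is a sketch for illustration, not the authors' method: it assumes strategies whose daily returns are i.i.d. Gaussian with zero mean (no skill at all), 252 trading days per backtest, and a hypothetical helper `best_in_sample_sharpe`. Because every trial is pure noise, any high Sharpe ratio it reports comes only from keeping the best of many configurations.

```python
import random
import statistics


def best_in_sample_sharpe(n_trials, n_days=252, seed=0):
    """Highest annualized Sharpe ratio among n_trials zero-skill strategies.

    Each hypothetical strategy earns i.i.d. Gaussian daily returns with mean 0,
    so a positive Sharpe ratio is noise. All trials come from one seeded
    generator, so the first k trials are identical for any n_trials >= k.
    """
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_trials):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        # Annualize the daily Sharpe ratio with sqrt(252).
        sharpe = statistics.mean(returns) / statistics.stdev(returns) * 252 ** 0.5
        best = max(best, sharpe)  # keep only the best-looking backtest
    return best


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(best_in_sample_sharpe(n), 2))
```

Running it with increasing `n_trials` typically shows the best in-sample Sharpe ratio climbing even though no strategy has an edge; for independent trials of this kind it grows roughly like $\sqrt{2 \ln N}$.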

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

Mehmet Balman

2013

Mehmet Balman, "Advance Resource Provisioning in Bulk Data Scheduling", 27th IEEE International Conference on Advanced Information Networking and Applications (AINA), 2013, LBNL 6364E, doi: 10.1109/AINA.2013.5

Today's scientific and business applications generate massive data sets that need to be transferred to remote sites for sharing, processing, and long-term storage. Because of increasing data volumes and enhancements in current network technology that provide on-demand high-speed data access between collaborating institutions, data handling and scheduling problems have reached a new scale. In this paper, we present a new data scheduling model with advance resource provisioning, in which data movement operations are defined with earliest start and latest completion times. We analyze the time-dependent resource assignment problem and propose a new methodology to improve current systems by allowing researchers and higher-level meta-schedulers to use data placement as a service, so they can plan ahead and submit transfer requests in advance. In general, scheduling with time and resource conflicts is NP-hard. We introduce an efficient algorithm to organize multiple requests on the fly while satisfying users' time and resource constraints. We successfully tested our algorithm in a simple benchmark simulator that we developed, and demonstrated its performance with initial test results.

Keywords: scheduling with constraints, bulk data movement, time-dependent graphs, network reservation, Gale-Shapley algorithm
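As a toy illustration of transfer requests defined by an earliest start and a latest completion time, the sketch below admits transfers on a single link with a fixed capacity per discrete time slot. It is a deliberate simplification and not the paper's algorithm (whose keywords point to time-dependent graphs and the Gale-Shapley algorithm): `Request`, `schedule_requests`, the earliest-deadline-first ordering, and the all-or-nothing admission rule are all hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    volume: int    # data units to move
    earliest: int  # earliest start slot (inclusive)
    deadline: int  # latest completion slot (inclusive)


def schedule_requests(requests, capacity, horizon):
    """Admit requests earliest-deadline-first on one link with per-slot capacity.

    Returns {name: {slot: units}} for admitted requests and {name: None} for
    requests whose whole volume cannot fit inside their time window.
    """
    free = [capacity] * horizon  # unreserved capacity left in each slot
    plan = {}
    for req in sorted(requests, key=lambda r: r.deadline):
        alloc, remaining = {}, req.volume
        last = min(req.deadline, horizon - 1)
        for t in range(req.earliest, last + 1):
            if remaining == 0:
                break
            take = min(free[t], remaining)  # fill the earliest free capacity first
            if take > 0:
                alloc[t] = take
                remaining -= take
        if remaining == 0:
            for t, units in alloc.items():  # commit only if the full volume fits
                free[t] -= units
            plan[req.name] = alloc
        else:
            plan[req.name] = None  # reject rather than partially serve
    return plan
```

With `capacity=10` and `horizon=5`, a 20-unit request due by slot 1 takes slots 0 and 1, a 15-unit request due by slot 4 takes slots 2 and 3, and a 30-unit request confined to slots 3-4 is rejected because only 15 units remain there.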

2012

Mehmet Balman, "MemzNet: Memory-Mapped Zero-copy Network Channel for Moving Large Datasets over 100Gbps Networks", technical poster in ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'12), LBNL 6175E, November 13, 2012, doi: 10.1109/SC.Companion.2012.294

High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications' perspective. We have experimented with current state-of-the-art data movement tools, and found that file-centric data transfer protocols do not perform well when managing the transfer of many small files in high-bandwidth networks, even when using parallel streams or concurrent transfers. Current middleware tools need enhancements to take advantage of future networking frameworks. To improve performance and efficiency, we developed an experimental prototype, called MemzNet: Memory-mapped Zero-copy Network Channel, which uses a block-based data movement method for moving large scientific datasets. We have implemented MemzNet, which takes the approach of aggregating files into blocks and providing dynamic data channel management. In this work, we present our initial results in 100Gbps networks.
http://dx.doi.org/10.1109/SC.Companion.2012.295
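The central idea in the abstract, aggregating many small files into blocks instead of moving them file by file, can be sketched as a packing step. This is a hypothetical illustration, not MemzNet's actual data layout: `pack_into_blocks` and the `(file, offset, length)` fragment records are assumptions. Each block is filled to `block_size` before a new one starts, and a file that does not fit is split across consecutive blocks.

```python
def pack_into_blocks(files, block_size):
    """Pack (name, size) files into fixed-size blocks of (name, offset, length) fragments."""
    blocks, current, used = [], [], 0
    for name, size in files:
        offset = 0
        while offset < size:
            # Take as much of this file as still fits in the current block.
            take = min(block_size - used, size - offset)
            current.append((name, offset, take))
            offset += take
            used += take
            if used == block_size:  # block full: emit it and start a new one
                blocks.append(current)
                current, used = [], 0
    if current:  # trailing partial block
        blocks.append(current)
    return blocks
```

For files of sizes 3, 5, and 2 with `block_size=4`, the first block holds all of the first file plus one byte of the second, the second block holds the remaining four bytes of the second file, and the last, partial block holds the third file.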

Mehmet Balman, "Streaming Exascale Data over 100Gbps Networks", IEEE Computing Now, November 8, 2012, LBNL 6173E,

Mehmet Balman, "Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks", LBNL Tech Report, 2012, LBNL 6177E,

High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications’ perspective. We have investigated the data transfer requirements of climate applications as a typical scientific example and evaluated how the scientific community can benefit from next-generation high-bandwidth networks. We developed a new block-based data movement method (in contrast to current file-based methods) to improve performance and efficiency when moving large scientific datasets that contain many small files. We implemented the new block-based data movement tool, which aggregates files into blocks and provides dynamic data channel management. One of the major obstacles in the use of high-bandwidth networks is the limitation of host system resources. We have conducted a large number of experiments with our new block-based method and with currently available file-based data movement tools. In this white paper, we describe future research problems and challenges for efficient use of next-generation science networks, based on the lessons learnt and the experience gained with 100Gbps network applications.

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

M. Balman, A. Sim, "Scaling the Earth System Grid to 100Gbps Networks", 2012, LBNL 5794E,

2011

Mehmet Balman, Surendra Byna, "Open Problems in network-aware data management in exa-scale computing and terabit networking era", In Proceedings of the First International Workshop on Network-Aware Data Management, in conjunction with ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2011, Seattle, WA, November 11, 2011, LBNL 6176E, doi: 10.1145/2110217.2110229

Accessing and managing large amounts of data is a great challenge in collaborative computing environments where resources and users are geographically distributed. Recent advances in network technology have led to next-generation high-performance networks, allowing high-bandwidth connectivity. Efficient use of the network infrastructure is necessary in order to address the increasing data and compute requirements of large-scale applications. We discuss several open problems, evaluate emerging trends, and articulate our perspectives in network-aware data management.

T. Kosar, M. Balman, E. Yildirim, S. Kulasekaran, B. Ross, "Stork Data Scheduler: Mitigating the Data Bottleneck in e-Science", Philosophical Transactions of the Royal Society A, Vol.369 (2011), pp. 3254-3267, July 18, 2011, doi: 10.1098/rsta.2011.0148

In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
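Stork's aggregation of transfer jobs by source and destination can be pictured as grouping queued jobs by endpoint pair, so that jobs sharing both ends could be handled together. The sketch below is hypothetical and does not reflect Stork's interface or job format: jobs are assumed to be `(source_url, destination_url)` pairs, and the made-up `aggregate_by_endpoints` keys them on the host part returned by Python's `urllib.parse.urlparse`.

```python
from collections import defaultdict
from urllib.parse import urlparse


def aggregate_by_endpoints(jobs):
    """Group (source_url, dest_url) transfer jobs by (source host, destination host)."""
    groups = defaultdict(list)
    for src, dst in jobs:
        key = (urlparse(src).netloc, urlparse(dst).netloc)  # endpoint pair
        groups[key].append((src, dst))
    return dict(groups)
```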

T. Kosar, I. Akturk, M. Balman, X. Wang, "PetaShare: A Reliable, Efficient, and Transparent Distributed Storage Management System", Scientific Programming, Volume 19, Issue 1, January 2011, pages 27-43,

Modern collaborative science has placed an increasing burden on data management infrastructure to handle the increasingly large data archives generated. Besides functionality, reliability and availability are also key factors in delivering a data management system that can efficiently and effectively meet the challenges posed and compounded by the unbounded increase in the size of data generated by scientific applications. We have developed a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front end, it provides lightweight clients that enable easy, transparent, and scalable access. In PetaShare, we have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability, and an advanced buffering system for improved data transfer performance. In this paper, we present the details of our design and implementation, show performance results, and describe our experience in developing a reliable and efficient distributed data management system for data-intensive science.

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2010

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. One tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community. It has been managing massive dataset transfers efficiently in environments where network bandwidth is limited. We studied adaptive transfer adjustment to enable the BDM to handle significant end-to-end performance changes in dynamic network environments, as well as to control data transfers for the desired transfer performance. We describe the results from our hands-on data transfer management experience in the climate research community. We study a practical transfer estimation model and present our initial results from the adaptive transfer adjustment methodology.

Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim, "A Flexible Reservation Algorithm for Advance Network Provisioning", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), New Orleans, LA, IEEE Computer Society, Washington, DC, USA, ISBN: 978-1-4244-7559-, November 14, 2010, LBNL 4017E, doi: 10.1109/SC.2010.4

Many scientific applications need support from a communication infrastructure that provides predictable performance, which requires effective algorithms for bandwidth reservations. Network reservation systems such as ESnet’s OSCARS establish secure virtual circuits with guaranteed bandwidth for a certain length of time. However, users currently cannot inquire about bandwidth availability, nor do they receive alternative suggestions when reservation requests fail. In general, the number of reservation options is exponential in the number of nodes $n$ and the current reservation commitments. We present a novel approach for path finding in time-dependent networks that takes advantage of user-provided parameters of total volume and time constraints, and produces options for earliest completion and shortest duration. The theoretical complexity is only $O(n^2 r^2)$ in the worst case, where $r$ is the number of reservations in the desired time interval. We have implemented our algorithm and developed efficient methodologies for incorporation into network reservation frameworks. Performance measurements confirm the theoretical predictions.
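The earliest-completion option can be illustrated on a single, already chosen path whose links have time-varying unreserved bandwidth. This sketch is a simplification, not the paper's path-finding algorithm over a time-dependent network: it assumes discrete time slots, a fixed path, and that a transfer may use a different rate in each slot; `earliest_completion` and `link_free` are hypothetical names. In each slot the path can carry only its bottleneck bandwidth, the minimum over its links.

```python
def earliest_completion(link_free, volume, start=0):
    """Return the first slot by which `volume` units can cross a fixed path, or None.

    link_free[k][t] is the bandwidth of link k still unreserved in slot t.
    """
    horizon = min(len(free) for free in link_free)
    moved = 0
    for t in range(start, horizon):
        # The path carries at most its bottleneck (minimum) bandwidth in slot t.
        moved += min(free[t] for free in link_free)
        if moved >= volume:
            return t
    return None  # volume cannot be delivered within the known horizon
```

With per-slot free bandwidth `[10, 10, 0, 10]` and `[5, 10, 10, 10]` on two links, 15 units starting at slot 0 finish in slot 1, while 20 units must wait until slot 3 because the first link is fully reserved in slot 2.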

M. Balman, E. Chaniotakis, A. Shoshani, A. Sim, "A New Approach in Advance Network Reservation and Provisioning for High-Performance Scientific Data Transfers", 2010, LBNL 4091E,

Mehmet Balman, Tevfik Kosar, "Error Detection and Error Classification: Failure Awareness in Data Transfer Scheduling", International Journal of Autonomic Computing, Vol. 1, No. 4, pp. 425-446, 2010, doi: 10.1504/IJAC.2010.037516

Data transfer in distributed environments is prone to frequent failures resulting from back-end system-level problems, such as connectivity failures, which are technically untraceable by users. Error messages are not logged efficiently, and sometimes are not relevant or useful from the users' point of view. Our study explores the possibility of an efficient error detection and reporting system for such environments. Prior knowledge about the environment and awareness of the actual reason behind a failure would enable higher-level planners to make better and more accurate decisions. Well-defined error detection and error reporting methods are necessary to increase the usability and serviceability of existing data transfer protocols and data management systems. We investigate the applicability of early error detection and error classification techniques, and propose an error reporting framework and a failure-aware data transfer life cycle to improve the arrangement of data transfer operations and to enhance the decision making of data transfer schedulers.

John B. Bell

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rübel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, Journal of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Wes Bethel

2016

Utkarsh Ayachit, Andrew Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth Jansen, Burlen Loring, Zarija Lukić, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Michel Rasquin, Christopher P. Stone, Venkat Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel, "Performance Analysis, Design Considerations, and Applications of Extreme-scale In Situ Infrastructures", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, UT, USA, 2016, doi: 10.1109/SC.2016.78

2013

E. Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K. John Wu, and Michael Wehner, "Why High Performance Visual Data Analytics is both Relevant and Difficult", Proceedings of Visualization and Data Analysis 2013, IS&T/SPIE Electronic Imaging 2013, San Francisco, CA, USA, SPIE, February 2013, LBNL 6063E,

Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel, "A Big Data Approach to Analyzing Market Volatility", Algorithmic Finance, 2013, 2:241-267, LBNL 6382E, doi: 10.3233/AF-13030

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that HPC resources and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes up 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelized computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.
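The VPIN computation itself can be sketched as follows: trades are packed into buckets of equal volume $V$, each bucket's volume is split into buy volume $V^B$ and sell volume $V^S$, and VPIN over the latest $n$ buckets is $\sum |V^B - V^S| / (nV)$. This is a simplified illustration, not the paper's implementation: the function `vpin` and its parameters are hypothetical, and trades are classified with the simple tick rule only to keep the sketch short (the classification used in the study may differ). The bucket size and window length are examples of choices that could be varied when exploring many ways of computing VPIN.

```python
def vpin(trades, bucket_volume, window):
    """VPIN sketch over equal-volume buckets.

    trades: (price, volume) pairs in time order, with integer volumes.
    Trade direction uses the tick rule, an assumption made for this sketch.
    """
    buckets = []                 # (buy_volume, sell_volume) of each full bucket
    buy = sell = filled = 0
    last_price, side = None, 1   # treat the very first trade as a buy
    for price, volume in trades:
        if last_price is not None and price != last_price:
            side = 1 if price > last_price else -1  # uptick = buy, downtick = sell
        last_price = price
        remaining = volume
        while remaining > 0:     # a large trade may span several buckets
            take = min(remaining, bucket_volume - filled)
            if side > 0:
                buy += take
            else:
                sell += take
            filled += take
            remaining -= take
            if filled == bucket_volume:
                buckets.append((buy, sell))
                buy = sell = filled = 0
    # Average absolute order imbalance over each run of `window` full buckets.
    return [
        sum(abs(b - s) for b, s in buckets[i - window:i]) / (window * bucket_volume)
        for i in range(window, len(buckets) + 1)
    ]
```

A partially filled final bucket is ignored, so VPIN values are only produced once `window` full buckets exist.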

2012

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", SuperComputing 2012 (SC12), Salt Lake City, Utah, November 2012,

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

Prabhat, Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner and E. Wes Bethel, "TECA: A Parallel Toolkit for Extreme Climate Analysis", Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Presented at Third Workshop on Data Mining in Earth System Science (DMESS 2012), Omaha, Nebraska, June 2012, 9:866–876, LBNL 5352E, doi: 10.1016/j.procs.2012.04.093

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

E. W. Bethel, Surendra Byna, Jerry Chou, Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Rübel, Rob D. Ryne, Michael Wehner, Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

E. W. Bethel, D. Leinweber, O. Rübel, K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

2011

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

2010

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 2010, 429:329-334, LBNL 3185E,

2009

O. Rübel, C.G.R. Geddes, E. Cormier-Michel, K. Wu, Prabhat, G.H. Weber, D.M. Ushizima, P. Messmer, H. Hagen, B. Hamann, and E.W. Bethel, "Automatic Beam Path Analysis of Laser Wakefield Particle Acceleration Data", IOP Computational Science & Discovery, November 2009, 2, LBNL 2734E,

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rübel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, Journal of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, and Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: Astronum-2008 (Astronomical Society of the Pacific Conference Series), St. Thomas, USVI, June 2009, 301-317, LBNL 1450E,

Luke Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Data Parallel Bin-based Indexing for Answering Queries on Multi-core Architecture", Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM), June 2009, 5566:110-129, LBNL 2211E,

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, "Large Fields for Smaller Facility Sources", SciDAC Review, 2009, 13-21,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Bin-Hash Indexing: A Parallel Method for Fast Query Processing", 2008, LBNL 729E,

2006

K. Wu, K. Stockinger, A. Shoshani, Wes, "FastBit--Helps Finding the Proverbial Needle in a", 2006, LBNL-PUB/963,

Luke Gosink, John Shalf, Kurt Stockinger, Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press, 2006, 149-158,

2005

Kurt Stockinger, John Shalf, Wes Bethel, Wu, "Query-Driven Visualization of Large Data Sets", IEEE Visualization 2005, Minneapolis, MN, October 2005, 22, doi: 10.1109/VIS.2005.84

E. Wes Bethel, Scott Campbell, Eli Dart, Lee, Steven A. Smith, Kurt Stockinger, Tierney, Kesheng Wu, "Interactive Analysis of Large Network Data Collections Using Query-Driven Visualization", 2005,

Surendra Byna

2017

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-Defined Scientific Data Analysis on Arrays", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2017 (Acceptance rate: 19%), June 26, 2017,

2016

Bin Dong, Suren Byna, Kesheng Wu, Prabhat, Hans Johansen, Jeffrey N. Johnson, and Noel Keen, "Data Elevator: Low-contention Data Movement in Hierarchical Storage System", The 23rd annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) (Acceptance rate: 25%), December 19, 2016,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data) (Acceptance rate: 19.39% as short papers), December 5, 2016,

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Bin Dong, Suren Byna, and Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2016, July 1, 2016,

Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, and Pradeep Dubey, "PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures", 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2016, Chicago, May 23, 2016,

Wahid Bhimji, Debbie Bard, Melissa Romanus, David Paul, Andrey Ovsyannikov, Brian Friesen, Matt Bryson, Joaquin Correa, Glenn K. Lockwood, Vakho Tsulaia, Suren Byna, Steve Farrell, Doga Gursoy, Chris Daley, Vince Beckner, Brian Van Straalen, Nicholas Wright, Katie Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group (CUG) 2016, May 10, 2016,

Cong Xu, Suren Byna, Vishwanath Venkatesan, Robert Sisneros, Omkar Kulkarni, Mohamad Chaarawi, and Kalyana Chadalavada, "LIOProf: Exposing Lustre File System Behavior for I/O Middleware", Cray User Group (CUG) 2016, May 10, 2016,

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

Harinarayan Krishnan, Burlen Loring, Suren Byna, Michael F. Wehner, Travis A. O'Brien, Prabhat, Chris Paciorek, and Daithi Stone, "Enabling End-to-End Climate Science Workflows in High Performance Computing Environments", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Burlen Loring, Suren Byna, Prabhat, Junmin Gu, Hari Krishnan, Michael Wehner, and Oliver Ruebel, "TECA an Extreme Event Detection and Climate Analysis Package for High Performance Computing", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

Xiaocheng Zou, David Boyuka, Dhara Desai, Martin, Suren Byna, Kesheng Wu, Kushal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage Pattern-Driven Dynamic Data Layout Reorganization", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 116--125,

2015

Hari Krishnan, Suren Byna, Michael Wehner, Junmin Gu, Travis O'Brien, Burlen Loring, Daithi Stone, William Collins, Prabhat, Yunjie Liu, Jeffrey Johnson, and Christopher Paciorek, "Enabling Efficient Climate Science Workflows in High Performance Computing Environments", AGU Fall Meeting, 2015, December 13, 2015,

Soyoung Jeon, Prabhat, Suren Byna, Junmin Gu, William Collins, and Michael Wehner, "Characterization of extreme precipitation within atmospheric river events over California", Advances in Statistical Climatology, Meteorology and Oceanography (ASCMO), November 21, 2015, 1:45-57, doi: 10.5194/ascmo-1-45-2015

Md. Mostofa Ali Patwary, Suren Byna, Nadathur Rajagopalan Satish, Narayanan Sundaram, Zarija Lukic, Vadim Roytershteyn, Michael J. Anderson, Yushu Yao, Prabhat, and Pradeep Dubey, "BD-CATS: Big Data Clustering at Trillion Particle Scale", Supercomputing 2015 (SC15), November 17, 2015,

Babak Behzad, Suren Byna, Prabhat and Marc Snir, "Pattern-driven Parallel I/O Tuning", 10th Parallel Data Storage Workshop (PDSW) 2015, held in conjunction with SC15, November 16, 2015,

Shane Snyder, Philip Carns, Robert Latham, Misbah Mubarak, Chris Carothers, Babak Behzad, Huong Vu Thanh Luu, Suren Byna, and Prabhat, "Techniques for Modeling Large-scale HPC I/O Workloads", the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS15), in conjunction with SC15, November 15, 2015,

Bin Dong, Suren Byna, and Kesheng Wu, "Heavy-tailed Distribution of Parallel I/O System Response Time", 10th Parallel Data Storage Workshop (PDSW) 2015, held in conjunction with SC15, 2015,

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Bin Dong, Suren Byna, and Kesheng Wu, "Spatially Clustered Join on Heterogeneous Scientific Data Sets", 2015 IEEE International Conference on Big Data (IEEE BigData 2015), IEEE, 2015,

Prabhat, Suren Byna, Venkat Vishwanath, Eli Dart, Michael Wehner, and William Collins, "TECA: Petascale Pattern Recognition for Climate Science", 16th International Conference on Computer Analysis of Images and Patterns (CAIP) 2015, 2015,

Babak Behzad, Suren Byna, Stefan Wild, Prabhat and Marc Snir, "Dynamic Model-driven Parallel I/O Performance Tuning", IEEE Cluster 2015, 2015,

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015, 2015,

H. Luu, M. Winslett, W. Gropp, R. Ross, P. Carns, K. Harms, Prabhat, S. Byna, Y. Yao, "A Multi-platform Study of I/O Behavior on Petascale Supercomputers", The 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2015, 2015,

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

Suren Byna, Brian Austin, "Evaluation of Parallel I/O Performance and Energy Consumption with Frequency Scaling on Cray XC30", Cray User Group (CUG) meeting 2015, 2015,

Suren Byna, Robert Sisneros, Kalyana Chadalavada, Quincey Koziol, "Tuning Parallel I/O on Blue Waters for Writing 10 Trillion Particles", Cray User Group (CUG) meeting 2015, 2015,

2014

Soyoung Jeon, Christopher Paciorek, Prabhat, Surendra Byna, William Collins, Michael Wehner, "Uncertainty Quantification for Characterizing Spatial Tail Dependence under Statistical Framework", AGU Fall Meeting 2014, 2014,

Babak Behzad, Surendra Byna, Stefan M. Wild, Prabhat, Marc Snir, "Improving Parallel I/O Autotuning with Performance Modeling", ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2014), New York, NY, USA, ACM, 2014, 253--256, doi: 10.1145/2600212.2600708

M Scot Breitenfeld, Kalyana Chadalavada, Robert Sisneros, Surendra Byna, Quincey Koziol, Neil Fortner, Prabhat, Venkat Vishwanath, "Recent Progress in Tuning Performance of Large-scale I/O with Parallel HDF5", The 9th Parallel Data Storage Workshop (PDSW) held in conjunction with SC14, 2014,

Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Surendra Byna, Kesheng Wu, "Simplifying index file structure to improve I/O performance of parallel indexing", Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on, 2014, 576-583, doi: 10.1109/PADSW.2014.7097856

Ted Habermann, Andrew Collette, Steve Vincena, Jay Jay Billings, Matt Gerring, Konrad Hinsen, Werner Benger, Filipe RNC Maia, Suren Byna, Pierre de Buyl, "The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software", 2nd Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2), in conjunction with Supercomputing 2014 (SC14), 2014,

Surendra Byna, Jialin Liu, Yong Chen, "Model-driven Data Layout Selection for Improving Read Performance", In The Proceedings of The 2014 International Workshop on High Performance Data Intensive Computing (HPDIC2014), in conjunction with the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS 14), 2014,

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", Parallel Distributed Processing Symposium Workshops 2014 IEEE International, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", SIGMOD 14, 2014, 385--396, doi: 10.1145/2588555.2612185

Bin Dong, S. Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", Cluster Computing (CLUSTER), 2014 IEEE International Conference on, January 1, 2014, 194-202, doi: 10.1109/CLUSTER.2014.6968765

2013

Babak Behzad, Huong Vu Thanh Luu, Joseph Huchette, Surendra Byna, Prabhat, Ruth Aydt, Quincey Koziol, and Marc Snir, "Taming parallel I/O complexity with auto-tuning", In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '13), 2013,

Bin Dong, S. Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), pp. 1-8, 23-27 Sept. 2013, September 1, 2013,

Babak Behzad, Joseph Huchette, Huong Vu Thanh Luu, Ruth Aydt, Surendra Byna, Yushu Yao, Quincey Koziol, and Prabhat, "A framework for auto-tuning HDF5 applications", Proceedings of the 22nd international symposium on High-performance parallel and distributed computing (HPDC), 2013,

E. Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K. John Wu, and Michael Wehner, "Why High Performance Visual Data Analytics is both Relevant and Difficult", Proceedings of Visualization and Data Analysis 2013, IS&T/SPIE Electronic Imaging 2013, San Francisco, CA, USA, SPIE, February 2013, LBNL 6063E,

Kuan-Wu Lin, Surendra Byna, Jerry Chou, Wu, "Optimizing FastQuery performance on Lustre file", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 29,

B. Dong, S. Byna, K. Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage Workshop, January 1, 2013, doi: 10.1145/2538542.2538563

2012

Babak Behzad, Joey Huchette, Huong Luu, Ruth Aydt, Quincey Koziol, Prabhat, Suren Byna, Mohamad Chaarawi, Yushu Yao, "Auto-Tuning of Parallel IO Parameters for HDF5 Applications", Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 2012,

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", Supercomputing 2012 (SC12), Salt Lake City, Utah, November 2012,

Prabhat, Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner and E. Wes Bethel, "TECA: A Parallel Toolkit for Extreme Climate Analysis", Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Presented at Third Workshop on Data Mining in Earth System Science (DMESS 2012), Omaha, Nebraska, June 2012, 9:866–876, LBNL 5352E, doi: 10.1016/j.procs.2012.04.093

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.
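The observation that climate datasets expose independent work along several dimensions can be illustrated with a small sketch. Assuming each (ensemble member, timestep) pair can be processed independently, the hypothetical helper `partition_work` below splits the flattened work list into contiguous blocks, one per worker, with block sizes differing by at most one. This is a generic block decomposition for illustration only, not TECA's actual scheduling code.

```python
from itertools import product


def partition_work(n_members, n_timesteps, n_ranks):
    """Split all (member, timestep) pairs into n_ranks contiguous blocks.

    Hypothetical helper for illustration: block sizes differ by at most
    one, and every pair is assigned to exactly one worker.
    """
    # Flatten the two independent dimensions into one ordered work list.
    items = list(product(range(n_members), range(n_timesteps)))
    base, extra = divmod(len(items), n_ranks)
    blocks = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks each take one additional item.
        size = base + (1 if rank < extra else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


if __name__ == "__main__":
    for rank, block in enumerate(partition_work(2, 5, 3)):
        print(rank, block)
```

With 2 ensemble members, 5 timesteps and 3 workers, the 10 work items are split into blocks of 4, 3 and 3. Because the items are independent, no communication is needed until per-item results are gathered.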

Y. Yin, S. Byna, H. Song, X.-H. Sun, and R. Thakur, "Boosting Application-Specific Parallel I/O Optimization Using IOSIG", IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, May 13, 2012,

E. W. Bethel, Surendra Byna, Jerry Chou, Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Rübel, Rob D. Ryne, Michael Wehner, Wu, "Big Data Analysis and Visualization: What Do LINACS Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

2011

Suren Byna, Prabhat, Michael F. Wehner and Kesheng Wu, "Detecting Atmospheric Rivers in Large Climate Datasets", Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC-11/ Supercomputing11/ ACM/IEEE), November 14, 2011, Seattle, Washington, 2011, doi: 10.1145/2110205.2110208

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
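The two-step detection described above (threshold the integrated water vapor field, then group the surviving grid points into connected regions) can be sketched serially as follows. This is a minimal illustration, not the paper's parallel implementation: the helper name `label_regions`, the toy grid and the threshold value are assumptions, 4-connectivity is assumed, and longitude wrap-around on a global grid is ignored.

```python
from collections import deque


def label_regions(iwv, threshold):
    """Label 4-connected regions of grid points where iwv >= threshold.

    Hypothetical helper for illustration. Returns (labels, count), where
    labels[i][j] is 0 for points below the threshold and otherwise a
    region id in 1..count.
    """
    rows, cols = len(iwv), len(iwv[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            # Start a new region at each unlabeled point above the threshold.
            if iwv[i][j] >= threshold and labels[i][j] == 0:
                count += 1
                labels[i][j] = count
                queue = deque([(i, j)])
                # Breadth-first flood fill over the four grid neighbours.
                while queue:
                    r, c = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and labels[nr][nc] == 0
                                and iwv[nr][nc] >= threshold):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count


if __name__ == "__main__":
    # Toy field; units and threshold are illustrative only.
    iwv = [[25, 22, 5, 5],
           [5, 21, 5, 30],
           [5, 5, 5, 31]]
    labels, n = label_regions(iwv, 20)
    print(n, labels)
```

On this toy grid a threshold of 20 leaves two separate regions: the three connected points in the upper left become region 1, and the two points in the right-hand column become region 2. A detector would then inspect each labeled region (for example its shape or extent) to decide whether it qualifies as an atmospheric river.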

Mehmet Balman, Surendra Byna, "Open Problems in network-aware data management in exa-scale computing and terabit networking era", In Proceedings of the First International Workshop on Network-Aware Data Management, in conjunction with ACM/IEEE International Conference For High Performance Computing, Networking, Storage and Analysis, 2011, Seattle, WA, November 11, 2011, LBNL 6176E, doi: 10.1145/2110217.2110229

Accessing and managing large amounts of data is a great challenge in collaborative computing environments where resources and users are geographically distributed. Recent advances in network technology have led to next-generation high-performance networks, allowing high-bandwidth connectivity. Efficient use of the network infrastructure is necessary in order to address the increasing data and compute requirements of large-scale applications. We discuss several open problems, evaluate emerging trends, and articulate our perspectives in network-aware data management.

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

Kesheng Wu, Surendra Byna, Doron Rotem, Arie, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

Henry Childs

2012

Allen R. Sanderson, Brad Whitlock, Oliver, Hank Childs, Gunther H. Weber, Kesheng Wu, "A System for Query Based Analysis and Visualization", Third International EuroVis Workshop on Visual Analytics (EuroVA 2012), Vienna, Austria, January 2012, LBNL 5507E,

2010

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 3185E, 2010, 429:329-334,

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, and Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: Astronum-2008 (Astronomical Society of the Pacific Conference Series), St. Thomas, USVI, June 2009, 301-317, LBNL 1450E,

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Marcus S. Day

2009

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Dharshi Devendran

2016

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

Xiaocheng Zou, David Boyuka, Dhara Desai, Martin, Suren Byna, Kesheng Wu, Kushal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

Bin Dong

2017

Tzu-Hsien Wu, Jerry Chou, Shyng Hao, Bin Dong, Kesheng Wu, Scott Klasky, "Optimizing the Query Performance of Block Index Through Data Analysis and I/O Modeling", The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), November 13, 2017,

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-Defined Scientific Data Analysis on Arrays", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2017 (Acceptance rate: 19%), June 26, 2017,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, and Peter Nugent, "Incremental View Maintenance over Array Data", In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17) (Acceptance rate: 20%). ACM, New York, NY, USA, May 14, 2017,

2016

Bin Dong, Suren Byna, Kesheng Wu, Prabhat, Hans Johansen, Jeffrey N. Johnson, and Noel Keen, "Data Elevator: Low-contention Data Movement in Hierarchical Storage System", The 23rd annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) (Acceptance rate: 25%), December 19, 2016,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data) (Acceptance rate: 19.39% as short papers), December 5, 2016,

Bin Dong, Suren Byna, and Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2016, July 1, 2016,

Tzuhsien Wu, Shyng Hao, Jerry Chou, Bin Dong and Kesheng Wu, "Indexing Blocks to Reduce Space and Time Requirements for Searching Large Data Files", 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2016, May 16, 2016,

Xiaocheng Zou, David Boyuka, Dhara Desai, Martin, Suren Byna, Kesheng Wu, Kushal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage Pattern-Driven Dynamic Data Layout Reorganization", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 116--125,

2015

Bin Dong, Suren Byna, and Kesheng Wu, "Heavy-tailed Distribution of Parallel I/O System Response Time", 10th Parallel Data Storage Workshop (PDSW) 2015, held in conjunction with SC15, 2015,

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Bin Dong, Suren Byna, and Kesheng Wu, "Spatially Clustered Join on Heterogeneous Scientific Data Sets", 2015 IEEE International Conference on Big Data (IEEE BigData 2015), IEEE, 2015,

2014

Bin Dong, Xiuqiao Li, Limin Xiao, Li Ruan, "Towards minimizing disk I/O contention: A partitioned file assignment approach", Future Generation Computer Systems, Volume 37, July 2014, Pages 178-190, 2014,

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", Parallel Distributed Processing Symposium Workshops 2014 IEEE International, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

Bin Dong, S. Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", Cluster Computing (CLUSTER), 2014 IEEE International Conference on, January 1, 2014, 194-202, doi: 10.1109/CLUSTER.2014.6968765

2013

Bin Dong, S. Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), pp. 1-8, 23-27 Sept. 2013, September 1, 2013,

B. Dong, S. Byna, K. Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage Workshop, January 1, 2013, doi: 10.1145/2538542.2538563

2012

Bin Dong, Xiuqiao Li, Qimeng Wu, Limin Xiao, Li Ruan, "A dynamic and adaptive load balancing strategy for parallel file system with large-scale I/O servers", Journal of Parallel and Distributed Computing (JPDC), Volume 72, Issue 10, October 2012, Pages 1254-1268, 2012,

Bin Dong, Xiuqiao Li, Limin Xiao, Li Ruan, "A New File-Specific Stripe Size Selection Method for Highly Concurrent Data Access", The 13th ACM/IEEE International Conference on Grid Computing (Grid 2012), 2012,

Junmin Gu

2016

Utkarsh Ayachit, Andrew Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth Jansen, Burlen Loring, Zarija Lukić, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Michel Rasquin, Christopher P. Stone, Venkat Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel, "Performance Analysis, Design Considerations, and Applications of Extreme-scale In Situ Infrastructures", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, UT, USA, 2016, doi: 10.1109/SC.2016.78

D. Pugmire, J. Kress, H. Childs, M. Wolf, G. Eisenhauer, J. Low, R. M. Churchill, T. Kurc, K. Wu, A. Sim, J. Gu, J. Choi, S. Klasky, "Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows", High Performance Data Analysis and Visualization Workshop (HPDAV2016) in conjunction with the 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2016), 2016, doi: 10.1109/IPDPSW.2016.175

Burlen Loring, Suren Byna, Prabhat, Junmin Gu, Hari Krishnan, Michael Wehner, and Oliver Ruebel, "TECA an Extreme Event Detection and Climate Analysis Package for High Performance Computing", The AMS (American Meteorological Society) 96th Annual Meeting, January 6, 2016,

2014

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Adaptation and Policy-Based Resource Allocation for Efficient Bulk Data Transfers in High Performance Computing Environments", 4th International Workshop on Network-aware Data Management (NDM'14), 2014,

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Efficient Data Staging Using Performance-Based Adaptation and Policy-Based Resource Allocation", 22nd Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2014,

2012

Junmin Gu, David Smith, Ann L. Chervenak, Alex Sim, "Adaptive Data Transfers that Utilize Policies for Resource Sharing", The 2nd International Workshop on Network-Aware Data Management Workshop (NDM2012), 2012,

D. Yu, D. Katramatos, A. Shoshani, A. Sim, J. Gu, V. Natarajan, "StorNet: Integrating Storage Resource Management with Dynamic Network Provisioning for Automated Data Transfer", International Committee for Future Accelerators (ICFA) Standing Committee on Inter-Regional Connectivity (SCIC) 2012 Report: Networking for High Energy Physics, 2012,

2011

J. Gu, D. Katramatos, X. Liu, V. Natarajan, A. Shoshani, A. Sim, D. Yu, S. Bradley, S. McKee, "StorNet: Integrated Dynamic Storage and Network Resource Provisioning and Management for Automated Data Transfers", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/1/012002

G. Garzoglio, J. Bester, K. Chadwick, D. Dykstra, D. Groep, J. Gu, T. Hesselroth, O. Koeroo, T. Levshina, S. Martin, M. Salle, N. Sharma, A. Sim, S. Timm, A. Verstegen, "Adoption of a SAML-XACML Profile for Authorization Interoperability across Grid Middleware in OSG and EGEE", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/6/062011

Junmin Gu, Dimitrios Katramatos, Xin Liu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Dantong Yu, Scott Bradley, Shawn McKee, "StorNet: Co-Scheduling of End-to-End Bandwidth Reservation on Storage and Network Systems for High Performance Data Transfers", IEEE INFOCOM HSN 2011, 2011,

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2009

M. Riedel, E. Laure, Th. Soddemann, L. Field, J. P. Navarro, J. Casey, M. Litmaath, J. Ph. Baud, B. Koblitz, C. Catlett, D. Skow, C. Zheng, P. M. Papadopoulos, M. Katz, N. Sharma, O. Smirnova, B. Kónya, P. Arzberger, F. Würthwein, A. S. Rana, T. Martin, M. Wan, V. Welch, T. Rimovsky, S. Newhouse, A. Vanni, Y. Tanaka, Y. Tanimura, T. Ikegami, D. Abramson, C. Enticott, G. Jenkins, R. Pordes, N. Sharma, S. Timm, N. Sharma, G. Moont, M. Aggarwal, D. Colling, O. van der Aa, A. Sim, V. Natarajan, A. Shoshani, J. Gu, S. Chen, G. Galang, R. Zappi, L. Magnoni, V. Ciaschini, M. Pace, V. Venturi, M. Marzolla, P. Andreetto, B. Cowles, S. Wang, Y. Saeki, H. Sato, S. Matsuoka, P. Uthayopas, S. Sriprayoonsakul, O. Koeroo, M. Viljoen, L. Pearlman, S. Pickles, David Wallom, G. Moloney, J. Lauret, J. Marsteller, P. Sheldon, S. Pathak, S. De Witt, J. Mencák, J. Jensen, M. Hodges, D. Ross, S. Phatanapherom, G. Netzer, A. R. Gregersen, M. Jones, S. Chen, P. Kacsuk, A. Streit, D. Mallmann, F. Wolf, T. Lippert, Th. Delaitre, E. Huedo, N. Geddes, "Interoperation of world-wide production e-Science infrastructures", Concurrency and Computation: Practice and Experience, 2009, 21(8):961-990,

Arie Shoshani, Flavia Donno, Junmin Gu, Jason Hick, Maarten Litmaath, Alex Sim, "Dynamic Storage Management", Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani, Doron Rotem, (Chapman & Hall/CRC Computational Science: 2009)

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

2008

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, J. Gu, "Grid data access on widely distributed worker nodes using Scalla and SRM", Journal of Physics: Conf. Ser., 2008, 119, doi: 10.1088/1742-6596/119/7/072019

Alex Sim, Arie Shoshani (Editors), Paolo Badino, Olof Barring, Jean‐Philippe Baud, Ezio Corso, Shaun De Witt, Flavia Donno, Junmin Gu, Michael Haddox‐Schatz, Bryan Hess, Jens Jensen, Andy Kowalski, Maarten Litmaath, Luca Magnoni, Timur Perelmutov, Don Petravick, Chip Watson, "The Storage Resource Manager Interface Specification Version 2.2", Open Grid Forum, Document in Full Recommendation, GFD.129, 2008,

2007

L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, F. Donno, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations", the 24th IEEE Conference on Mass Storage Systems and Technologies, 2007,

F. Donno, L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Manager version 2.2: design, implementation, and testing experience", Journal of Physics: Conf. Ser., 2007, 119, doi: 10.1088/1742-6596/119/6/062028

2005

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Wei-Ming Zhang, "Grid Collector: Facilitating Efficient Selective Access from Data Grids", International Supercomputer Conference 2005, 2005,

2004

K. Wu, W. Zhang, A. Sim, J. Gu, A. Shoshani, "Grid Collector: an Event Catalog with Automated File Management", 2004, LBNL 55563,

Alex Sim, Junmin Gu, Arie Shoshani, Vijaya Natarajan, "DataMover: Robust Terabytes-Scale Multi-file Replication over Wide-Area Networks", the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), 2004,

2003

Arie Shoshani, Alexander Sim, Junmin Gu, "Storage Resource Managers: Essential Components for the Grid", Grid Resource Management: State of the Art and Future Trends, edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, (Kluwer Academic Publishers: 2003)

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

Arie Shoshani, Alex Sim, Junmin Gu, "Storage Resource Managers: Essential Components for Grid Applications", Globus World, 2003,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2002

A. Shoshani, A. Sim, J. Gu, "Storage Resource Managers: Middleware components for Grid Storage", the 19th IEEE Symposium on Mass Storage Systems, 2002,

Ming Gu

2013

William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", October 6, 2013, LBNL LBNL-6388E,

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A Big Data Approach to Analyzing Market Volatility", Algorithmic Finance, 2013, 2:241--267, LBNL LBNL-6382E, doi: 10.3233/AF-13030

Understanding the microstructure of the financial market requires processing a vast amount of data on individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) resources are widely available at the National Laboratories in the US. In this paper we demonstrate that HPC resources and the techniques developed for data-intensive sciences can greatly accelerate the computation of an early warning indicator called the Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contain five and a half years' worth of trading data for about 100 of the most liquid futures contracts, include about 3 billion trades, and occupy 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelized computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rate is about 7% when averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.

Daniel Gunter

2010

A. Sim, D. Gunter, V. Natarajan, A. Shoshani, D. Williams, J. Long, J. Hick, J. Lee, E. Dart, "Efficient Bulk Data Replication for the Earth System Grid", Data Driven E-science: Use Cases and Successful Applications of Distributed Computing Infrastructures (ISGC 2010), (Springer-Verlag New York Inc: 2010) Pages: 435

Raj Kettimuthu, Alex Sim, Dan Gunter, Bill Allcock, Peer T. Bremer, John Bresnahan, Andrew Cherry, Lisa Childers, Eli Dart, Ian Foster, Kevin Harms, Jason Hick, Jason Lee, Michael Link, Jeff Long, Keith Miller, Vijaya Natarajan, Valerio Pascucci, Ken Raffenetti, David Ressman, Dean Williams, Loren Wilson, Linda Winkler, "Lessons learned from moving earth system grid data sets over a 20 Gbps wide-area network", HPDC '10, New York, NY, USA, ACM, 2010, 316--319, doi: 10.1145/1851476.1851519

Mark Howison

2012

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", SuperComputing 2012 (SC12), Salt Lake City, Utah, November 2012,

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics Conference (ICAP 2012), Germany, 2012,

2011

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

Hans Johansen

2016

Bin Dong, Suren Byna, Kesheng Wu, Prabhat, Hans Johansen, Jeffrey N. Johnson, and Noel Keen, "Data Elevator: Low-contention Data Movement in Hierarchical Storage System", The 23rd annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) (Acceptance rate: 25%), December 19, 2016,

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

Xiaocheng Zou, David Boyuka, Dhara Desai, Daniel F. Martin, Suren Byna, Kesheng Wu, Kushal Bansal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

David Leinweber

2013

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A Big Data Approach to Analyzing Market Volatility", Algorithmic Finance, 2013, 2:241--267, LBNL LBNL-6382E, doi: 10.3233/AF-13030

Understanding the microstructure of the financial market requires processing a vast amount of data on individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) resources are widely available at the National Laboratories in the US. In this paper we demonstrate that HPC resources and the techniques developed for data-intensive sciences can greatly accelerate the computation of an early warning indicator called the Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contain five and a half years' worth of trading data for about 100 of the most liquid futures contracts, include about 3 billion trades, and occupy 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelized computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rate is about 7% when averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.

2012

E. W. Bethel and D. Leinweber and O. Rübel and K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

2011

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

Xiaoye Li

2011

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Terry J. Ligocki

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

Burlen Loring

2016

Utkarsh Ayachit, Andrew Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth Jansen, Burlen Loring, Zarija Lukić, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Michel Rasquin, Christopher P. Stone, Venkat Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel, "Performance Analysis, Design Considerations, and Applications of Extreme-scale In Situ Infrastructures", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, UT, USA, 2016, doi: 10.1109/SC.2016.78

2012

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

Kamesh Madduri

2009

K. Madduri and D.A. Bader, "Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis", The 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2009), Rome, Italy, 2009,

Victor M. Markowitz

2013

Alex Romosan, Arie Shoshani, Kesheng Wu, Victor M. Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 26, LBNL 6397E, doi: 10.1145/2484838.2484856

Daniel F. Martin

2016

Xiaocheng Zou, David Boyuka, Dhara Desai, Daniel F. Martin, Suren Byna, Kesheng Wu, Kushal Bansal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

2010

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 2010, 429:329-334, LBNL 3185E,

Joerg Meyer

2012

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, and Chandrika Sivaramakrishnan, "Akuna: Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 2012,

Dmitriy Morozov

2016

Utkarsh Ayachit, Andrew Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth Jansen, Burlen Loring, Zarija Lukić, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Michel Rasquin, Christopher P. Stone, Venkat Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel, "Performance Analysis, Design Considerations, and Applications of Extreme-scale In Situ Infrastructures", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, UT, USA, 2016, doi: 10.1109/SC.2016.78

Vijaya Natarajan

2011

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2010

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data in a large number of files. These data sources are distributed among national and international data repositories and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. A tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community. It has been managing massive dataset transfers efficiently in environments where network bandwidth is limited. Adaptive transfer adjustment was studied to enhance the BDM so that it can handle significant changes in end-to-end performance in dynamic network environments and control data transfers to achieve the desired transfer performance. We describe the results of our hands-on data transfer management experience in the climate research community. We study a practical transfer estimation model and present initial results from the adaptive transfer adjustment methodology.

Peter Nugent

2017

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, K. John Wu, "Parallel Variable Selection for Effective Performance Prediction", the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2017), 2017, doi: 10.1109/CCGRID.2017.47

2016

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, edited by R. Arora, (Springer International: 2016) Pages: 139-161 doi: 10.1007/978-3-319-33742-5

2015

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "PATHA: Performance Analysis Tool for HPC Applications", the 34th IEEE International Performance Computing and Communications Conference (IPCCC 2015), 2015,

2014

F. Rusu, P. Nugent, K. Wu, "Implementing the Palomar Transient Factory Real-Time Pipeline in GLADE: Results and", Lecture Notes in Computer Science, (2014) Pages: 53--66

Douglas Olson

2008

W. Betts, L. Didenko, T. Freeman, P. Jakl, L. Hajdu, E. Hjort, K. Keahey, J. Lauret, D. Olson, A. Rose, I. Sakrejda, A. Sim, "STAR Grid Activities, OSG and Beyond", International Symposium on Grid Computing (ISGC), 2008,

2006

E. Hjort, L. Hajdu, J. Lauret, D. Olson, A. Sim, A. Shoshani, "Data and Computational Grid Coupling in RHIC/STAR – An Analysis Scenario using SRM Technology", Computing in High Energy Physics (CHEP), 2006,

2004

Eric Hjort, Doug Olson, Jerome Lauret, Arie Shoshani, Alex Sim, "Production mode Data-Replication framework in STAR using the HRM Grid middleware", Computing in High Energy Physics, 2004,

2003

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

D. Yu, J. Lauret, A. Shoshani, D. Olson, E. Hjort, A. Sim, "The Design of High Performance Data Replication in the Grid Environment for the STAR Collaboration", Computing in High Energy Physics, 2003,

2001

E. Hjort, D. Olson, A. Sim, J. Yang, J. Lauret, M. Messer, "Data Grid Services in STAR, Initial Deployment: Site-to-Site File Replication", Computing in High Energy Physics, 2001,

D. Olson, E. Hjort, J. Lauret, M. Messer, A. Shoshani, A. Sim, "Non-shared Disk Cluster - A Fault Tolerant, Commodity Approach to Hi-Bandwidth Data Analysis", Computing in High Energy Physics, 2001,

John Owens

2008

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Bin-Hash Indexing: A Parallel Method for Fast Query Processing", 2008, LBNL 729E,

Prabhat

2016

Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, and Pradeep Dubey, "PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures", 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2016, Chicago, May 23, 2016,

Wahid Bhimji, Debbie Bard, Melissa Romanus, David Paul, Andrey Ovsyannikov, Brian Friesen, Matt Bryson, Joaquin Correa, Glenn K. Lockwood, Vakho Tsulaia, Suren Byna, Steve Farrell, Doga Gursoy, Chris Daley, Vince Beckner, Brian Van Straalen, Nicholas Wright, Katie Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group (CUG) 2016, May 10, 2016,

2015

Soyoung Jeon, Prabhat, Suren Byna, Junmin Gu, William Collins, and Michael Wehner, "Characterization of extreme precipitation within atmospheric river events over California", Advances in Statistical Climatology, Meteorology and Oceanography (ASCMO), November 21, 2015, 1:45-57, doi: 10.5194/ascmo-1-45-2015

Md. Mostofa Ali Patwary, Suren Byna, Nadathur Rajagopalan Satish, Narayanan Sundaram, Zarija Lukić, Vadim Roytershteyn, Michael J. Anderson, Yushu Yao, Prabhat, and Pradeep Dubey, "BD-CATS: Big Data Clustering at Trillion Particle Scale", Supercomputing 2015 (SC15), November 17, 2015,

Babak Behzad, Suren Byna, Prabhat and Marc Snir, "Pattern-driven Parallel I/O Tuning", 10th Parallel Data Storage Workshop (PDSW) 2015, held in conjunction with SC15, November 16, 2015,

Shane Snyder, Philip Carns, Robert Latham, Misbah Mubarak, Chris Carothers, Babak Behzad, Huong Vu Thanh Luu, Suren Byna, and Prabhat, "Techniques for Modeling Large-scale HPC I/O Workloads", the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS15), in conjunction with SC15, November 15, 2015,

Babak Behzad, Suren Byna, Stefan Wild, Prabhat and Marc Snir, "Dynamic Model-driven Parallel I/O Performance Tuning", IEEE Cluster 2015, 2015,

H. Luu, M. Winslett, W. Gropp, R. Ross, P. Carns, K. Harms, Prabhat, S. Byna, Y. Yao, "A Multi-platform Study of I/O Behavior on Petascale Supercomputers", The 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2015, 2015,

2014

Soyoung Jeon, Christopher Paciorek, Prabhat, Surendra Byna, William Collins, Michael Wehner, "Uncertainty Quantification for Characterizing Spatial Tail Dependence under Statistical Framework", AGU, Fall Meeting 2014, 2014,

Babak Behzad, Surendra Byna, Stefan M. Wild, Prabhat, Marc Snir, "Improving Parallel I/O Autotuning with Performance Modeling", ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2014), New York, NY, USA, ACM, 2014, 253--256, doi: 10.1145/2600212.2600708

M. Scot Breitenfeld, Kalyana Chadalavada, Robert Sisneros, Surendra Byna, Quincey Koziol, Neil Fortner, Prabhat, Venkat Vishwanath, "Recent Progress in Tuning Performance of Large-scale I/O with Parallel HDF5", The 9th Parallel Data Storage Workshop (PDSW) held in conjunction with SC14, 2014,

2013

Babak Behzad, Huong Vu Thanh Luu, Joseph Huchette, Surendra Byna, Prabhat, Ruth Aydt, Quincey Koziol, and Marc Snir, "Taming parallel I/O complexity with auto-tuning", In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '13), 2013,

E. Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K. John Wu, and Michael Wehner, "Why High Performance Visual Data Analytics is both Relevant and Difficult", Proceedings of Visualization and Data Analysis 2013, IS&T/SPIE Electronic Imaging 2013, San Francisco, CA, USA, SPIE, February 2013, LBNL LBNL-6063E,

2012

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", SuperComputing 2012 (SC12), Salt Lake City, Utah, November 2012,

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

Prabhat, Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner and E. Wes Bethel, "TECA: A Parallel Toolkit for Extreme Climate Analysis", Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Presented at Third Workshop on Data Mining in Earth System Science (DMESS 2012), Omaha, Nebraska, June 2012, 9:866–876, LBNL 5352E, doi: 10.1016/j.procs.2012.04.093

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics Conference (ICAP 2012), Germany, 2012,

Allen R. Sanderson, Brad Whitlock, Oliver Rübel, Hank Childs, Gunther H. Weber, Prabhat, Kesheng Wu, "A System for Query Based Analysis and Visualization", Third International EuroVis Workshop on Visual Analytics (EuroVA 2012), Vienna, Austria, January 2012, LBNL 5507E,

2011

Suren Byna, Prabhat, Michael F. Wehner and Kesheng Wu, "Detecting Atmospheric Rivers in Large Climate Datasets", Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC-11/ Supercomputing11/ ACM/IEEE), November 14, 2011, Seattle, Washington, 2011, doi: 10.1145/2110205.2110208

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

J. Chou, K. Wu, O. Rübel, M. Howison, J. Qiang, Prabhat, B. Austin, E. W. Bethel, R. D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data Analysis", SC11, 2011, doi: 10.1145/2063384.2063424

Jerry Chou, John Wu, Prabhat, "FastQuery: A Parallel Indexing System for Scientific Data", Workshop on Interfaces and Abstractions for Scientific Data Storage, IEEE Cluster, 2011,

2010

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 2010, 429:329-334, LBNL 3185E,

2009

O. Rübel, C.G.R. Geddes, E. Cormier-Michel, K. Wu, Prabhat, G.H. Weber, D.M. Ushizima, P. Messmer, H. Hagen, B. Hamann, and E.W. Bethel, "Automatic Beam Path Analysis of Laser Wakefield Particle Acceleration Data", IOP Computational Science & Discovery, November 2009, 2, LBNL 2734E,

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, and Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: Astronum-2008 (Astronomical Society of the Pacific Conference Series), St. Thomas, USVI, June 2009, 301-317, LBNL 1450E,

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, "Large Fields for Smaller Facility Sources", SciDAC Review, Pages: 13-21, 2009,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Alex Romosan

2016

Deborah A Agarwal, Boris Faybishenko, Vicky L, Harinarayan Krishnan, Carina Lansing, Gary Kushner, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, Arthur, Kesheng Wu, "A Science Data Gateway for Environmental Management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004, doi: 10.1002/cpe.3697

2014

DP Schissel, Gheni Abla, SM Flanagan, M Greenwald, X Lee, A Romosan, A Shoshani, J Stillerman, J Wright, "Automated metadata, provenance cataloging and navigable interfaces: Ensuring the usefulness of extreme-scale data", Fusion Engineering and Design, North-Holland, 2014,

John C Wright, Martin Greenwald, Joshua Stillerman, Gheni Abla, Bobby Chanthavong, Sean Flanagan, David Schissel, Xia Lee, Alex Romosan, Arie Shoshani, "The MPO API: A tool for recording scientific workflows", Fusion Engineering and Design, 2014,

2013

Alex Romosan, Arie Shoshani, Kesheng Wu, Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 26, LBNL 6397E, doi: 10.1145/2484838.2484856

Doron Rotem

2011

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Kesheng Wu, Surendra Byna, Doron Rotem, Arie Shoshani, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

2009

Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani and Doron Rotem, (Chapman & Hall/CRC Computational Science: December 2009)

Ekow J. Otoo, Doron Rotem, and Shih-Chiang Tsao, "Energy smart management of scientific data", 21st Int'l. Conf. on Sc. and Stat. Database Management (SSDBM’2009), New Orleans, Louisiana, USA, June 2009, LBNL 2185E,

Ekow Otoo, Doron Rotem and Shih-Chiang Tsao, "Analysis of Trade-Off Between Power Saving and Response Time in Disk Storage Systems", Fifth Workshop on High-Performance, Power-Aware Computing, Rome, Italy, May 2, 2009,

Ekow J. Otoo, Doron Rotem, and Shih-Chiang Tsao, "Workload-adaptive management of energy-smart disk storage systems", IASDS09: Workshop on Interfaces and Architecture, 2009,

2008

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Doron Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1--23

2001

A. Sim, H. Nordberg, L.M. Bernardo, A. Shoshani, D. Rotem, "Experience with using CORBA to implement a file caching coordination system", Concurrency and Computation: Practice and Experience, 2001, 13:1-15,

1999

A. Sim, H. Nordberg, L. M. Bernardo, A. Shoshani, D. Rotem, "Storage Access Coordination Using CORBA", Distributed Objects and Applications, 1999, 168-175,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem and A. Sim, "Multidimensional Indexing and Query Coordination for Tertiary Storage Management", International Conference on Scientific and Statistical Database Management, 1999, 214-225,

1998

L.M. Bernardo, D. Rotem, A. Shoshani, H. Nordberg, A. Sim, "Using Access Patterns to Partition Large Datasets on Tertiary Storage in Order to Minimize Retrieval Costs", 1998, LBNL 41504,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem, A. Sim, "Storage Management for High Energy Physics Applications", Computing in High Energy Physics, 1998,

Oliver Rübel

2013

E. Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K. John Wu, and Michael Wehner, "Why High Performance Visual Data Analytics is both Relevant and Difficult", Proceedings of Visualization and Data Analysis 2013, IS&T/SPIE Electronic Imaging 2013, San Francisco, CA, USA, SPIE, February 2013, LBNL 6063E,

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A Big Data Approach to Analyzing Market Volatility", Algorithmic Finance, 2013, 2:241--267, LBNL 6382E, doi: 10.3233/AF-13030

Understanding the microstructure of the financial market requires processing a vast amount of data on individual trades, and sometimes even on multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) resources are widely available at the National Laboratories in the US. In this paper we demonstrate that HPC resources and the techniques developed for data-intensive sciences can greatly accelerate the computation of an early warning indicator called the Volume-Synchronized Probability of Informed Trading (VPIN). The test data used in this study contain five and a half years' worth of trading data for about 100 of the most liquid futures contracts, include about 3 billion trades, and occupy 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelized computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rate is about 7%, averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.
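For readers unfamiliar with the indicator, the following is a minimal Python sketch of the commonly cited VPIN definition; it is not the implementation used in the paper. It assumes trades have already been grouped into equal-volume buckets of size V and that each bucket's volume has already been split into buy volume V_B and sell volume V_S (that classification step is omitted). VPIN over the last n buckets is then the mean absolute order imbalance, sum |V_B - V_S| / (n * V). The function names `vpin` and `empirical_cdf` are hypothetical, and the empirical CDF is only one simple way to apply a threshold such as CDF > 0.99.

```python
from bisect import bisect_right


def vpin(buy_volumes, sell_volumes, bucket_volume, window):
    """Rolling VPIN over equal-volume buckets (illustrative sketch only).

    buy_volumes[i] and sell_volumes[i] are the already-classified buy and
    sell volume in bucket i; every bucket is assumed to hold bucket_volume
    units. Returns one value per full window of `window` consecutive buckets.
    """
    if len(buy_volumes) != len(sell_volumes):
        raise ValueError("buy and sell series must have the same length")
    # Absolute order imbalance |V_B - V_S| for each bucket.
    imbalances = [abs(b - s) for b, s in zip(buy_volumes, sell_volumes)]
    values = []
    for end in range(window, len(imbalances) + 1):
        # Mean imbalance over the window, normalized by the bucket size.
        total = sum(imbalances[end - window:end])
        values.append(total / (window * bucket_volume))
    return values


def empirical_cdf(history, x):
    """Fraction of values in `history` that are <= x."""
    ordered = sorted(history)
    return bisect_right(ordered, x) / len(ordered)


if __name__ == "__main__":
    series = vpin([60, 50, 90], [40, 50, 10], bucket_volume=100, window=2)
    print(series)  # [0.1, 0.4]
    # Flag the latest value if it sits above the 0.99 quantile of the history.
    print(empirical_cdf(series, series[-1]) > 0.99)  # True
```

The 0.99 threshold is the one quoted in the abstract; how the reference history for the CDF is chosen (and the bucket size and window length, which the paper explores across 16,000 parameter combinations) is left open in this sketch.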

2012

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", SuperComputing 2012 (SC12), Salt Lake City, Utah, November 2012,

Prabhat, Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner and E. Wes Bethel, "TECA: A Parallel Toolkit for Extreme Climate Analysis", Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Presented at Third Workshop on Data Mining in Earth System Science (DMESS 2012), Omaha, Nebraska, June 2012, 9:866–876, LBNL 5352E, doi: 10.1016/j.procs.2012.04.093

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics Conference (ICAP 2012), Germany, 2012,

Allen R. Sanderson, Brad Whitlock, Oliver Rübel, Hank Childs, Gunther H. Weber, Kesheng Wu, "A System for Query Based Analysis and Visualization", Third International EuroVis Workshop on Visual Analytics (EuroVA 2012), Vienna, Austria, January 2012, LBNL 5507E,

E. W. Bethel, D. Leinweber, O. Rübel, and K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

2011

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

J. Chou, K. Wu, O. Rübel, M. Howison, J. Qiang, Prabhat, B. Austin, E. W. Bethel, R. D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data", SC11, 2011, doi: 10.1145/2063384.2063424

2010

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 2010, 429:329-334, LBNL 3185E,

2009

O. Rübel, C.G.R. Geddes, E. Cormier-Michel, K. Wu, Prabhat, G.H. Weber, D.M. Ushizima, P. Messmer, H. Hagen, B. Hamann, and E.W. Bethel, "Automatic Beam Path Analysis of Laser Wakefield Particle Acceleration Data", IOP Computational Science & Discovery, November 2009, 2, LBNL 2734E,

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, and Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: Astronum-2008 (Astronomical Society of the Pacific Conference Series), St. Thomas, USVI, June 2009, 301-317, LBNL 1450E,

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, "Large Fields for Smaller Facility Sources", SciDAC Review, Pages: 13-21, 2009,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

John M. Shalf

2012

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

2006

Luke Gosink, John Shalf, Kurt Stockinger, Kesheng Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press, 2006, 149--158,

2005

Kurt Stockinger, John Shalf, Wes Bethel, Kesheng Wu, "Query-Driven Visualization of Large Data Sets", IEEE Visualization 2005, Minneapolis, MN, October 2005, 2005, 22, doi: 10.1109/VIS.2005.84

Arie Shoshani

2016

Deborah A Agarwal, Boris Faybishenko, Vicky L, Harinarayan Krishnan, Carina Lansing, Gary Kushner, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, Arthur, Kesheng Wu, "A Science Data Gateway for Environmental Management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004, doi: 10.1002/cpe.3697

2015

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015, 2015,

Elaheh Pourabbas, Arie Shoshani, "The Composite Data Model: A Unified Approach for Combining and Querying Multiple Data Models", IEEE Trans. Knowl. Data Eng., 2015, 27(5):1424-1437,

2014

Qian Sun, Fan Zhang, Tong Jin, Hoang Bui, Kesheng Wu, Arie Shoshani, Hemanth Kolla, Scott Klasky, Jacqueline Chen and Manish Parashar, "Scalable Run-time Data Indexing and Querying for Scientific Simulations", Proceedings of the Fifth International Workshop on Big Data Analytics: Challenges, and Opportunities (BDAC’14), 2014,

US Patent 8,705,342 B2. “Co-scheduling of network resource provisioning and host-to-host bandwidth reservation on high-performance network and storage systems”, D. Yu, D. Katramatos, A. Sim, and A. Shoshani, Apr. 22, 2014, prior publication No. US 2012/0268053 A1 issued on Oct. 25, 2012, provisional application No. 61/393,750, filed on Oct. 15, 2010, LBNL IB-3152, BNL BSA 11-02.

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", SIGMOD 14, 2014, 385--396, doi: 10.1145/2588555.2612185

DP Schissel, Gheni Abla, SM Flanagan, M Greenwald, X Lee, A Romosan, A Shoshani, J Stillerman, J Wright, "Automated metadata, provenance cataloging and navigable interfaces: Ensuring the usefulness of extreme-scale data", Fusion Engineering and Design, North-Holland, 2014,

John C Wright, Martin Greenwald, Joshua Stillerman, Gheni Abla, Bobby Chanthavong, Sean Flanagan, David Schissel, Xia Lee, Alex Romosan, Arie Shoshani, "The MPO API: A tool for recording scientific workflows", Fusion Engineering and Design, 2014,

2013

Alex Romosan, Arie Shoshani, Kesheng Wu, Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 26, LBNL 6397E, doi: 10.1145/2484838.2484856

2012

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", SuperComputing 2012 (SC12), Salt Lake City, Utah, November 2012,

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, and Chandrika Sivaramakrishnan, "Akuna: Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 2012,

D. Yu, D. Katramatos, A. Shoshani, A. Sim, J. Gu, V. Natarajan, "StorNet: Integrating Storage Resource Management with Dynamic Network Provisioning for Automated Data Transfer", International Committee for Future Accelerators (ICFA) Standing Committee on Inter-Regional Connectivity (SCIC) 2012 Report: Networking for High Energy Physics, 2012,

Benson Ma, Arie Shoshani, Alex Sim, Kesheng Wu, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Management (NDM 2012), 2012, 562--571,

G. F. Lofstead, Q. Liu, J. Logan, Y. Tian, H. Abbasi, N. Podhorszki, J. Y. Choi, S. Klasky, R. Tchoua, R. A. Oldfield, et al., "Hello ADIOS: The Challenges and Lessons of Leadership Class I/O Frameworks", 2012,

Karen L. Schuchardt, Deborah A. Agarwal, Stefan A. Finsterle, Carl W. Gable, Ian Gorton, Luke J. Gosink, Elizabeth H. Keating, Carina S. Lansing, Joerg Meyer, William A.M. Moeglein, George S.H. Pau, Ellen A. Porter, Sumit Purohit, Mark L. Rockhold, Arie Shoshani, Chandrika Sivaramakrishnan, "Akuna-Integrated Toolsets Supporting Advanced Subsurface Flow and Transport Simulations for Environmental Management", XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 17-22, 2012, 2012,

E. Pourabbas, A. Shoshani, K. Wu, "Minimizing index size by reordering rows and columns", SSDBM, Springer Berlin/Heidelberg, January 2012, 467--484,

2011

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

J. Gu, D. Katramatos, X. Liu, V. Natarajan, A. Shoshani, A. Sim, D. Yu, S. Bradley, S. McKee, "StorNet: Integrated Dynamic Storage and Network Resource Provisioning and Management for Automated Data Transfers", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/1/012002

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Junmin Gu, Dimitrios Katramatos, Xin Liu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Dantong Yu, Scott Bradley, Shawn McKee, "StorNet: Co-Scheduling of End-to-End Bandwidth Reservation on Storage and Network Systems for High Performance Data Transfers", IEEE INFOCOM HSN 2011, 2011,

Kesheng Wu, Rishi R Sinha, Chad Jones, Stephane Ethier, Scott Klasky, Kwan-Liu Ma, Arie Shoshani, Marianne Winslett, "Finding regions of interest on toroidal meshes", Computational Science & Discovery, 2011, 4:015003, doi: 10.1088/1749-4699/4/1/015003

Kesheng Wu, Surendra Byna, Doron Rotem, Arie Shoshani, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

J. Chou, K. Wu, O. Rübel, M. Howison, J. Qiang, Prabhat, B. Austin, E. W. Bethel, R. D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data", SC11, 2011, doi: 10.1145/2063384.2063424

Jinoh Kim, Hasan Abbasi, Luis Chacón, Ciprian Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive Computing", LDAV, 2011, 65--72, doi: 10.1109/LDAV.2011.6092319

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2010

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data spread across a large number of files. These data sources are distributed among national and international data repositories and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. One tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community, which has been managing massive dataset transfers efficiently in environments where network bandwidth is limited. We studied adaptive transfer adjustment to enable BDM to handle significant end-to-end performance changes in dynamic network environments and to steer data transfers toward the desired transfer performance. We describe the results of our hands-on data transfer management experience in the climate research community, study a practical transfer estimation model, and present initial results from the adaptive transfer adjustment methodology.

Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim, "A Flexible Reservation Algorithm for Advance Network Provisioning", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), New Orleans, LA, IEEE Computer Society, Washington, DC, USA, ISBN: 978-1-4244-7559-, November 14, 2010, LBNL 4017E, doi: 10.1109/SC.2010.4

Many scientific applications need support from a communication infrastructure that provides predictable performance, which requires effective algorithms for bandwidth reservation. Network reservation systems, such as ESnet’s OSCARS, establish secure virtual circuits with guaranteed bandwidth for a certain length of time. However, users currently cannot inquire about bandwidth availability, nor do they receive alternative suggestions when reservation requests fail. In general, the number of reservation options is exponential in the number of nodes n and depends on the current reservation commitments. We present a novel approach for path finding in time-dependent networks that takes advantage of user-provided parameters of total volume and time constraints, and that produces options for earliest completion and shortest duration. The theoretical complexity is only O(n²r²) in the worst case, where r is the number of reservations in the desired time interval. We have implemented our algorithm and developed efficient methodologies for incorporating it into network reservation frameworks. Performance measurements confirm the theoretical predictions.

M. Balman, E. Chaniotakis, A. Shoshani, A. Sim, "A New Approach in Advance Network Reservation and Provisioning for High-Performance Scientific Data Transfers", 2010, LBNL 4091E,

Julian Cummings, Jay Lofstead, Karsten Schwan, Alexander Sim, Arie Shoshani, Ciprian Docan, Manish Parashar, Scott Klasky, Norbert Podhorszki, Roselyne Barreto, "EFFIS: An End-to-end Framework for Fusion Integrated Simulation", 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010,

Kesheng Wu, Arie Shoshani, Kurt Stockinger, "Analyses of multi-level and multi-component compressed indexes", ACM Transactions on Database Systems, ACM, 2010, 35:1--52, doi: 10.1145/1670243.1670245

E. Pourabbas, A. Shoshani, "Improving Estimation Accuracy of Aggregate Queries on Data Cubes", Data & Knowledge Engineering 69 (2010), January 1, 2010, 69:50-72,

A. Sim, D. Gunter, V. Natarajan, A. Shoshani, D. Williams, J. Long, J. Hick, J. Lee, E. Dart, "Efficient Bulk Data Replication for the Earth System Grid", Data Driven E-science: Use Cases and Successful Applications of Distributed Computing Infrastructures (ISGC 2010), (Springer-Verlag New York Inc: 2010) Pages: 435

2009

Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani and Doron Rotem, (Chapman & Hall/CRC Computational Science: December 2009)

A. Sim, A. Shoshani, F. Donno, J. Jensen, "Storage Resource Manager Interface Specification V2.2 Implementations Experience Report", Open Grid Forum, GFD.154, 2009,

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, D. Fraser, J. Garcia, S. Hankin, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A. Shoshani, F. Siebenlist, A. Sim, W. G. Strand, M. Su, N. Wilhelmi, "The Earth System Grid: Enabling Access to Multimodel Climate Simulation Data", Bulletin of the American Meteorological Society, 2009, 90(2):195-205,

M. Riedel, E. Laure, Th. Soddemann, L. Field, J. P. Navarro, J. Casey, M. Litmaath, J. Ph. Baud, B. Koblitz, C. Catlett, D. Skow, C. Zheng, P. M. Papadopoulos, M. Katz, N. Sharma, O. Smirnova, B. Kónya, P. Arzberger, F. Würthwein, A. S. Rana, T. Martin, M. Wan, V. Welch, T. Rimovsky, S. Newhouse, A. Vanni, Y. Tanaka, Y. Tanimura, T. Ikegami, D. Abramson, C. Enticott, G. Jenkins, R. Pordes, N. Sharma, S. Timm, N. Sharma, G. Moont, M. Aggarwal, D. Colling, O. van der Aa, A. Sim, V. Natarajan, A. Shoshani, J. Gu, S. Chen, G. Galang, R. Zappi, L. Magnoni, V. Ciaschini, M. Pace, V. Venturi, M. Marzolla, P. Andreetto, B. Cowles, S. Wang, Y. Saeki, H. Sato, S. Matsuoka, P. Uthayopas, S. Sriprayoonsakul, O. Koeroo, M. Viljoen, L. Pearlman, S. Pickles, David Wallom, G. Moloney, J. Lauret, J. Marsteller, P. Sheldon, S. Pathak, S. De Witt, J. Mencák, J. Jensen, M. Hodges, D. Ross, S. Phatanapherom, G. Netzer, A. R. Gregersen, M. Jones, S. Chen, P. Kacsuk, A. Streit, D. Mallmann, F. Wolf, T. Lippert, Th. Delaitre, E. Huedo, N. Geddes, "Interoperation of world-wide production e-Science infrastructures", Concurrency and Computation: Practice and Experience, 2009, 21(8):961-990,

Arie Shoshani, Flavia Donno, Junmin Gu, Jason Hick, Maarten Litmaath, Alex Sim, "Dynamic Storage Management", Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani, Doron Rotem, (Chapman & Hall/CRC Computational Science: 2009)

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

2008

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, J. Gu, "Grid data access on widely distributed worker nodes using Scalla and SRM", Journal of Physics: Conf. Ser., 2008, 119, doi: 10.1088/1742-6596/119/7/072019

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, S. Hankin, V. E. Henson, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A Shoshani, F. Siebenlist, A. Sim, W. G. Strand, N. Wilhelmi, M. Su, "Data Management and Analysis for the Earth System Grid", SciDAC Conference, 2008,

Alex Sim, Arie Shoshani (Editors), Paolo Badino, Olof Barring, Jean‐Philippe Baud, Ezio Corso, Shaun De Witt, Flavia Donno, Junmin Gu, Michael Haddox‐Schatz, Bryan Hess, Jens Jensen, Andy Kowalski, Maarten Litmaath, Luca Magnoni, Timur Perelmutov, Don Petravick, Chip Watson, The Storage Resource Manager Interface Specification Version 2.2, Open Grid Forum, Document in Full Recommendation, GFD.129, 2008,

C S Chang, S Klasky, J Cummings, R. Samtaney, A Shoshani, L Sugiyama, D Keyes, S Ku, G Park, S Parker, N Podhorszki, H. Strauss, H Abbasi, M Adams, R Barreto, G Bateman, K Bennett, Y Chen, E D’Azevedo, C Docan, S Ethier, E Feibush, L Greengard, T Hahm, F Hinton, C Jin, A. Khan, A Kritz, P Krstić, T Lao, W Lee, Z Lin, J Lofstead, P Mouallem, M Nagappan, A Pankin, M Parashar, M Pindzola, C Reinhold, D Schultz, K Schwan, D. Silver, A Sim, D Stotler, M Vouk, M Wolf, H Weitzner, P Worley, Y Xiao, E Yoon, D Zorin, "Toward a first-principles integrated simulation of tokamak edge plasmas", Journal of Physics: Conf. Ser., 2008, 125, doi: 10.1088/1742-6596/125/1/012042

R Ananthakrishnan, D E Bernholdt, S Bharathi, D Brown, M Chen, A L Chervenak, L Cinquini, R Drach, I T Foster, P Fox, D Fraser, K Halliday, S Hankin, P Jones, C Kesselman, D E Middleton, J Schwidder, R Schweitzer, R Schuler, A Shoshani, F Siebenlist, A Sim, W G Strand, N Wilhelmi, M Su, D N Williams, "Building a global federation system for climate change research: the earth system grid center for enabling technologies (ESG-CET)", Journal of Physics: Conf. Ser., 2008, 78, doi: 10.1088/1742-6596/78/1/012050

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Doron Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined Numerical and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1--23

Rishi Rakesh Sinha, Marianne Winslett, Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Adaptive Bitmap Indexes for Space-Constrained Systems", ICDE 2008, 2008, 1418--1420,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Breaking the Curse of Cardinality on Bitmap Indexes", SSDBM 08, Springer, 2008, 348--365, doi: 10.1007/978-3-540-69497-7_23

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Suffix Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

2007

L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, F. Donno, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations", the 24th IEEE Conference on Mass Storage Systems and Technologies, 2007,

F. Donno, L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Manager version 2.2: design, implementation, and testing experience", Journal of Physics: Conf. Ser., 2007, 119, doi: 10.1088/1742-6596/119/6/062028

Elaheh Pourabbas, Arie Shoshani, "Efficient Estimation of Joint Queries from Multiple OLAP Databases", ACM Transactions on Database Systems (TODS), March 1, 2007, Volume 3,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, Performance of Multi-Level and Multi-Component Bitmap Indexes, 2007, doi: 10.1145/1670243.1670245

Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein, "Enabling Real-Time Querying of Live and Historical Data", SSDBM 2007, 2007,

2006

Elaheh Pourabbas, Arie Shoshani, "The Composite OLAP-Object Data Model: Removing an Unnecessary Barrier", International Conference on Scientific and Statistical Database Management (SSDBM) 2006, July 3, 2006, 291-300,

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", Lecture Notes in Computer Science, edited by Jean-Marc Pierson, (Springer-Verlag GmbH Publisher: 2006) Pages: 100-112

D. E. Middleton, D. E. Bernholdt, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, P. Fox, P. Jones, C. Kesselman, I. T. Foster, V. Nefedova, A. Shoshani, A. Sim, W. G. Strand, D. Williams, "Enabling worldwide access to climate simulation data: the earth system grid (ESG)", SciDAC Conference, 2006,

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, "From rootd to Xrootd, from physical to logical files: experience on accessing and managing distributed data", Computing in High Energy Physics (CHEP), 2006,

E. Hjort, L. Hajdu, J. Lauret, D. Olson, A. Sim, A. Shoshani, "Data and Computational Grid Coupling in RHIC/STAR – An Analysis Scenario using SRM Technology", Computing in High Energy Physics (CHEP), 2006,

Kesheng Wu, Ekow Otoo, Arie Shoshani, "Optimizing bitmap indices with efficient compression", ACM Transactions on Database Systems, 2006, 31:1--38, doi: 10.1145/1132863.1132864

K. Wu, K. Stockinger, A. Shoshani, E. Wes Bethel, "FastBit--Helps Finding the Proverbial Needle in a Haystack", 2006, LBNL LBNL-PUB/963,

F. Reiss, K. Stockinger, K. Wu, A. Shoshani, J. M. Hellerstein, "Efficient analysis of live and historical streaming data and its application to cybersecurity", 2006,

2005

D. Bernholdt, S. Bharathi, D. Brown, K. Chanchio, M. Chen, A. Chervenak, L. Cinquini, B. Drach, I. Foster, P. Fox, J. Garcia, C. Kesselman, R. Markel, D. Middleton, V. Nefedova, L. Pouchard, A. Shoshani, A. Sim, G. Strand, D. Williams, "The Earth System Grid: Supporting the Next Generation of Climate Modeling Research", Proceedings of the IEEE, 2005, 93(3):485-495,

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", International Workshop on Data Management in Grids, 2005,

Arie Shoshani, Alex Sim, Kurt Stockinger, "Replica Registration Service Functional Interface Specification 1.0", 2005, LBNL 57520,

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Wei-Ming Zhang, "Grid Collector: Facilitating Efficient Selective Access from Data Grids", International Supercomputer Conference 2005, 2005,

2004

K. Wu, W. Zhang, A. Sim, J. Gu, A. Shoshani, "Grid Collector: an Event Catalog with Automated File Management", 2004, LBNL 55563,

Eric Hjort, Doug Olson, Jerome Lauret, Arie Shoshani, Alex Sim, "Production mode Data-Replication framework in STAR using the HRM Grid middleware", Computing in High Energy Physics, 2004,

Alex Sim, Junmin Gu, Arie Shoshani, Vijaya Natarajan, "DataMover: Robust Terabytes-Scale Multi-file Replication over Wide-Area Networks", the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), 2004,

Kesheng Wu, Wei-Ming Zhang, Victor Perevoztchikov, Jerome Lauret, Arie Shoshani, "The Grid Collector: Using an Event Catalog to Speed up Analysis in Distributed Environment", Proceedings of Computing in High Energy and Nuclear Physics (CHEP) 2004, 2004,

K. Wu, A. Shoshani, E. J. Otoo, Word aligned bitmap compression method, data and apparatus, US Patent 6,831,575, 2004,

2003

Elaheh Pourabbas, Arie Shoshani, "Answering Joint Queries from Multiple Aggregate OLAP Databases", Data Warehousing and Knowledge Discovery, 5th International Conference, DaWaK 2003, September 3, 2003, 24-34,

Arie Shoshani, Alexander Sim, Junmin Gu, "Storage Resource Managers: Essential Components for the Grid", Grid Resource Management: State of the Art and Future Trends, edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, (Kluwer Academic Publishers: 2003)

Ann L. Chervenak, Ewa Deelman, Carl Kesselman, William E. Allcock, Ian T. Foster, Veronika Nefedova, Jason Lee, Alex Sim, Arie Shoshani, Bob Drach, Dean Williams, Don Middleton, "High-performance remote access to climate simulation data: a challenge problem for data grid technologies", Parallel Computing, 2003, 29(10):1335-1356,

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

D. Yu, J. Lauret, A. Shoshani, D. Olson, E. Hjort, A. Sim, "The Design of High Performance Data Replication in the Grid Environment for the STAR Collaboration", Computing in High Energy Physics, 2003,

L. Pouchard, L. Cinquini, B. Drach, D. Middleton, D. Bernholdt, K. Chanchio, I. Foster, V. Nefedova, D. Brown, P. Fox, J. Garcia, G. Strand, D. Williams, A. Chervenak, C. Kesselman, A. Shoshani, A. Sim, "An Ontology for Scientific Information in a Grid Environment: the Earth System Grid", the Symposium on Cluster Computing and the Grid (CCGrid), 2003,

Arie Shoshani, Alex Sim, Junmin Gu, Storage Resource Managers: Essential Components for Grid Applications, Globus World, 2003,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2002

Elaheh Pourabbas, Arie Shoshani, "Joint Queries Estimation from Multiple OLAP Databases", International Conference on Scientific and Statistical Database Management, 2002 (SSDBM’02), July 24, 2002,

A. Shoshani, A. Sim, J. Gu, "Storage Resource Managers: Middleware components for Grid Storage", the 19th IEEE Symposium on Mass Storage Systems, 2002,

Kesheng Wu, Ekow Otoo, Arie Shoshani, "An Efficient Compression Scheme For Bitmap Indices", 2002,

2001

B. Allcock, I. Foster, V. Nefedova, A. Chervenak, E. Deelman, C. Kesselman, J. Lee, A. Sim, A. Shoshani, B. Drach, D. Williams, "High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies", Super Computing 2001, 2001,

L. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Journal of Computer Physics Communications, 2001,

A. Sim, H. Nordberg, L.M. Bernardo, A. Shoshani, D. Rotem, "Experience with using CORBA to implement a file caching coordination system", Concurrency and Computation: Practice and Experience, 2001, 13:1-15,

D. Olson, E. Hjort, J. Lauret, M. Messer, A. Shoshani, A. Sim, "Non-shared Disk Cluster - A Fault Tolerant, Commodity Approach to Hi-Bandwidth Data Analysis", Computing in High Energy Physics, 2001,

2000

A. Shoshani, A. Sim, L.M. Bernardo, H. Nordberg, "Coordinating Simultaneous Caching of File Bundles from Tertiary Storage", International Conference on Scientific and Statistical Database Management (SSDBM), 2000,

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

L. M. Bernardo, A. Shoshani, A. Sim, H. Nordberg, "Access Coordination Of Tertiary Storage For High Energy Physics Applications", the 17th IEEE Symposium on Mass Storage Systems, 2000,

A. Sim, A. Shoshani, HRM: Hierarchical Resource Manager, Globus World, 2000,

A. Sim, A. Shoshani, L. M. Bernardo, H. Nordberg, A Storage Access Coordination System for Petabyte Scale Scientific Data, IONA World, 2000,

1999

A. Sim, H. Nordberg, L. M. Bernardo, A. Shoshani, D. Rotem, "Storage Access Coordination Using CORBA", Distributed Objects and Applications, 1999, 168-175,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem and A. Sim, "Multidimensional Indexing and Query Coordination for Tertiary Storage Management", International Conference on Scientific and Statistical Database Management, 1999, 214-225,

1998

L.M. Bernardo, D. Rotem, A. Shoshani, H. Nordberg, A. Sim, "Using Access Patterns to Partition Large Datasets on Tertiary Storage in Order to Minimize Retrieval Costs", 1998, LBNL 41504,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem, A. Sim, "Storage Management for High Energy Physics Applications", Computing in High Energy Physics, 1998,

Alex Sim

2018

T. Kim, J. Choi, D. Lee, A. Sim, C. A. Spurlock, A. Todd, K. Wu, "Predicting Baseline for Analysis of Electricity Pricing", International Journal of Big Data Intelligence. Special issue on Data to Decision, 2018, 5:3-20, doi: 10.1504/IJBDI.2018.10008133

2017

J. Wang, A. Sim, K. Wu, S. Hwangbo, "Accurate Signal Timing from High Frequency Streaming Data", 2017 IEEE International Conference on Big Data (Big Data 2017), 2017,

A. Lazar, L. Jin, A. Spurlock, A. Todd, K. Wu, A. Sim, "Data Quality Challenges with Missing Values and Mixed Types in Joint Sequence Analysis", Workshop in Data Quality Issues in Big Data and Machine Learning Applications: Going Beyond Data Cleaning and Transformations, in Conjunction with the 2017 IEEE International Conference on Big Data (Big Data 2017), 2017,

J. Wang, K. Wu, A. Sim, S. Hwangbo, "Feature Engineering and Classification Models for Partial Discharge Events in Power Transformers", 10th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2017), 2017,

P. Harrington, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Diagnosing Parallel I/O Bottlenecks in HPC Applications", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’17), ACM Student Research Competition (SRC), First place winner, 2017,

J. Wang, K. Wu, A. Sim, S. Hwangbo, "Convolutional Filtering for Accurate Signal Timing from Noisy Streaming Data", 3rd IEEE International Conference on Big Data Intelligence and Computing (DataCom2017), 2017,

K. Wu, D. Lee, A. Sim, J. Choi, "Statistical Data Reduction for Streaming Data", 2017 New York Scientific Data Summit (NYSDS), Data-Driven Discovery in Science and Industry, 2017,

Jinoh Kim, Alex Sim, "A New Approach to Online, Multivariate Network Traffic Analysis", 2nd Workshop on Network Security Analytics and Automation (NSAA), in conjunction with the 26th International Conference on Computer Communications and Networks (ICCCN 2017), 2017,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Improving Statistical Similarity Based Data Reduction for Non-Stationary Data", 29th International Conference on Scientific and Statistical Database Management (SSDBM2017), 2017, doi: 10.1145/3085504.3085583

Updated experiment version: https://sdm.lbl.gov/oapapers/ssdbm17-lee-upd.pdf
Original version: http://dl.acm.org/citation.cfm?doid=3085504.3085583

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, K. John Wu, "Parallel Variable Selection for Effective Performance Prediction", the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2017), 2017, doi: 10.1109/CCGRID.2017.47

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Expanding Statistical Similarity Based Data Reduction to Capture Diverse Patterns", Data Compression Conference (DCC 2017), 2017,

Ling Jin, Doris Lee, Alex Sim, Sam Borgeson, John Wu, Anna Spurlock, Annika Todd, "Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data", 2nd International Workshop on Artificial Intelligence for Smart Grids and Smart Buildings, In conjunction with AAAI 2017, 2017,

J. Kim, A. Sim, S.C. Suh, I. Kim, "An Approach to Online Network Monitoring Using Clustered Patterns", International Conference on Computing, Networking and Communications (ICNC 2017), 2017, doi: 10.1109/ICCNC.2017.7876207

J. Kim, W. Yoo, A. Sim, S.C. Suh, I. Kim, "A Lightweight Network Anomaly Detection Technique", International Workshop on Computing, Networking and Communications (CNC 2017), 2017, doi: 10.1109/ICCNC.2017.7876251

2016

Sam Fries, Sasha Ames, Alex Sim, Dean Williams, "HPSS Connections to ESGF: BASEJumper", 2016 Earth System Grid Federation (ESGF) Conference, 2016,

J. Wang, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Second place winner, 2016,

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016,

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Choi, Andreas Stathopoulos, Choong-Seock Chang, Scott Klasky, "Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma", IEEE Transactions on Big Data (TBD), 2016, 2(3):262-275, doi: 10.1109/TBDATA.2016.2599929

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, edited by R. Arora, (Springer International: 2016) Pages: 139-161 doi: 10.1007/978-3-319-33742-5

W. Yoo, B. Foster, A. Sim, K. Wu, "Machine Learning Based Job Status Prediction in Scientific Clusters", IEEE SAI Computing Conference, 2016, 44-53, doi: 10.1109/SAI.2016.7555961

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Novel Data Reduction Based on Statistical Similarity", International Conference on Scientific and Statistical Database Management (SSDBM'16), New York, NY, USA, ACM, 2016, 21:1--21:1, doi: 10.1145/2949689.2949708

D. Pugmire, J. Kress, H. Childs, M. Wolf, G. Eisenhauer, J. Low, R. M. Churchill, T. Kurc, K. Wu, A. Sim, J. Gu, J. Choi, S. Klasky, "Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows", High Performance Data Analysis and Visualization Workshop (HPDAV2016) in conjunction with the 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2016), 2016, doi: 10.1109/IPDPSW.2016.175

2015

T. Kim, D. Lee, J. Choi, A. Spurlock, A. Sim, A. Todd, K. Wu, "Extracting Baseline Electricity Usage Using Gradient Tree Boosting", International Conference on Big Data Intelligence and Computing (DataCom 2015), Best Paper Award, 2015,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "PATHA: Performance Analysis Tool for HPC Applications", the 34th IEEE International Performance Computing and Communications Conference (IPCCC 2015), 2015,

S. Fries, A. Sim, "HPSS connections to ESGF", Earth System Grid Federation Conference (ESGF 2015), 2015,

M. Koo, W. Yoo (advisor), A. Sim (advisor), "I/O Performance Analysis Framework on Measurement Data from Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15), ACM Student Research Competition (SRC), 2015,

J. Kim, A. Sim, "Peeking Network States with Clustered Patterns", 2015, LBNL 1003744,

K. Hu, J. Choi, A. Sim, J. Jiang, "Best Predictive Generalized Linear Mixed Model with Predictive Lasso for High-Speed Network Data Analysis", International Journal of Statistics and Probability, 2015,

S. Shannigrahi, A. J. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E. Yeh, "Named Data Networking in Climate Research and HEP Applications", 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), 2015,

W. Yoo, A. Sim, "Network Bandwidth Utilization Forecast Model on High Bandwidth Networks", IEEE International Conference on Computing, Networking and Communications (ICNC’15), 2015,

S. Shannigrahi, A. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E. Yeh, Named Data Networking in Climate Research and HEP, 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), Okinawa, Japan, 2015,

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, C.S. Chang, S. Klasky, "Towards Real-Time Detection and Tracking of Blob-Filaments in Fusion Plasma Big Data", WM-CS-2015-01, Department of Computer Science, College of William and Mary, 2015,

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", Quantitative Finance, 2015,

http://ssrn.com/abstract=2507040

2014

W. Yoo, A. Sim, "Efficient Changing Pattern Detection on High Bandwidth Network Measurements", 7th International Conference on Grid and Distributed Computing, 2014,

J. Choi, A. Sim, Data reduction methods, systems, and devices, U.S. Patent Pending serial no. 14/555,365, 2014,

U.S. Patent pending serial no. 14/555,365, “DATA REDUCTION METHODS, SYSTEMS, AND DEVICES”, filed on 11/26/2014. Provisional application no. 61/909,518. “An Efficient Data Reduction Method with Locally Exchangeable Measures”, J. Choi and A. Sim, filed on 11/27/2013, LBNL IB2013-133.

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, CS Chang, S. Klasky, "High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma", 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC’14), 2014,

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Adaptation and Policy-Based Resource Allocation for Efficient Bulk Data Transfers in High Performance Computing Environments", 4th International Workshop on Network-aware Data Management (NDM'14), 2014,

L. Wu, K. Wu, A. Sim, A. Stathopoulos, "Real-Time Outlier Detection Algorithm for Finding Blob-Filaments in Plasma", Super Computing 2014, ACM SRC, 2014,

John Wu, Alex Sim, Lingfei Wu, Abraham Frankl, Scott Klasky, Jong Y Choi, CS Chang, Michael Churchill, "Exercising ICEE Framework with Fusion Blob Detection", DOE/ASCR NGNS PI meeting, 2014,

US Patent 8,705,342 B2. “Co-scheduling of network resource provisioning and host-to-host bandwidth reservation on high-performance network and storage systems”, D. Yu, D. Katramatos, A. Sim, and A. Shoshani, Apr. 22, 2014, prior publication No. US 2012/0268053 A1 issued on Oct. 25, 2012, provisional application No. 61/393,750, filed on Oct. 15, 2010, LBNL IB-3152, BNL BSA 11-02.

A. L. Chervenak, A. Sim, J. Gu, R. Schuler, N. Hirpathak, "Efficient Data Staging Using Performance-Based Adaptation and Policy-Based Resource Allocation", 22nd Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2014,

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

2013

Jong Y. Choi, Kesheng Wu, Jacky C. Wu, Alex Sim, Qing G. Liu, Matthew Wolf, CS Chang, Scott Klasky, "ICEE: Wide-area In Transit Data Processing Framework For Near Real-Time Scientific Applications", The 4th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC-13), 2013,

J. Choi, K. Hu, A. Sim, "Relational Dynamic Bayesian Networks with Locally Exchangeable Measures", 2013, LBNL 6341E,

K. Hu, J. Choi, J. Jiang, A. Sim, "Best Predictive GLMM using LASSO with Application on High-Speed Network", 2013, LBNL 6327E,

K. Hu, A. Sim, D. Antoniades, C. Dovrolis, "Estimating and Forecasting Network Traffic Performance based on Statistical Patterns Observed in SNMP data", the 9th International Conference on Machine Learning and Data Mining (MLDM2013), 2013,

D. Antoniades, K. Hu, A. Sim, C. Dovrolis, "What SNMP data can tell us about Edge-to-Edge network performance", Passive and Active Measurement Conference (PAM2013), 2013,

K. Hu, A. Sim, D. Antoniades, C. Dovrolis, Statistical Prediction Models for Network Traffic Performance, the APAN 35 conference and the Winter 2013 ESCC/Internet2 Joint Techs meeting (TIP2013), 2013,

2012

Junmin Gu, David Smith, Ann L. Chervenak, Alex Sim, "Adaptive Data Transfers that Utilize Policies for Resource Sharing", The 2nd International Workshop on Network-Aware Data Management Workshop (NDM2012), 2012,

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

M. Balman, A. Sim, "Scaling the Earth System Grid to 100Gbps Networks", 2012, LBNL 5794E,

D. Yu, D. Katramatos, A. Shoshani, A. Sim, J. Gu, V. Natarajan, "StorNet: Integrating Storage Resource Management with Dynamic Network Provisioning for Automated Data Transfer", International Committee for Future Accelerators (ICFA) Standing Committee on Inter-Regional Connectivity (SCIC) 2012 Report: Networking for High Energy Physics, 2012,

Benson Ma, Arie Shoshani, Alex Sim, Kesheng Wu, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Management Workshop (NDM2012), 2012, 562--571,

2011

J. Gu, D. Katramatos, X. Liu, V. Natarajan, A. Shoshani, A. Sim, D. Yu, S. Bradley, S. McKee, "StorNet: Integrated Dynamic Storage and Network Resource Provisioning and Management for Automated Data Transfers", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/1/012002

G. Garzoglio, J. Bester, K. Chadwick, D. Dykstra, D. Groep, J. Gu, T. Hesselroth, O. Koeroo, T. Levshina, S. Martin, M. Salle, N. Sharma, A. Sim, S. Timm, A. Verstegen, "Adoption of a SAML-XACML Profile for Authorization Interoperability across Grid Middleware in OSG and EGEE", Journal of Physics: Conf. Ser., 2011, 331, doi: 10.1088/1742-6596/331/6/062011

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Junmin Gu, Dimitrios Katramatos, Xin Liu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Dantong Yu, Scott Bradley, Shawn McKee, "StorNet: Co-Scheduling of End-to-End Bandwidth Reservation on Storage and Network Systems for High Performance Data Transfers", IEEE INFOCOM HSN 2011, 2011,

Dean N. Williams, Ian T. Foster, Don E. Middleton, Rachana Ananthakrishnan, Neill Miller, Mehmet Balman, Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim, Gavin Bell, Robert Drach, Michael Ganzberger, Jim Ahrens, Phil Jones, Daniel Crichton, Luca Cinquini, David Brown, Danielle Harper, Nathan Hook, Eric Nienhouse, Gary Strand, Hannah Wilcox, Nathan Wilhelmi, Stephan Zednik, Steve Hankin, Roland Schweitzer, John Harney, Ross Miller, Galen Shipman, Feiyi Wang, Peter Fox, Patrick West, Stephan Zednik, Ann Chervenak, Craig Ward, "Earth System Grid Center for Enabling Technologies (ESG-CET): A Data Infrastructure for Data-Intensive Climate Research", SciDAC Conference, 2011,

2010

D. Hasenkamp, A. Sim, M. Wehner and K. Wu, "Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis", Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science, Nov. 30-Dec. 3, 2010, Indianapolis, Indiana, 2010, LBNL 4218E,

Alex Sim, Mehmet Balman, Dean N. Williams, Arie Shoshani, Vijaya Natarajan, "Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets", The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, CA, November 20, 2010, LBNL 3985E,

Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. A challenging issue in such efforts is the limited network capacity for moving large datasets. A tool that addresses this challenge is the Bulk Data Mover (BDM), a data transfer management tool used in the Earth System Grid (ESG) community. It has been managing massive dataset transfers efficiently in environments where network bandwidth is limited. Adaptive transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in dynamic network environments as well as to control the data transfers for the desired transfer performance. We describe the results from our hands-on data transfer management experience in the climate research community. We study a practical transfer estimation model and present our initial results from the adaptive transfer adjustment methodology.

Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim, "A Flexible Reservation Algorithm for Advance Network Provisioning", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), New Orleans, LA, IEEE Computer Society, Washington, DC, USA, ISBN: 978-1-4244-7559-, November 14, 2010, LBNL 4017E, doi: 10.1109/SC.2010.4

Many scientific applications need support from a communication infrastructure that provides predictable performance, which requires effective algorithms for bandwidth reservations. Network reservation systems such as ESnet’s OSCARS establish guaranteed bandwidth of secure virtual circuits for a certain bandwidth and length of time. However, users currently cannot inquire about bandwidth availability, nor do they receive alternative suggestions when reservation requests fail. In general, the number of reservation options is exponential in the number of nodes n and the current reservation commitments. We present a novel approach for path finding in time-dependent networks that takes advantage of user-provided parameters of total volume and time constraints, and produces options for earliest completion and shortest duration. The theoretical complexity is only O(n²r²) in the worst case, where r is the number of reservations in the desired time interval. We have implemented our algorithm and developed efficient methodologies for incorporation into network reservation frameworks. Performance measurements confirm the theoretical predictions.

D. Hasenkamp, A. Sim, M. Wehner, K. Wu, "Finding Tropical Cyclones on Clouds", Supercomputing 2010, ACM SRC 3rd place, 2010,

M. Balman, E. Chaniotakis, A. Shoshani, A. Sim, "A New Approach in Advance Network Reservation and Provisioning for High-Performance Scientific Data Transfers", 2010, LBNL 4091E,

Julian Cummings, Jay Lofstead, Karsten Schwan, Alexander Sim, Arie Shoshani, Ciprian Docan, Manish Parashar, Scott Klasky, Norbert Podhorszki, Roselyne Barreto, "EFFIS: An End-to-end Framework for Fusion Integrated Simulation", 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010,

G. Attebury, A. Baranovski, K. Bloom, B. Bockelman, D. Kcira, J. Letts, T. Levshina, C. Lundestedt, T. Martin, W. Maier, H. Pi, A. Rana, I. Sfiligoi, A. Sim, M. Thomas, F. Wuerthwein, "Roadmap for Applying Hadoop Distributed File System in Scientific Grid Computing", International Symposium on Grid Computing (ISGC), 2010,

A. Sim, D. Gunter, V. Natarajan, A. Shoshani, D. Williams, J. Long, J. Hick, J. Lee, E. Dart, "Efficient Bulk Data Replication for the Earth System Grid", Data Driven E-science: Use Cases and Successful Applications of Distributed Computing Infrastructures (ISGC 2010), (Springer-Verlag New York Inc: 2010) Pages: 435

Raj Kettimuthu, Alex Sim, Dan Gunter, Bill Allcock, Peer T. Bremer, John Bresnahan, Andrew Cherry, Lisa Childers, Eli Dart, Ian Foster, Kevin Harms, Jason Hick, Jason Lee, Michael Link, Jeff Long, Keith Miller, Vijaya Natarajan, Valerio Pascucci, Ken Raffenetti, David Ressman, Dean Williams, Loren Wilson, Linda Winkler, "Lessons learned from moving earth system grid data sets over a 20 Gbps wide-area network", HPDC 10, New York, NY, USA, ACM, 2010, 316--319, doi: 10.1145/1851476.1851519

2009

A. Sim, A. Shoshani, F. Donno, J. Jensen, Storage Resource Manager Interface Specification V2.2 Implementations Experience Report, Open Grid Forum, GFD.154, 2009,

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, D. Fraser, J. Garcia, S. Hankin, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A. Shoshani, F. Siebenlist, A. Sim, W. G. Strand, M. Su, N. Wilhelmi, "The Earth System Grid: Enabling Access to Multimodel Climate Simulation Data", Bulletin of the American Meteorological Society, 2009, 90(2):195-205,

G. Attebury, A. Baranovski, K. Bloom, B. Bockelman, D. Kcira, J. Letts, T. Levshina, C. Lundestedt, T. Martin, W. Maier, H. Pi, A. Rana, I. Sfiligoi, A. Sim, M. Thomas, F. Wuerthwein, "Hadoop Distributed File System for the Grid", IEEE Nuclear Science Symposium, 2009,

J. Jensen, R. Downing, D. Ross, A. Sim, "Practical Grid Storage Interoperation", Journal of Grid Computing, 2009, 7:3, doi: 10.1007/s10723-009-9127-2

M. Riedel, E. Laure, Th. Soddemann, L. Field, J. P. Navarro, J. Casey, M. Litmaath, J. Ph. Baud, B. Koblitz, C. Catlett, D. Skow, C. Zheng, P. M. Papadopoulos, M. Katz, N. Sharma, O. Smirnova, B. Kónya, P. Arzberger, F. Würthwein, A. S. Rana, T. Martin, M. Wan, V. Welch, T. Rimovsky, S. Newhouse, A. Vanni, Y. Tanaka, Y. Tanimura, T. Ikegami, D. Abramson, C. Enticott, G. Jenkins, R. Pordes, N. Sharma, S. Timm, N. Sharma, G. Moont, M. Aggarwal, D. Colling, O. van der Aa, A. Sim, V. Natarajan, A. Shoshani, J. Gu, S. Chen, G. Galang, R. Zappi, L. Magnoni, V. Ciaschini, M. Pace, V. Venturi, M. Marzolla, P. Andreetto, B. Cowles, S. Wang, Y. Saeki, H. Sato, S. Matsuoka, P. Uthayopas, S. Sriprayoonsakul, O. Koeroo, M. Viljoen, L. Pearlman, S. Pickles, David Wallom, G. Moloney, J. Lauret, J. Marsteller, P. Sheldon, S. Pathak, S. De Witt, J. Mencák, J. Jensen, M. Hodges, D. Ross, S. Phatanapherom, G. Netzer, A. R. Gregersen, M. Jones, S. Chen, P. Kacsuk, A. Streit, D. Mallmann, F. Wolf, T. Lippert, Th. Delaitre, E. Huedo, N. Geddes, "Interoperation of world-wide production e-Science infrastructures", Concurrency and Computation: Practice and Experience, 2009, 21(8):961-990,

Arie Shoshani, Flavia Donno, Junmin Gu, Jason Hick, Maarten Litmaath, Alex Sim, "Dynamic Storage Management", Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani, Doron Rotem, (Chapman & Hall/CRC Computational Science: 2009)

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

2008

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, J. Gu, "Grid data access on widely distributed worker nodes using scalla and SRM", Journal of Physics: Conf. Ser., 2008, 119, doi: 10.1088/1742-6596/119/7/072019

D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, S. Hankin, V. E. Henson, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A. Shoshani, F. Siebenlist, A. Sim, W. G. Strand, N. Wilhelmi, M. Su, "Data Management and Analysis for the Earth System Grid", SciDAC Conference, 2008,

Alex Sim, Arie Shoshani (Editors), Paolo Badino, Olof Barring, Jean‐Philippe Baud, Ezio Corso, Shaun De Witt, Flavia Donno, Junmin Gu, Michael Haddox‐Schatz, Bryan Hess, Jens Jensen, Andy Kowalski, Maarten Litmaath, Luca Magnoni, Timur Perelmutov, Don Petravick, Chip Watson, The Storage Resource Manager Interface Specification Version 2.2, Open Grid Forum, Document in Full Recommendation, GFD.129, 2008,

C S Chang, S Klasky, J Cummings, R. Samtaney, A Shoshani, L Sugiyama, D Keyes, S Ku, G Park, S Parker, N Podhorszki, H. Strauss, H Abbasi, M Adams, R Barreto, G Bateman, K Bennett, Y Chen, E D’Azevedo, C Docan, S Ethier, E Feibush, L Greengard, T Hahm, F Hinton, C Jin, A. Khan, A Kritz, P Krsti, T Lao, W Lee, Z Lin, J Lofstead, P Mouallem, M Nagappan, A Pankin, M Parashar, M Pindzola, C Reinhold, D Schultz, K Schwan, D. Silver, A Sim, D Stotler, M Vouk, M Wolf, H Weitzner, P Worley, Y Xiao, E Yoon, D Zorin, "Toward a first-principles integrated simulation of tokamak edge plasmas", Journal of Physics: Conf. Ser., 2008, 125, doi: 10.1088/1742-6596/125/1/012042

R Ananthakrishnan, D E Bernholdt, S Bharathi, D Brown, M Chen, A L Chervenak, L Cinquini, R Drach, I T Foster, P Fox, D Fraser, K Halliday, S Hankin, P Jones, C Kesselman, D E Middleton, J Schwidder, R Schweitzer, R Schuler, A Shoshani, F Siebenlist, A Sim, W G Strand, N Wilhelmi, M Su, D N Williams, "Building a global federation system for climate change research: the earth system grid center for enabling technologies (ESG-CET)", Journal of Physics: Conf. Ser., 2008, 78, doi: 10.1088/1742-6596/78/1/012050

W. Betts, L. Didenko, T. Freeman, P. Jakl, L. Hajdu, E. Hjort, K. Keahey, J. Lauret, D. Olson, A. Rose, I. Sakrejda, A. Sim, "STAR Grid Activities, OSG and Beyond", International Symposium on Grid Computing (ISGC), 2008,

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

2007

L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, F. Donno, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations", the 24th IEEE Conference on Mass Storage Systems and Technologies, 2007,

F. Donno, L. Abadie, P. Badino, J. Baud, E. Corso, M. Crawford, S. De Witt, A. Forti, P. Fuhrmann, G. Grosdidier, J. Gu, J. Jensen, S. Lemaitre, M. Litmaath, D. Litvinsev, G. Lo Presti, L. Magnoni, T. Mkrtchan, A. Moibenko, V. Natarajan, G. Oleynik, T. Perelmutov, D. Petravick, A. Shoshani, A. Sim, M. Sponza, R. Zappi, "Storage Resource Manager version 2.2: design, implementation, and testing experience", Journal of Physics: Conf. Ser., 2007, 119, doi: 10.1088/1742-6596/119/6/062028

2006

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", Lecture Notes in Computer Science, edited by Jean-Marc Pierson, (Springer-Verlag GmbH Publisher: 2006) Pages: 100-112

D. E. Middleton, D. E. Bernholdt, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, P. Fox, P. Jones, C. Kesselman, I. T. Foster, V. Nefedova, A. Shoshani, A. Sim, W. G. Strand, D. Williams, "Enabling worldwide access to climate simulation data: the earth system grid (ESG)", SciDAC Conference, 2006,

P. Jakl, J. Lauret, A. Hanushevsky, A. Shoshani, A. Sim, "From rootd to Xrootd, from physical to logical files: experience on accessing and managing distributed data", Computing in High Energy Physics (CHEP), 2006,

E. Hjort, L. Hajdu, J. Lauret, D. Olson, A. Sim, A. Shoshani, "Data and Computational Grid Coupling in RHIC/STAR – An Analysis Scenario using SRM Technology", Computing in High Energy Physics (CHEP), 2006,

2005

D. Bernholdt, S. Bharathi, D. Brown, K. Chanchio, M. Chen, A. Chervenak, L. Cinquini, B. Drach, I. Foster, P. Fox, J. Garcia, C. Kesselman, R. Markel, D. Middleton, V. Nefedova, L. Pouchard, A. Shoshani, A. Sim, G. Strand, D. Williams, "The Earth System Grid: Supporting the Next Generation of Climate Modeling Research", Proceedings of the IEEE, 2005, 93(3):485-495,

A. Shoshani, A. Sim, K. Stockinger, "RRS: Replica Registration Service for Data Grids", International Workshop on Data Management in Grids, 2005,

Arie Shoshani, Alex Sim, Kurt Stockinger, "Replica Registration Service Functional Interface Specification 1.0", 2005, LBNL 57520,

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Zhang, "Grid Collector: Facilitating Efficient Selective Access from Data Grids", International Supercomputer Conference 2005, 2005,

2004

K. Wu, W. Zhang, A. Sim, J. Gu, A. Shoshani, "Grid Collector: an Event Catalog with Automated File Management", 2004, LBNL 55563,

Eric Hjort, Doug Olson, Jerome Lauret, Arie Shoshani, Alex Sim, "Production mode Data-Replication framework in STAR using the HRM Grid middleware", Computing in High Energy Physics, 2004,

Alex Sim, Junmin Gu, Arie Shoshani, Vijaya Natarajan, "DataMover: Robust Terabytes-Scale Multi-file Replication over Wide-Area Networks", the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), 2004,

2003

Arie Shoshani, Alexander Sim, Junmin Gu, "Storage Resource Managers: Essential Components for the Grid", Grid Resource Management: State of the Art and Future Trends, edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, (Kluwer Academic Publishers: 2003)

Ann L. Chervenak, Ewa Deelman, Carl Kesselman, William E. Allcock, Ian T. Foster, Veronika Nefedova, Jason Lee, Alex Sim, Arie Shoshani, Bob Drach, Dean Williams, Don Middleton, "High-performance remote access to climate simulation data: a challenge problem for data grid technologies", Parallel Computing, 2003, 29(10):1335-1356,

A. Sim, J. Gu, A. Shoshani, E. Hjort, D. Olson, "Experience with Deploying Storage Resource Managers to Achieve Robust File Replication", Computing in High Energy Physics, 2003,

D. Yu, J. Lauret, A. Shoshani, D. Olson, E. Hjort, A. Sim, "The Design of High Performance Data Replication in the Grid Environment for the STAR Collaboration", Computing in High Energy Physics, 2003,

L. Pouchard, L. Cinquini, B. Drach, D. Middleton, D. Bernholdt, K. Chanchio, I. Foster, V. Nefedova, D. Brown, P. Fox, J. Garcia, G. Strand, D. Williams, A. Chervenak, C. Kesselman, A. Shoshani, A. Sim, "An Ontology for Scientific Information in a Grid Environment: the Earth System Grid", the Symposium on Cluster Computing and the Grid (CCGrid), 2003,

Arie Shoshani, Alex Sim, Junmin Gu, Storage Resource Managers: Essential Components for Grid Applications, Globus World, 2003,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2002

A. Shoshani, A. Sim, J. Gu, "Storage Resource Managers: Middleware components for Grid Storage", the 19th IEEE Symposium on Mass Storage Systems, 2002,

2001

B. Allcock, I. Foster, V. Nefedova, A. Chervenak, E. Deelman, C. Kesselman, J. Lee, A. Sim, A. Shoshani, B. Drach, D. Williams, "High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies", Supercomputing 2001, 2001,

L. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Journal of Computer Physics Communications, 2001,

A. Sim, H. Nordberg, L.M. Bernardo, A. Shoshani, D. Rotem, "Experience with using CORBA to implement a file caching coordination system", Concurrency and Computation: Practice and Experience, 2001, 13:1-15,

E. Hjort, D. Olson, A. Sim, J. Yang, J. Lauret, M. Messer, "Data Grid Services in STAR, Initial Deployment: Site-to-Site File Replication", Computing in High Energy Physics, 2001,

D. Olson, E. Hjort, J. Lauret, M. Messer, A. Shoshani, A. Sim, "Non-shared Disk Cluster - A Fault Tolerant, Commodity Approach to Hi-Bandwidth Data Analysis", Computing in High Energy Physics, 2001,

2000

A. Shoshani, A. Sim, L.M. Bernardo, H. Nordberg, "Coordinating Simultaneous Caching of File Bundles from Tertiary Storage", International Conference on Scientific and Statistical Database Management (SSDBM), 2000,

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

L. M. Bernardo, A. Shoshani, A. Sim, H. Nordberg, "Access Coordination Of Tertiary Storage For High Energy Physics Applications", the 17th IEEE Symposium on Mass Storage Systems, 2000,

A. Sim, A. Shoshani, HRM: Hierarchical Resource Manager, Globus World, 2000,

A. Sim, A. Shoshani, L. M. Bernardo, H. Nordberg, A Storage Access Coordination System for Petabyte Scale Scientific Data, IONA World, 2000,

1999

A. Sim, H. Nordberg, L. M. Bernardo, A. Shoshani, D. Rotem, "Storage Access Coordination Using CORBA", Distributed Objects and Application, 1999, 168-175,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem and A. Sim, "Multidimensional Indexing and Query Coordination for Tertiary Storage Management", International Conference on Scientific and Statistical Database Management, 1999, 214-225,

1998

L.M. Bernardo, D. Rotem, A. Shoshani, H. Nordberg, A. Sim, "Using Access Patterns to Partition Large Datasets on Tertiary Storage in Order to Minimize Retrieval Costs", 1998, LBNL 41504,

A. Shoshani, L.M. Bernardo, H. Nordberg, D. Rotem, A. Sim, "Storage Management for High Energy Physics Applications", Computing in High Energy Physics, 1998,

1996

A. Sim, B. Parvin, P. Keagy, "Invariant Representation and Classification of Fruits from X-ray Images", International Journal of Imaging Systems and Technology, 1996, 7:231-237,

1995

A. Sim, B. Parvin, P. Keagy, "Invariant Representation and Hierarchical Network for Inspection of Nuts from X-ray Images", IEEE International Conference on Neural Networks, 1995, II:738-743,

A. Sim, B. Parvin, P. Keagy, "Machine Vision Inspection of Insect Infested Pistachio Nuts from X-ray Images", Vision Interface, 1995, 17-22,

Horst D. Simon

2014

Jung Heon Song, Marcos López de Prado, Horst Simon, Kesheng Wu, "Exploring Irregular Time Series Through Non-uniform Fourier Transform", WHPCF 14, Piscataway, NJ, USA, IEEE Press, 2014, 37--44, doi: 10.1109/WHPCF.2014.8

Jung Heon Song, Kesheng Wu, Horst D Simon, "Parameter Analysis of the VPIN (Volume-Synchronized Probability of Informed Trading) Metric", Quantitative Financial Risk Management: Theory and, 2014,

2013

William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", October 6, 2013, LBNL LBNL-6388E,

2010

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

2008

I. Yamazaki, K. Wu, H. Simon, "nu-TRLan User Guide version 1.0", 2008, LBNL 1288E,

Houjun Tang

2017

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", IEEE Cluster 2017, September 5, 2017,

2016

Wenzhao Zhang, Houjun Tang, Xiaocheng Zou, Steven Harenberg, Qing Liu, Scott Klasky, Nagiza F Samatova, "Exploring Memory Hierarchy to Improve Scientific Data Read Performance", 2015 IEEE International Conference on Cluster Computing, 2016, 66--69,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data) (Acceptance rate: 19.39% as short papers.), December 5, 2016,

Xiaocheng Zou, David Boyuka, Dhara Desai, Martin, Suren Byna, Kesheng Wu, Kushal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage Pattern-Driven Dynamic Data Layout Reorganization", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 116--125,

2015

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

Xiaocheng Zou, Kesheng Wu, David A Boyuka, Daniel F Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J Ligocki, Hans Johansen, Nagiza F Samatova, "Parallel in situ detection of connected components in adaptive mesh refinement data", Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on, 2015, 302--312,

David A Boyuka II, Houjun Tang, Kushal Bansal, Xiaocheng Zou, Scott Klasky, Nagiza F Samatova, "The hyperdyadic index and generalized indexing and query with PIQUE", Proceedings of the 27th International Conference on Scientific and Statistical Database Management, 2015, 20,

2014

John Jenkins, Xiaocheng Zou, Houjun Tang, Dries Kimpe, Robert Ross, Nagiza F Samatova, "Radar: Runtime asymmetric data-access driven scientific data replication", International Supercomputing Conference, 2014, 296--313,

Houjun Tang, Xiaocheng Zou, John Jenkins, David A Boyuka II, Stephen Ranshous, Dries Kimpe, Scott Klasky, Nagiza F Samatova, "Improving read performance with online access pattern analysis and prefetching", European Conference on Parallel Processing, 2014, 246--257,

Xiaocheng Zou, Sriram Lakshminarasimhan, David A Boyuka II, Stephen Ranshous, Houjun Tang, Scott Klasky, Nagiza F Samatova, "Fast set intersection through run-time bitmap construction over pfordelta-compressed indexes", European Conference on Parallel Processing, 2014, 668--679,

2013

Eric R Schendel, Steve Harenberg, Houjun Tang, Venkatram Vishwanath, Michael E Papka, Nagiza F Samatova, "A generic high-performance method for deinterleaving scientific data", European Conference on Parallel Processing, 2013, 571--582,

Daniela Ushizima

2010

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

2009

O. Rübel, C.G.R. Geddes, E. Cormier-Michel, K. Wu, Prabhat, G.H. Weber, D.M. Ushizima, P. Messmer, H. Hagen, B. Hamann, and E.W. Bethel, "Automatic Beam Path Analysis of laser Wakefield Particle Acceleration Data", IOP Computational Science & Discovery, November 2009, 2, LBNL 2734E,

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

C. G. R. Geddes, E Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

Brian Van Straalen

2016

Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,

2010

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 3185E, 2010, 429:329-334,

Gunther H. Weber

2016

Utkarsh Ayachit, Andrew Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth Jansen, Burlen Loring, Zarija Lukić, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Michel Rasquin, Christopher P. Stone, Venkat Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel, "Performance Analysis, Design Considerations, and Applications of Extreme-scale In Situ Infrastructures", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, UT, USA, 2016, doi: 10.1109/SC.2016.78

Wahid Bhimji, Debbie Bard, Melissa Romanus, David Paul, Andrey Ovsyannikov, Brian Friesen, Matt Bryson, Joaquin Correa, Glenn K. Lockwood, Vakho Tsulaia, Suren Byna, Steve Farrell, Doga Gursoy, Chris Daley, Vince Beckner, Brian Van Straalen, Nicholas Wright, Katie Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group (CUG) 2016, May 10, 2016,

2012

Allen R. Sanderson, Brad Whitlock, Oliver, Hank Childs, Gunther H. Weber, Kesheng Wu, "A System for Query Based Analysis and Visualization", Third International EuroVis Workshop on Visual Analytics (EuroVA 2012), Vienna, Austria, January 2012, LBNL 5507E,

2011

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

2010

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 3185E, 2010, 429:329-334,

2009

O. Rübel, C.G.R. Geddes, E. Cormier-Michel, K. Wu, Prabhat, G.H. Weber, D.M. Ushizima, P. Messmer, H. Hagen, B. Hamann, and E.W. Bethel, "Automatic Beam Path Analysis of laser Wakefield Particle Acceleration Data", IOP Computational Science & Discovery, November 2009, 2, LBNL 2734E,

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, and Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: Astronum-2008 (Astronomical Society of the Pacific Conference Series), St. Thomas, USVI, June 2009, 301-317, LBNL 1450E,

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

C. G. R. Geddes, E Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Michael Wehner

2013

E. Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K. John Wu, and Michael Wehner, "Why High Performance Visual Data Analytics is both Relevant and Difficult", Proceedings of Visualization and Data Analysis 2013, IS&T/SPIE Electronic Imaging 2013, San Francisco, CA, USA, SPIE, February 2013, LBNL LBNL-6063E,

2012

Prabhat, Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner and E. Wes Bethel, "TECA: A Parallel Toolkit for Extreme Climate Analysis", Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Presented at Third Workshop on Data Mining in Earth System Science (DMESS 2012), Omaha, Nebraska, June 2012, 9:866–876, LBNL 5352E, doi: 10.1016/j.procs.2012.04.093

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

E. W. Bethel, Surendra Byna, Jerry Chou, Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Rübel, Rob D. Ryne, Michael Wehner, Wu, "Big Data Analysis and Visualization: What Do LINACS Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

2011

Suren Byna, Prabhat, Michael F. Wehner and Kesheng Wu, "Detecting Atmospheric Rivers in Large Climate Datasets", Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC-11/ Supercomputing11/ ACM/IEEE), November 14, 2011, Seattle, Washington, 2011, doi: 10.1145/2110205.2110208

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
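
The two steps named in this abstract, a threshold on total column integrated water vapor followed by connected component labeling, can be sketched serially as below. This is an illustrative single-process sketch, not the authors' parallel implementation; the function name label_regions, the choice of 4-connectivity, the threshold value, and the toy grid are all assumptions, and the sketch ignores the longitude wrap-around of a global grid.

```python
from collections import deque
from typing import List, Tuple


def label_regions(field: List[List[float]], threshold: float) -> Tuple[List[List[int]], int]:
    """Threshold a 2-D field, then label 4-connected regions of points
    with value >= threshold. Returns (labels, region_count); 0 = background."""
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if field[r][c] < threshold or labels[r][c]:
                continue  # below threshold, or already part of a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4-neighborhood.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and field[ny][nx] >= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    # Toy integrated-water-vapor grid (illustrative values, arbitrary units).
    iwv = [
        [0.5, 2.5, 2.6, 0.1],
        [0.4, 2.2, 0.3, 0.2],
        [0.1, 0.2, 0.3, 2.9],
    ]
    labels, n = label_regions(iwv, threshold=2.0)
    print(n)       # 2
    print(labels)  # [[0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 2]]
```

Choosing 4- rather than 8-connectivity is a design choice: with 8-connectivity, above-threshold cells that touch only at a corner would be merged into a single region.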

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

2010

D. Hasenkamp, A. Sim, M. Wehner and K. Wu, "Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis", Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science, Nov. 30-Dec. 3, 2010, Indianapolis, Indiana, 2010, LBNL 4218E,

Kesheng Wu

2018

T. Kim, J. Choi, D. Lee, A. Sim, C. A. Spurlock, A. Todd, K. Wu, "Predicting Baseline for Analysis of Electricity Pricing", International Journal of Big Data Intelligence. Special issue on Data to Decision, 2018, 5:3-20, doi: 10.1504/IJBDI.2018.10008133

2017

J. Wang, A. Sim, K. Wu, S. Hwangbo, "Accurate Signal Timing from High Frequency Streaming Data", 2017 IEEE International Conference on Big Data (Big Data 2017), 2017,

A. Lazar, L. Jin, A. Spurlock, A. Todd, K. Wu, A. Sim, "Data Quality Challenges with Missing Values and Mixed Types in Joint Sequence Analysis", Workshop in Data Quality Issues in Big Data and Machine Learning Applications: Going Beyond Data Cleaning and Transformations, in Conjunction with the 2017 IEEE International Conference on Big Data (Big Data 2017), 2017,

J. Wang, K. Wu, A. Sim, S. Hwangbo, "Feature Engineering and Classification Models for Partial Discharge Events in Power Transformers", 10th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2017), 2017,

P. Harrington, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Diagnosing Parallel I/O Bottlenecks in HPC Applications", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’17), ACM Student Research Competition (SRC), First place winner, 2017,

Tzu-Hsien Wu, Jerry Chou, Shyng Hao, Bin Dong, Kesheng Wu, Scott Klasky, "Optimizing the Query Performance of Block Index Through Data Analysis and I/O Modeling", The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), November 13, 2017,

J. Wang, K. Wu, A. Sim, S. Hwangbo, "Convolutional Filtering for Accurate Signal Timing from Noisy Streaming Data", 3rd IEEE International Conference on Big Data Intelligence and Computing (DataCom2017), 2017,

K. Wu, D. Lee, A. Sim, J. Choi, "Statistical Data Reduction for Streaming Data", 2017 New York Scientific Data Summit (NYSDS), Data-Driven Discovery in Science and Industry, 2017,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Improving Statistical Similarity Based Data Reduction for Non-Stationary Data", 29th International Conference on Scientific and Statistical Database Management (SSDBM2017), 2017, doi: 10.1145/3085504.3085583

Updated experiment version: https://sdm.lbl.gov/oapapers/ssdbm17-lee-upd.pdf
Original version: http://dl.acm.org/citation.cfm?doid=3085504.3085583

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-Defined Scientific Data Analysis on Arrays", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2017 (Acceptance rate: 19%), June 26, 2017,

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, K. John Wu, "Parallel Variable Selection for Effective Performance Prediction", the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2017), 2017, doi: 10.1109/CCGRID.2017.47

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, and Peter Nugent, "Incremental View Maintenance over Array Data", In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17) (Acceptance rate: 20%). ACM, New York, NY, USA, May 14, 2017,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Expanding Statistical Similarity Based Data Reduction to Capture Diverse Patterns", Data Compression Conference (DCC 2017), 2017,

Ling Jin, Doris Lee, Alex Sim, Sam Borgeson, John Wu, Anna Spurlock, Annika Todd, "Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data", 2nd International Workshop on Artificial Intelligence for Smart Grids and Smart Buildings, In conjunction with AAAI 2017, 2017,

2016

Bin Dong, Suren Byna, Kesheng Wu, Prabhat, Hans Johansen, Jeffrey N. Johnson, and Noel Keen, "Data Elevator: Low-contention Data Movement in Hierarchical Storage System", The 23rd annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) (Acceptance rate: 25%), December 19, 2016,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data) (Acceptance rate: 19.39% as short papers.), December 5, 2016,

J. Wang, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Second place winner, 2016,

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016,

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Utkarsh Ayachit, Andrew Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth Jansen, Burlen Loring, Zarija Lukić, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Michel Rasquin, Christopher P. Stone, Venkat Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, K. John Wu, E. Wes Bethel, "Performance Analysis, Design Considerations, and Applications of Extreme-scale In Situ Infrastructures", ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, UT, USA, 2016, doi: 10.1109/SC.2016.78

Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Choi, Andreas Stathopoulos, Choong-Seock Chang, Scott Klasky, "Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma", IEEE Transactions on Big Data (TBD), 2016, 2:3:262-275, doi: 10.1109/TBDATA.2016.2599929

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, edited by R. Arora, (Springer International: 2016) Pages: 139-161 doi: 10.1007/978-3-319-33742-5

W. Yoo, B. Foster, A. Sim, K. Wu, "Machine Learning Based Job Status Prediction in Scientific Clusters", IEEE SAI Computing Conference, 2016, 44-53, doi: 10.1109/SAI.2016.7555961

Bin Dong, Suren Byna, and Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2016, July 1, 2016,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng, "Novel Data Reduction Based on Statistical Similarity", International Conference on Scientific and Statistical Database Management (SSDBM'16), New York, NY, USA, ACM, 2016, 21:1--21:1, doi: 10.1145/2949689.2949708

D. Pugmire, J. Kress, H. Childs, M. Wolf, G. Eisenhauer, J. Low, R. M. Churchill, T. Kurc, K. Wu, A. Sim, J. Gu, J. Choi, S. Klasky, "Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows", High Performance Data Analysis and Visualization Workshop (HPDAV2016) in conjunction with the 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2016), 2016, doi: 10.1109/IPDPSW.2016.175

Tzuhsien Wu, Shyng Hao, Jerry Chou, Bin Dong and Kesheng Wu, "Indexing Blocks to Reduce Space and Time Requirements for Searching Large Data Files", 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2016, May 16, 2016,

Xiaocheng Zou, David Boyuka, Dhara Desai, Martin, Suren Byna, Kesheng Wu, Kushal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rübel, Kristofer Bouchard, Scott Klasky, and others, "Usage Pattern-Driven Dynamic Data Layout Reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, and others, "AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116--125,

Deborah A Agarwal, Boris Faybishenko, Vicky L, Harinarayan Krishnan, Carina Lansing, Gary Kushner, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, Arthur, Kesheng Wu, "A Science Data Gateway for Environmental Management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004, doi: 10.1002/cpe.3697

2015

T. Kim, D. Lee, J. Choi, A. Spurlock, A. Sim, A. Todd, K. Wu, "Extracting Baseline Electricity Usage Using Gradient Tree Boosting", International Conference on Big Data Intelligence and Computing (DataCom 2015), Best Paper Award, 2015,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "PATHA: Performance Analysis Tool for HPC Applications", the 34th IEEE International Performance Computing and Communications Conference (IPCCC 2015), 2015,

Bin Dong, Suren Byna, and Kesheng Wu, "Heavy-tailed Distribution of Parallel I/O System Response Time", 10th Parallel Data Storage Workshop (PDSW) 2015, to be held in conjunction with SC15, 2015,

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Bin Dong, Suren Byna, and Kesheng Wu, "Spatially Clustered Join on Heterogeneous Scientific Data Sets", 2015 IEEE International Conference on Big Data (IEEE BigData 2015), IEEE, 2015,

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015, 2015,

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

S. Shannigrahi, A. J. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E. Yeh, "Named Data Networking in Climate Research and HEP Applications", 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), 2015,

S. Shannigrahi, A. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E., "Named Data Networking in Climate Research and HEP", 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), Okinawa, Japan, 2015,

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, C.S. Chang, S. Klasky, "Towards Real-Time Detection and Tracking of Blob-Filaments in Fusion Plasma Big Data", WM-CS-2015-01, Department of Computer Science, College of William and Mary, 2015,

David H. Bailey, Stephanie Ger, Marcos Lopez de, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", Quantitative Finance, 2015,

http://ssrn.com/abstract=2507040

2014

Qian Sun, Fan Zhang, Tong Jin, Hoang Bui, Kesheng Wu, Arie Shoshani, Hemanth Kolla, Scott Klasky, Jacqueline Chen and Manish Parashar, "Scalable Run-time Data Indexing and Querying for Scientific Simulations", Proceedings of the Fifth International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC’14), 2014,

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, CS Chang, S. Klasky, "High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma", 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC’14), 2014,

L. Wu, K. Wu, A. Sim, A. Stathopoulos, "Real-Time Outlier Detection Algorithm for Finding Blob-Filaments in Plasma", Supercomputing 2014, ACM SRC, 2014,

John Wu, Alex Sim, Lingfei Wu, Abraham Frankl, Scott Klasky, Jong Y Choi, CS Chang, Michael Churchill, "Exercising ICEE Framework with Fusion Blob Detection", DOE/ASCR NGNS PI meeting, 2014,

Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Surendra Byna, Kesheng Wu, "Simplifying index file structure to improve I/O performance of parallel indexing", 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2014, 576-583, doi: 10.1109/PADSW.2014.7097856

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

Jung Heon Song, Marcos López de Prado, Horst Simon, Kesheng Wu, "Exploring Irregular Time Series Through Non-uniform Fourier Transform", WHPCF 14, Piscataway, NJ, USA, IEEE Press, 2014, 37--44, doi: 10.1109/WHPCF.2014.8

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", SIGMOD 14, 2014, 385--396, doi: 10.1145/2588555.2612185

F. Rusu, P. Nugent, K. Wu, "Implementing the Palomar Transient Factory Real-Time Pipeline in GLADE: Results and", Lecture Notes in Computer Science, (2014) Pages: 53--66

Jung Heon Song, Kesheng Wu, Horst D Simon, "Parameter Analysis of the VPIN (Volume-synchronized Probability of Informed Trading) Metric", Quantitative Financial Risk Management: Theory and, 2014,

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

Bin Dong, S. Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", 2014 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2014, 194-202, doi: 10.1109/CLUSTER.2014.6968765

2013

Jong Y. Choi, Kesheng Wu, Jacky C. Wu, Alex Sim, Qing G. Liu, Matthew Wolf, CS Chang, Scott Klasky, "ICEE: Wide-area In Transit Data Processing Framework For Near Real-Time Scientific Applications", The 4th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC-13), 2013,

William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", October 6, 2013, LBNL LBNL-6388E,

Bin Dong, S. Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), pp. 1-8, 23-27 Sept. 2013, September 1, 2013,

E. Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K. John Wu, and Michael Wehner, "Why High Performance Visual Data Analytics is both Relevant and Difficult", Proceedings of Visualization and Data Analysis 2013, IS&T/SPIE Electronic Imaging 2013, San Francisco, CA, USA, SPIE, February 2013, LBNL LBNL-6063E,

Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel, "Testing VPIN on Big Data", Available at SSRN 2318259, 2013,

Kuan-Wu Lin, Surendra Byna, Jerry Chou, Wu, "Optimizing FastQuery performance on Lustre file", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 29,

Alex Romosan, Arie Shoshani, Kesheng Wu, Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 26, LBNL 6397E, doi: 10.1145/2484838.2484856

Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel, "A Big Data Approach to Analyzing Market Volatility", Algorithmic Finance, 2013, 2:241--267, LBNL LBNL-6382E, doi: 10.3233/AF-13030

Understanding the microstructure of the financial market requires processing a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper, we demonstrate that HPC resources and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contain five and a half years' worth of trading data for about 100 of the most liquid futures contracts, include about 3 billion trades, and occupy 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallel computation, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.
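The VPIN computation itself is compact once trades have been grouped into buckets of equal volume. The sketch below assumes that grouping has already been done (bucket formation and the parameter choices explored in the paper are not reproduced). It splits each bucket's volume into buy and sell parts with bulk volume classification under a standard normal distribution, which is one common choice and an assumption here. It then averages the absolute order imbalance over a rolling window of buckets and ranks a value against its history with an empirical CDF, the quantity behind the CDF > 0.99 threshold above. The function names and the `sigma` scale parameter are hypothetical.

```python
import math


def bulk_classify(volume, price_change, sigma):
    """Split one volume bucket into (buy, sell) volume.

    Bulk volume classification with a standard normal CDF (an assumption;
    other distributions are possible). `sigma` is a hypothetical scale for
    price changes, e.g. their standard deviation over recent buckets.
    """
    z = price_change / sigma
    buy_frac = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)
    return volume * buy_frac, volume * (1.0 - buy_frac)


def vpin_series(buy, sell, window):
    """Rolling VPIN: total |buy - sell| over the last `window` buckets
    divided by the total volume in those buckets."""
    values = []
    for end in range(window, len(buy) + 1):
        b, s = buy[end - window:end], sell[end - window:end]
        imbalance = sum(abs(x - y) for x, y in zip(b, s))
        values.append(imbalance / sum(x + y for x, y in zip(b, s)))
    return values


def empirical_cdf(history, value):
    """Fraction of historical VPIN values that are <= value."""
    return sum(1 for h in history if h <= value) / len(history)
```

With equal-volume buckets of size V, the denominator equals `window * V`, which is the usual form of the formula. An alert would correspond to `empirical_cdf(history, latest) > 0.99`.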

B. Dong, S. Byna, K. Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage, January 1, 2013, doi: 10.1145/2538542.2538563

2012

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", SuperComputing 2012 (SC12), Salt Lake City, Utah, November 2012,

Prabhat, Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner and E. Wes Bethel, "TECA: A Parallel Toolkit for Extreme Climate Analysis", Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Presented at Third Workshop on Data Mining in Earth System Science (DMESS 2012), Omaha, Nebraska, June 2012, 9:866–876, LBNL 5352E, doi: 10.1016/j.procs.2012.04.093

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.
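One way to picture the modes of parallelism described in this abstract is as a list of independent work items, one per (ensemble member, timestep) pair, dealt out to workers. The sketch below is a hypothetical round-robin assignment for illustration only. It does not describe TECA's actual scheduling. `detect` is a placeholder for any per-timestep detector (for example a cyclone or atmospheric river finder), and spatial decomposition within a timestep is not shown.

```python
from itertools import product


def assign_work(n_members, n_steps, n_ranks, rank):
    """Return the (member, timestep) pairs that worker `rank` processes.

    Work items are enumerated member-major and dealt out round-robin,
    so every pair goes to exactly one of the n_ranks workers.
    """
    items = product(range(n_members), range(n_steps))
    return [item for k, item in enumerate(items) if k % n_ranks == rank]


def run_rank(n_members, n_steps, n_ranks, rank, detect):
    """Apply a per-timestep detector to this rank's share of the work."""
    # `detect` is a hypothetical callable(member, step) -> detection result.
    return {item: detect(*item)
            for item in assign_work(n_members, n_steps, n_ranks, rank)}
```

Because the items are independent, this scheme needs no communication until results are gathered. Its weakness is load imbalance when some timesteps take much longer to analyze than others.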

Benson Ma, Arie Shoshani, Alex Sim, Kesheng, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Workshop (NDM2012), 2012, 562--571,

Ichitaro Yamazaki, Kesheng Wu, "A Communication-Avoiding Thick-Restart Lanczos Method on a Distributed-Memory System", Lecture Notes in Computer Science, 2012, 7155:345--354, doi: 10.1007/978-3-642-29737-3_39

E. W. Bethel, Surendra Byna, Jerry Chou, Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Rübel, Rob D. Ryne, Michael Wehner, Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

G. F. Lofstead, Q. Liu, J. Logan, Y. Tian, Abbasi, N. Podhorszki, J. Y. Choi, S., R. Tchoua, R. A. Oldfield, and others, "Hello ADIOS: The Challenges and Lessons of Leadership Class I/O Frameworks", 2012,

Allen R. Sanderson, Brad Whitlock, Oliver, Hank Childs, Gunther H. Weber, , Kesheng Wu, "A System for Query Based Analysis and Visualization", Third International Eurovis Workshop on Visual EuroVA 2012, Vienna, Austria, January 2012, LBNL 5507E,

E. W. Bethel, D. Leinweber, O. Rübel, K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

E. Pourabbas, A. Shoshani, K. Wu, "Minimizing index size by reordering rows and columns", SSDBM, Springer Berlin/Heidelberg, January 2012, 467--484,

2011

Suren Byna, Prabhat, Michael F. Wehner and Kesheng Wu, "Detecting Atmospheric Rivers in Large Climate Datasets", Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC-11/ Supercomputing11/ ACM/IEEE), November 14, 2011, Seattle, Washington, 2011, doi: 10.1145/2110205.2110208

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to human and natural infrastructure through flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output, providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004), followed by a connected component labeling procedure that groups the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process the output of a 30-year simulation on 10,000 cores in under 3 seconds.
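The connected component labeling step in this abstract can also be done with the classic two-pass approach. A first raster scan assigns provisional labels and records which labels touch. A second scan then resolves each provisional label to its final region through a union-find structure. The sketch below applies this to a boolean mask such as the result of `iwv >= threshold`. It is a generic textbook version assuming 4-connectivity, not the code used in the paper, and `two_pass_label` is a hypothetical name.

```python
def two_pass_label(mask):
    """Label 4-connected truthy cells of a 2D mask with two raster scans.

    Returns (labels, count) with final region ids numbered 1..count.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[k] is the union-find parent of provisional label k

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller label as root

    # First pass: assign provisional labels and record equivalences.
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            up = labels[i - 1][j] if i > 0 else 0
            left = labels[i][j - 1] if j > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new provisional label
                labels[i][j] = len(parent) - 1
            elif up and left:
                labels[i][j] = min(up, left)
                union(up, left)  # both neighbours belong to one region
            else:
                labels[i][j] = up or left

    # Second pass: replace each provisional label by a compact final id.
    final = {}
    for i in range(rows):
        for j in range(cols):
            if labels[i][j]:
                root = find(labels[i][j])
                labels[i][j] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

Compared with flood fill, the two-pass scheme touches each cell in a fixed raster order, which suits regular array layouts. The equivalence bookkeeping is the part that must be handled carefully when the grid is split across processes.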

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Kesheng Wu, Rishi R Sinha, Chad Jones, Ethier, Scott Klasky, Kwan-Liu Ma, Shoshani, Marianne Winslett, "Finding regions of interest on toroidal meshes", Computational Science & Discovery, 2011, 4:015003, doi: 10.1088/1749-4699/4/1/015003

Kesheng Wu, Surendra Byna, Doron Rotem, Arie, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

J. Chou, K. Wu, O. Rübel, M. Howison, Qiang, Prabhat, B. Austin, E. W. Bethel, D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data", SC11, 2011, doi: 10.1145/2063384.2063424

Jinoh Kim, Hasan Abbasi, Luis Chacón, Docan, Scott Klasky, Qing Liu, Norbert, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive", LDAV, 2011, 65--72, doi: 10.1109/LDAV.2011.6092319

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A General Indexing and Querying System for Scientific Data", SSDBM, 2011, 573--574, doi: 10.1007/978-3-642-22351-8_42

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A Parallel Indexing System for Scientific Data", IASDS, IEEE, 2011, doi: 10.1109/CLUSTER.2011.86

Jerry Chou, John Wu, Prabhat, "FastQuery: A Parallel Indexing System for Scientific Data", Workshop on Interfaces and Abstractions for Scientific Data Storage, IEEE Cluster, 2011,

Kamesh Madduri, Kesheng Wu, "Massive-Scale RDF Processing Using Compressed Bitmap", SSDBM, Springer, 2011, 470--479, doi: 10.1007/978-3-642-22351-8_30

2010

D. Hasenkamp, A. Sim, M. Wehner and K. Wu, "Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis", Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science, Nov. 30-Dec. 3, 2010, Indianapolis, Indiana, 2010, LBNL 4218E,

D. Hasenkamp, A. Sim, M. Wehner, K. Wu, "Finding Tropical Cyclones on Clouds", Supercomputing 2010, ACM SRC 3rd place, 2010,

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark. D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 3185E, 2010, 429:329-334,

Kesheng Wu, Arie Shoshani, Kurt Stockinger, "Analyses of multi-level and multi-component compressed indexes", ACM Transactions on Database Systems, ACM, 2010, 35:1--52, doi: 10.1145/1670243.1670245

Kesheng Wu, Kamesh Madduri, Shane Cannon, "Multi-level bitmap indexes for flash memory storage", IDEAS, 2010, doi: 10.1145/1866480.1866497

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

2009

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Data Parallel Bin-Based Indexing for Answering Queries on Multi-core Architectures", SSDBM 2009, 2009, 110-129,

M. Nagappan, Kesheng Wu, M. A. Vouk, "Efficiently Extracting Operational Profiles from Execution Logs Using Suffix Arrays", 20th International Symposium on Software Reliability Engineering (ISSRE '09), November 1, 2009, doi: 10.1109/ISSRE.2009.23

Operational profiles are an important software reliability engineering tool. In this paper, we propose a cost-effective automated approach for creating second-generation operational profiles from the execution logs of a software product. Our algorithm parses the execution logs into sequences of events and produces an ordered list of all possible subsequences by constructing a suffix array of the events. The difficulty in using execution logs is that the amount of data to be analyzed is often extremely large (more than a million records per day in many applications). Our approach is very efficient: we show that it requires O(N) space and time to discover all possible patterns in N events. We discuss a practical implementation of the algorithm in the context of the logs from a large cloud computing system.
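A minimal sketch of the suffix-array idea, assuming each log record has already been reduced to an event name. Sorting the suffixes of the event sequence places all occurrences of a given run of consecutive events next to each other, so one pass over the sorted suffixes counts every length-`k` pattern. For simplicity the suffix array here is built by plain sorting, which is far slower than the O(N) bound stated above; matching that bound would require a linear-time suffix array construction. `suffix_array` and `pattern_counts` are illustrative names, not the paper's code.

```python
def suffix_array(events):
    """Naive suffix array: start positions sorted by the suffix they begin.

    Uses plain sorting (roughly O(n^2 log n) worst case), not a
    linear-time construction.
    """
    return sorted(range(len(events)), key=lambda i: events[i:])


def pattern_counts(events, k):
    """Count every contiguous length-k event pattern via the suffix array.

    Suffixes sharing a length-k prefix are adjacent in sorted order, so each
    pattern's occurrences form one run. Results come out in sorted order.
    """
    counts = []
    for start in suffix_array(events):
        if start + k > len(events):
            continue  # suffix too short to contain a length-k pattern
        pattern = tuple(events[start:start + k])
        if counts and counts[-1][0] == pattern:
            counts[-1][1] += 1  # same run: another occurrence
        else:
            counts.append([pattern, 1])  # a new pattern begins
    return [(p, c) for p, c in counts]
```

For the event sequence `["open", "read", "read", "close", "open", "read"]` with `k=2`, this yields `("open", "read")` twice and each of `("close", "open")`, `("read", "close")` and `("read", "read")` once.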

O. Rübel, C.G.R. Geddes, E. Cormier-Michel, K. Wu, Prabhat, G.H. Weber, D.M. Ushizima, P. Messmer, H. Hagen, B. Hamann, and E.W. Bethel, "Automatic Beam Path Analysis of Laser Wakefield Particle Acceleration Data", IOP Computational Science & Discovery, November 2009, 2, LBNL 2734E,

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, and Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: Astronum-2008 (Astronomical Society of the Pacific Conference Series), St. Thomas, USVI, June 2009, 301-317, LBNL 1450E,

Luke Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Data Parallel Bin-based Indexing for Answering Queries on Multi-core Architecture", Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM), June 2009, 5566:110-129, LBNL 2211E,

Lifeng He, Yuyan Chao, Kenji Suzuki, Kesheng, "Fast Connected-Component Labeling", Pattern Recognition, 2009, 42:1977--1987, doi: 10.1016/j.patcog.2008.10.013

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, "Large Fields for Smaller Facility Sources", SciDAC Review, Pages: 13-21, 2009,

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Optimizing two-pass connected-component labeling", Pattern Analysis & Applications, 2009, 12:117--135,

2008

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Bin-Hash Indexing: A Parallel Method for Fast Query Processing", 2008, LBNL 729E,

I. Yamazaki, K. Wu, H. Simon, "nu-TRLan User Guide version 1.0", 2008, LBNL 1288E,

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1--23

Rishi Rakesh Sinha, Marianne Winslett, Kesheng, Kurt Stockinger, Arie Shoshani, "Adaptive Bitmap Indexes for Space-Constrained", ICDE 2008, 2008, 1418--1420,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Breaking the Curse of Cardinality on Bitmap Indexes", SSDBM 08, Springer, 2008, 348--365, doi: 10.1007/978-3-540-69497-7_23

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Suffix Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy, Sean Ahern, "Modern Scientific Visualization is More than Just Pictures", Numerical Modeling of Space Plasma Flows (Astronomical Society of the Pacific Series), St. Thomas, USVI, 2008, 301--317,

2007

Kesheng Wu, "FastBit Reference Manual", 2007, LBNL PUB/3192,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Performance of Multi-Level and Multi-Component Bitmap Indexes", 2007, doi: 10.1145/1670243.1670245

Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein, "Enabling Real-Time Querying of Live and Historical Data", SSDBM 2007, 2007,

2006

Kesheng Wu, Ekow Otoo, Arie Shoshani, "Optimizing bitmap indices with efficient compression", ACM Transactions on Database Systems, 2006, 31:1--38, doi: 10.1145/1132863.1132864

K. Wu, K. Stockinger, A. Shoshani, Wes, "FastBit--Helps Finding the Proverbial Needle in a Haystack", 2006, LBNL-PUB/963,

Kurt Stockinger, Kesheng Wu, "Bitmap Indices for Data Warehouses", Data Warehouses and OLAP: Concepts, Architectures and Solutions, (Idea Group, Inc.: 2006) Pages: 179--202

Kurt Stockinger, Kesheng Wu, Rene Brun, Canal, "Bitmap indices for fast end-user physics analysis in ROOT", Nuclear Instruments and Methods in Physics Research A: Accelerators, Spectrometers, Detectors and Equipment, 2006, 559:99--102,

Luke Gosink, John Shalf, Kurt Stockinger, Kesheng Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press, 2006, 149--158,

F. Reiss, K. Stockinger, K. Wu, A. Shoshani, J. M. Hellerstein, "Efficient analysis of live and historical streaming data and its application to cybersecurity", 2006,

2005

Kurt Stockinger, John Shalf, Wes Bethel, Kesheng Wu, "Query-Driven Visualization of Large Data Sets", IEEE Visualization 2005, Minneapolis, MN, October 2005, 2005, 22, doi: 10.1109/VIS.2005.84

K. Wu, E. Otoo, "A simpler proof of the average case complexity of union-find with path compression", 2005,

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Wei-Ming Zhang, "Grid Collector: Facilitating Efficient Selective Access from Data Grids", International Supercomputer Conference 2005, 2005,

Kesheng Wu, "FastBit: an efficient indexing technology for data-intensive science", Journal of Physics: Conference Series, IOP Publishing, 2005, 16:556--560, LBNL-2164E, doi: 10.1088/1742-6596/16/1/077

Kesheng Wu, Ekow Otoo, Arie Shoshani, "Optimizing Connected Component Labeling Algorithms", Proceedings of SPIE Medical Imaging Conference 2005, San Diego, CA, 2005,

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Two Strategies to Speed up Connected Component Algorithms", 2005,

E. Wes Bethel, Scott Campbell, Eli Dart, Lee, Steven A. Smith, Kurt Stockinger, Tierney, Kesheng Wu, "Interactive Analysis of Large Network Data Collections Using Query-Driven Visualization", 2005,

2004

K. Wu, W. Zhang, A. Sim, J. Gu, A. Shoshani, "Grid Collector: an Event Catalog with Automated File Management", 2004, LBNL 55563,

Kesheng Wu, Wei-Ming Zhang, Victor, Jerome Lauret, Arie Shoshani, "The Grid Collector: Using an Event Catalog to Speed up Analysis in Distributed Environment", Proceedings of Computing in High Energy and Nuclear Physics (CHEP) 2004, 2004,

K. Wu, A. Shoshani, E. J. Otoo, "Word aligned bitmap compression method, data structure, and apparatus", US Patent 6,831,575, 2004,

2003

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

2002

Kesheng Wu, Ekow Otoo, Arie Shoshani, "An Efficient Compression Scheme For Bitmap Indices", 2002,

2001

L. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Journal of Computer Physics Communications, 2001,

2000

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

1993

Kesheng Wu, Robert Savit, William Brock, "Statistical tests for deterministic effects in broad band time series", Physica D, 1993, 69:172--188, doi: 10.1016/0167-2789(93)90188-7

Wucherl Yoo

2017

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, K. John Wu, "Parallel Variable Selection for Effective Performance Prediction", the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2017), 2017, doi: 10.1109/CCGRID.2017.47

J. Kim, W. Yoo, A. Sim, S.C. Suh, I. Kim, "A Lightweight Network Anomaly Detection Technique", International Workshop on Computing, Networking and Communications (CNC 2017), 2017, doi: 10.1109/ICCNC.2017.7876251

2016

J. Wang, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Second place winner, 2016, 2016,

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016, 2016,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, edited by R. Arora, (Springer International: 2016) Pages: 139-161 doi: 10.1007/978-3-319-33742-5

W. Yoo, B. Foster, A. Sim, K. Wu, "Machine Learning Based Job Status Prediction in Scientific Clusters", IEEE SAI Computing Conference, 2016, 44-53, doi: 10.1109/SAI.2016.7555961

2015

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "PATHA: Performance Analysis Tool for HPC Applications", the 34th IEEE International Performance Computing and Communications Conference (IPCCC 2015), 2015,

M. Koo, W. Yoo (advisor), A. Sim (advisor), "I/O Performance Analysis Framework on Measurement Data from Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15), ACM Student Research Competition (SRC), 2015, 2015,

W. Yoo, A. Sim, "Network Bandwidth Utilization Forecast Model on High Bandwidth Networks", IEEE International Conference on Computing, Networking and Communications (ICNC’15), 2015,

2014

W. Yoo, A. Sim, "Efficient Changing Pattern Detection on High Bandwidth Network Measurements", 7th International Conference on Grid and Distributed Computing, 2014,

2013

M. Montanari, E. Chan, K. Larson, W. Yoo, R. H. Campbell, "Distributed security policy conformance", Computers & Security, March 31, 2013,

2012

W. Yoo, K. Larson, L. Baugh, S. Kim, R. H. Campbell, "ADP: automated diagnosis of performance pathologies using hardware events", SIGMETRICS '12: Proc. of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems, 2012,

2011

W. Yoo, K. Larson, L. Baugh, S. Kim, W. Ahn, R. H. Campbell, "Automated Fingerprinting of Performance Pathologies Using Performance Monitoring Units (PMUs)", HotPar'11: Proc. of USENIX Workshop on Hot Topics in Parallelism, May 26, 2011,

M. Montanari, E. Chan, K. Larson, W. Yoo, R. H. Campbell, "Distributed security policy conformance", Future Challenges in Security and Privacy for Academia and Industry, January 1, 2011,

2010

W. Yoo, S. Shi, W. J. Jeon, K. Nahrstedt, R. H. Campbell, "Real-time parallel remote rendering for mobile devices using graphics processing units", ICME '10: IEEE International Conference on Multimedia and Expo, July 19, 2010,

Thomas Rodman Yopes

2011

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

Other

2016

Bin Dong, Surendra Byna, Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", HPDC '16, Pages: 57--68, 2016, doi: 10.1145/2907294.2907300

2012

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing", The Journal of Trading, Pages: 9--25, 2012, doi: 10.3905/jot.2012.7.2.009

2011

Prabhat, Quincey Koziol, Karen Schuchardt, E. Bethel, Jerry Chou, Mark Howison, Mike, Bruce Palmer, Oliver Ruebel, Kesheng Wu, "ExaHDF5: An I/O Platform for Exascale Data Analysis and Performance", SciDAC 2011, 2011,

2010

Oliver Rübel, Sean Ahern, E. Wes Bethel, D. Biggin, Hank Childs, Estelle, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hagen, Bernd Hamann, Min-Yu Huang, Soile E. Keränen, David W. Knowles, Cris L. Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, H. Weber, Kesheng Wu, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia Computer Science, Pages: 1751--1758, 2010, doi: 10.1016/j.procs.2010.04.197