
John Wu

Kesheng (John) Wu
Group Leader
Phone: +1 510 486 6609
Fax: +1 510 486 4004
Berkeley Lab
One Cyclotron Road
MS50B-3238
Berkeley, CA 94720 US

John Wu is currently working on indexing technology for searching large datasets. He primarily focuses on improving bitmap index technology with compression, encoding and binning. He is the key developer of FastBit bitmap indexing software, which has been used in a number of applications including High-Energy physics, combustion, network security, and query-driven visualization.  John has also been working on a number of scientific computing projects including developing Thick-Restart Lanczos (TRLan) algorithm for solving eigenvalue problems and devising statistical tests for deterministic effects in broad band time series.  John received a Ph.D. in computer science from the University of Minnesota, an M.S. in physics from the University of Wisconsin-Milwaukee, and a B.S. in physics from Nanjing University, China.

Projects

FastBit

Make It A Bit Faster with FastBit

FastBit is an efficient compressed bitmap index technology for data-intensive sciences. This project addresses the challenge of efficiently searching the growing amounts of data collected or generated by scientific applications such as high-energy physics, combustion, astrophysics, and network traffic analysis. The FastBit software received an R&D 100 Award.
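
The core idea of a binned bitmap index can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not FastBit's code: it keeps one bitmap per bin (stored as a Python int), answers a range condition by ORing the bitmaps of bins that lie entirely inside the range, and checks raw values only for the single boundary bin. FastBit compresses its bitmaps with the Word-Aligned Hybrid (WAH) scheme; the plain run-length encoder here is a simpler stand-in, and all function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def build_binned_index(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Build one bitmap per bin (hypothetical helper).

    Bin b holds values with edges[b-1] <= v < edges[b]; bin 0 is everything
    below edges[0] and the last bin is everything at or above edges[-1].
    Bit i of bitmaps[b] is set when values[i] falls in bin b.
    """
    bitmaps = [0] * (len(edges) + 1)
    for row, v in enumerate(values):
        bitmaps[bisect_right(edges, v)] |= 1 << row
    return bitmaps


def query_less_than(bitmaps: List[int], edges: Sequence[float],
                    values: Sequence[float], x: float) -> List[int]:
    """Return the rows with values[row] < x."""
    k = bisect_right(edges, x)          # bin that contains x
    hits = 0
    for b in range(k):                  # bins entirely below x: no raw-data check
        hits |= bitmaps[b]
    candidates = bitmaps[k]             # boundary bin: check each candidate row
    while candidates:
        lowest = candidates & -candidates
        row = lowest.bit_length() - 1
        if values[row] < x:
            hits |= lowest
        candidates ^= lowest
    return [row for row in range(len(values)) if (hits >> row) & 1]


def run_length_encode(bitmap: int, nbits: int) -> List[Tuple[int, int]]:
    """Compress a bitmap into (bit, run_length) pairs, lowest row first."""
    runs: List[Tuple[int, int]] = []
    for row in range(nbits):
        bit = (bitmap >> row) & 1
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs


values = [0.5, 2.0, 3.7, 1.2, 5.0, 2.9]
edges = [1.0, 2.0, 3.0, 4.0]
index = build_binned_index(values, edges)
print(query_less_than(index, edges, values, 2.5))  # [0, 1, 3]
```

With well-chosen bin edges, most of the work is bitwise OR over (compressed) bitmaps, and only the rows in one boundary bin touch the raw data; that is the trade-off binning makes between index size and candidate checks.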

ICEE

The ICEE project aims to introduce the in-transit analysis capability into a collaborative workflow system by leveraging the in-transit capability of ADIOS and selective data access capability of FastBit.

CIFT

Bring CRD's breadth and depth of experience in supercomputing, data intensive science, visualization, financial engineering, and computer security to the study of modern markets.

ExaHDF5

To provide high-performance I/O middleware that makes effective use of computational platforms by researching a number of optimization strategies and deploying them through the HDF5 software.

Connected Component Labeling

An efficient connected component labeling algorithm. This work grew out of our feature tracking for combustion data analysis. The key new insight is that an implicit union-find data structure can speed up connected component labeling algorithms, which in turn leads to faster algorithms for finding regions of interest. In particular, by using compressed bitmaps to represent the points in the regions of interest, we can find the regions in time proportional to the number of points on the boundary of the regions. This is faster than the best iso-contouring algorithms and much faster than similar region-finding algorithms. It is also the basis of some of our work on visualization and visual analytics.
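
The role of union-find in labeling can be illustrated with the classic two-pass scheme on a small binary image: the first pass assigns provisional labels and records equivalences in a parent array, and the second pass replaces each provisional label with its set representative. This is a generic textbook version in Python, assumed here for illustration (4-connectivity, path compression, hypothetical function names); the optimized algorithms and the implicit union-find structure in the publications are not reproduced.

```python
from typing import List


def find(parent: List[int], x: int) -> int:
    """Return the representative of x's set, compressing the path on the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:            # point every node on the path at the root
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def union(parent: List[int], a: int, b: int) -> int:
    """Merge the sets of a and b; the smaller root becomes the representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra < rb:
        parent[rb] = ra
    elif rb < ra:
        parent[ra] = rb
    return min(ra, rb)


def label_components(grid: List[List[int]]) -> List[List[int]]:
    """Two-pass 4-connected labeling of a binary grid (0 = background)."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]                        # slot 0 is reserved for the background
    # Pass 1: provisional labels from the already-visited up and left neighbors.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = union(parent, up, left)   # record equivalence
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))               # new provisional label
                labels[r][c] = len(parent) - 1
    # Pass 2: replace each provisional label by a consecutive final label.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(parent, labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels


image = [[1, 1, 0, 1],
         [0, 1, 0, 1],
         [1, 1, 1, 1],
         [0, 0, 0, 0],
         [1, 0, 1, 0]]
for row in label_components(image):
    print(row)
```

In this image the U-shaped region in the top three rows receives three provisional labels that union-find merges into one, so the output has three components.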

 

Journal Articles

Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Choi, Andreas Stathopoulos, Choong-Seock Chang, Scott Klasky, "Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma", IEEE Transactions on Big Data (TBD), 2016, 2:3:262-275, doi: 10.1109/TBDATA.2016.2599929

Deborah A Agarwal, Boris Faybishenko, Vicky L, Harinarayan Krishnan, Carina Lansing, Gary Kushner, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, Arthur, Kesheng Wu, "A Science Data Gateway for Environmental Management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004, doi: 10.1002/cpe.3697

Jung Heon Song, Kesheng Wu, Horst D Simon, "Parameter Analysis of the VPIN (Volume-synchronized Probability of Informed Trading) Metric", Quantitative Financial Risk Management: Theory and, 2014,

Kesheng Wu, Wes Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A Big Data Approach to Analyzing Market Volatility", Algorithmic Finance, 2013, 2:241--267, LBNL LBNL-6382E, doi: 10.3233/AF-13030

Understanding the microstructure of the financial market requires processing a vast amount of data on individual trades, and sometimes even on multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that HPC resources and the techniques developed for data-intensive sciences can greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed Trading (VPIN). The test data used in this study contain five and a half years' worth of trading data for about 100 of the most liquid futures contracts, include about 3 billion trades, and occupy 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallel computation, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.
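
The VPIN computation described in this abstract can be sketched as follows: trades are grouped into buckets of equal traded volume, each bucket's volume is split into buy and sell parts, and VPIN is the average absolute order imbalance over a trailing window of buckets. For brevity this hypothetical sketch classifies each trade with the simple tick rule; the study explores 16,000 different ways of computing VPIN, and its classification method and parameters are not reproduced here. Integer (contract) volumes keep the bucket arithmetic exact.

```python
from typing import List, Sequence, Tuple


def volume_buckets(trades: Sequence[Tuple[float, int]],
                   bucket_volume: int) -> List[Tuple[int, int]]:
    """Split (price, volume) trades into equal-volume (buy, sell) buckets.

    Each trade is classified with the tick rule (an assumption for this
    sketch): an uptick is a buy, a downtick a sell, and an unchanged price
    keeps the previous classification; the first trade counts as a buy.
    A trade that overflows a bucket is split across buckets, and a final
    partial bucket is dropped.
    """
    buckets: List[Tuple[int, int]] = []
    buy = sell = filled = 0
    last_price = None
    is_buy = True
    for price, volume in trades:
        if last_price is not None and price != last_price:
            is_buy = price > last_price
        last_price = price
        remaining = volume
        while remaining > 0:
            take = min(remaining, bucket_volume - filled)
            if is_buy:
                buy += take
            else:
                sell += take
            filled += take
            remaining -= take
            if filled == bucket_volume:
                buckets.append((buy, sell))
                buy = sell = filled = 0
    return buckets


def vpin(buckets: Sequence[Tuple[int, int]], window: int) -> List[float]:
    """VPIN for each trailing window of `window` buckets:
    sum(|buy - sell|) divided by the total volume in the window."""
    values = []
    for end in range(window, len(buckets) + 1):
        recent = buckets[end - window:end]
        imbalance = sum(abs(b - s) for b, s in recent)
        total = sum(b + s for b, s in recent)
        values.append(imbalance / total)
    return values


trades = [(100.0, 5), (100.5, 5), (100.25, 10), (100.25, 4), (101.0, 6)]
buckets = volume_buckets(trades, bucket_volume=10)
print(buckets)                  # [(10, 0), (0, 10), (6, 4)]
print(vpin(buckets, window=2))  # [1.0, 0.6]
```

The alert criterion in the abstract compares each VPIN value with its empirical distribution (CDF > 0.99); that step is omitted from this sketch.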

E. W. Bethel and D. Leinweber and O. Rubel and K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

G. F. Lofstead, Q. Liu, J. Logan, Y. Tian, Abbasi, N. Podhorszki, J. Y. Choi, S., R. Tchoua, R. A. Oldfield, others, "Hello ADIOS: The Challenges and Lessons of Leadership Class I/O Frameworks", 2012,

Kesheng Wu, Rishi R Sinha, Chad Jones, Ethier, Scott Klasky, Kwan-Liu Ma, Arie Shoshani, Marianne Winslett, "Finding regions of interest on toroidal meshes", Computational Science & Discovery, 2011, 4:015003, doi: 10.1088/1749-4699/4/1/015003

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

O. Rübel, C.G.R. Geddes, E. Cormier-Michel, K. Wu, Prabhat, G.H. Weber, D.M. Ushizima, P. Messmer, H. Hagen, B. Hamann, and E.W. Bethel, "Automatic Beam Path Analysis of laser Wakefield Particle Acceleration Data", IOP Computational Science & Discovery, November 2009, 2, LBNL 2734E,

Lifeng He, Yuyan Chao, Kenji Suzuki, Kesheng Wu, "Fast Connected-Component Labeling", Pattern Recognition, 2009, 42:1977--1987, doi: 10.1016/j.patcog.2008.10.013

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Optimizing two-pass connected-component labeling", Pattern Analysis & Applications, 2009, 12:117--135,

Kesheng Wu, Ekow Otoo, Arie Shoshani, "Optimizing bitmap indices with efficient compression", ACM Transactions on Database Systems, 2006, 31:1--38, doi: 10.1145/1132863.1132864

Kurt Stockinger, Kesheng Wu, Rene Brun, Canal, "Bitmap indices for fast end-user physics analysis in", Nuclear Instruments and Methods in Physics Research A: Accelerators, Spectrometers, Detectors and Equipment, 2006, 559:99--102,

L. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Journal of Computer Physics Communications, 2001,

Kesheng Wu, Horst Simon, "Thick-restart Lanczos method for large symmetric problems", SIAM J. Matrix Anal. Appl., 2000, 22:602--616,

Kesheng Wu, Horst Simon, "A Parallel Lanczos method for symmetric generalized problems", Computing and Visualization in Science, 1999, 2:37--46,

Kesheng Wu, Andrew Canning, Horst D. Simon, Lin-Wang Wang, "Thick-Restart Lanczos method for electronic calculations", Journal of Computational Physics, 1999, 154:156--173,

Kesheng Wu, Robert Savit, William Brock, "Statistical tests for deterministic effects in broad band time series", Physica D, 1993, 69:172--188, doi: 10.1016/0167-2789(93)90188-7

Conference Papers

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Improving Statistical Similarity Based Data Reduction for Non-Stationary Data", 29th International Conference on Scientific and Statistical Database Management (SSDBM2017), 2017, doi: 10.1145/3085504.3085583

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-Defined Scientific Data Analysis on Arrays", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2017 (Acceptance rate:19%), June 26, 2017,

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, K. John Wu, "Parallel Variable Selection for Effective Performance Prediction", the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2017), 2017, doi: 10.1109/CCGRID.2017.47

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, and Peter Nugent, "Incremental View Maintenance over Array Data", In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17) (Acceptance rate: 20%). ACM, New York, NY, USA, May 14, 2017,

Ling Jin, Doris Lee, Alex Sim, Sam Borgeson, John Wu, Anna Spurlock, Annika Todd, "Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data", 2nd International Workshop on Artificial Intelligence for Smart Grids and Smart Buildings, In conjunction with AAAI 2017, 2017,

Bin Dong, Suren Byna, Kesheng Wu, Prabhat, Hans Johansen, Jeffrey N. Johnson, and Noel Keen, "Data Elevator: Low-contention Data Movement in Hierarchical Storage System", The 23rd annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC) (Acceptance rate: 25%), December 19, 2016,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martín, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data) (Acceptance rate: 19.39% as short papers.), December 5, 2016,

Houjun Tang, Suren Byna, Steve Harenberg, Wenzhao Zhang, Xiaocheng Zou, Daniel F Martin, Bin Dong, Dharshi Devendran, Kesheng Wu, David Trebotich, others, "In Situ Storage Layout Optimization for AMR Spatio-temporal Read Accesses", 2016 45th International Conference on Parallel Processing (ICPP) (Acceptance rate: 21.1%), August 16, 2016, 406--415,

W. Yoo, B. Foster, A. Sim, K. Wu, "Machine Learning Based Job Status Prediction in Scientific Clusters", IEEE SAI Computing Conference, 2016, 44-53, doi: 10.1109/SAI.2016.7555961

Bin Dong, Suren Byna, and Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting", The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2016, July 1, 2016,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Novel Data Reduction Based on Statistical Similarity", International Conference on Scientific and Statistical Database Management (SSDBM'16), New York, NY, USA, ACM, 2016, 21:1--21:1, doi: 10.1145/2949689.2949708

D. Pugmire, J. Kress, H. Childs, M. Wolf, G. Eisenhauer, J. Low, R. M. Churchill, T. Kurc, K. Wu, A. Sim, J. Gu, J. Choi, S. Klasky, "Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows", High Performance Data Analysis and Visualization Workshop (HPDAV2016) in conjunction with the 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2016), 2016, doi: 10.1109/IPDPSW.2016.175

Tzuhsien Wu, Shyng Hao, Jerry Chou, Bin Dong and Kesheng Wu, "Indexing Blocks to Reduce Space and Time Requirements for Searching Large Data Files", 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2016, May 16, 2016,

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage Pattern-Driven Dynamic Data Layout Reorganization", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications", Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, January 1, 2016, 116--125,

Xiaocheng Zou, David Boyuka, Dhara Desai, Daniel F. Martin, Suren Byna, Kesheng Wu, Kushal Bansal, Bin Dong, Wenzhao Zhang, Houjun Tang, Dharshi Devendran, David Trebotich, Scott Klasky, Hans Johansen, Nagiza Samatova, "AMR-aware In Situ Indexing and Scalable Querying", The 24th High Performance Computing Symposium (HPC), January 1, 2016,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, "Similarity Join over Array Data", SIGMOD 16, New York, NY, USA, ACM, January 1, 2016, 2007--2022, doi: 10.1145/2882903.2915247

T. Kim, D. Lee, J. Choi, A. Spurlock, A. Sim, A. Todd, K. Wu, "Extracting Baseline Electricity Usage Using Gradient Tree Boosting", International Conference on Big Data Intelligence and Computing (DataCom 2015), Best Paper Award, 2015,

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "PATHA: Performance Analysis Tool for HPC Applications", the 34th IEEE International Performance Computing and Communications Conference (IPCCC 2015), 2015,

Bin Dong, Suren Byna, and Kesheng Wu, "Heavy-tailed Distribution of Parallel I/O System Response Time", 10th Parallel Data Storage Workshop (PDSW) 2015, to be held in conjunction with SC15, 2015,

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Bin Dong, Suren Byna, and Kesheng Wu, "Spatially Clustered Join on Heterogeneous Scientific Data Sets", 2015 IEEE International Conference on Big Data (IEEE BigData 2015), IEEE, 2015,

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

S. Shannigrahi, A. J. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E. Yeh, "Named Data Networking in Climate Research and HEP Applications", 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), 2015,

Qian Sun, Fan Zhang, Tong Jin, Hoang Bui, Kesheng Wu, Arie Shoshani, Hemanth Kolla, Scott Klasky, Jacqueline Chen and Manish Parashar, "Scalable Run-time Data Indexing and Querying for Scientific Simulations", Proceedings of the Fifth International Workshop on Big Data Analytics: Challenges, and Opportunities (BDAC’14), 2014,

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, CS Chang, S. Klasky, "High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma", 5th International Workshop on Big Data Analytics: Challenges, and Opportunities (BDAC’14), 2014,

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14), June 23, 2014, 385--396, doi: 10.1145/2588555.2612185

Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Surendra Byna, Kesheng Wu, "Simplifying index file structure to improve I/O performance of parallel indexing", Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on, 2014, 576-583, doi: 10.1109/PADSW.2014.7097856

Bin Dong, S. Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", Cluster Computing (CLUSTER), 2014 IEEE International Conference on, January 1, 2014, 194-202, doi: 10.1109/CLUSTER.2014.6968765

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", Parallel Distributed Processing Symposium Workshops 2014 IEEE International, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

Jung Heon Song, Marcos López de Prado, Horst Simon, Kesheng Wu, "Exploring Irregular Time Series Through Non-uniform Fourier Transform", WHPCF 14, Piscataway, NJ, USA, IEEE Press, 2014, 37--44, doi: 10.1109/WHPCF.2014.8

Jong Y. Choi, Kesheng Wu, Jacky C. Wu, Alex Sim, Qing G. Liu, Matthew Wolf, CS Chang, Scott Klasky, "ICEE: Wide-area In Transit Data Processing Framework For Near Real-Time Scientific Applications", The 4th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC-13), 2013,

Bin Dong, Surendra Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), September 23-27, 2013, 1--8,

E. Wes Bethel, Prabhat, Suren Byna, Oliver Rübel, K. John Wu, and Michael Wehner, "Why High Performance Visual Data Analytics is both Relevant and Difficult", Proceedings of Visualization and Data Analysis 2013, IS&T/SPIE Electronic Imaging 2013, San Francisco, CA, USA, SPIE, February 2013, LBNL LBNL-6063E,

Alex Romosan, Arie Shoshani, Kesheng Wu, Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 26, LBNL 6397E, doi: 10.1145/2484838.2484856

B. Dong, S. Byna, K. Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage Workshop, January 1, 2013, doi: 10.1145/2538542.2538563

Kuan-Wu Lin, Surendra Byna, Jerry Chou, Kesheng Wu, "Optimizing FastQuery performance on Lustre file system", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 29,

W. Gu, J. Choi, M. Gu, H. D. Simon, K. Wu, "Fast Change Point Detection for electricity market", 2013 IEEE International Conference on Big Data, 2013, 50--57, doi: 10.1109/BigData.2013.6691733

Surendra Byna, Jerry Chou, Oliver Rübel, Prabhat, Homa Karimabadi, William S. Daughton, Vadim Roytershteyn, E. Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, Arie Shoshani, Andrew Uselton, and Kesheng Wu, "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation", SuperComputing 2012 (SC12), Salt Lake City, Utah, November 2012,

Prabhat, Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner and E. Wes Bethel, "TECA: A Parallel Toolkit for Extreme Climate Analysis", Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Presented at Third Workshop on Data Mining in Earth System Science (DMESS 2012), Omaha, Nebraska, June 2012, 9:866–876, LBNL 5352E, doi: 10.1016/j.procs.2012.04.093

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

Allen R. Sanderson, Brad Whitlock, Oliver, Hank Childs, Gunther H. Weber, , Kesheng Wu, "A System for Query Based Analysis and Visualization", Third International EuroVis Workshop on Visual Analytics (EuroVA 2012), Vienna, Austria, January 2012, LBNL 5507E,

E. Pourabbas, A. Shoshani, K. Wu, "Minimizing index size by reordering rows and columns", SSDBM, Springer Berlin/Heidelberg, January 2012, 467--484,

Benson Ma, Arie Shoshani, Alex Sim, Kesheng Wu, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Management (NDM2012), 2012, 562--571,

Ichitaro Yamazaki, Kesheng Wu, "A Communication-Avoiding Thick-Restart Lanczos Method on a Distributed-Memory System", Lecture Notes in Computer Science, 2012, 7155:345--354, doi: 10.1007/978-3-642-29737-3_39

E. W. Bethel, Surendra Byna, Jerry Chou, Estelle Cormier-Michel, Cameron G. R. Geddes, Mark Howison, Fuyu Li, Prabhat, Ji Qiang, Oliver Rübel, Rob D. Ryne, Michael Wehner, Kesheng Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

Suren Byna, Prabhat, Michael F. Wehner and Kesheng Wu, "Detecting Atmospheric Rivers in Large Climate Datasets", Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges, and Opportunities (PDAC-11/ Supercomputing11/ ACM/IEEE), November 14, 2011, Seattle, Washington, 2011, doi: 10.1145/2110205.2110208

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
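
The two-step detection described above (threshold the integrated water vapor field, then group the surviving mesh points into connected regions) can be sketched serially as below. The grid layout, the 4-neighbor connectivity, the optional longitude wrap-around, the `min_cells` size filter, and all names are assumptions for illustration; the threshold is passed in rather than hard-coded, and the parallel decomposition behind the reported scaling is not shown.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def detect_regions(iwv: List[List[float]], threshold: float,
                   min_cells: int = 1, wrap_lon: bool = False) -> List[Set[Cell]]:
    """Threshold a (lat, lon) field and return connected regions above it.

    Cells with iwv > threshold are grouped by breadth-first search over the
    four grid neighbors. With wrap_lon=True the last longitude column is
    treated as adjacent to the first, as on a global grid. Regions smaller
    than min_cells are discarded.
    """
    rows = len(iwv)
    cols = len(iwv[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions: List[Set[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or iwv[r][c] <= threshold:
                continue
            region: Set[Cell] = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if wrap_lon:
                        nx %= cols
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if not seen[ny][nx] and iwv[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_cells:
                regions.append(region)
    return regions


field = [[0.0, 3.0, 3.0, 0.0],
         [0.0, 0.0, 3.0, 0.0],
         [3.0, 0.0, 0.0, 3.0]]
print([len(g) for g in detect_regions(field, threshold=2.0)])                 # [3, 1, 1]
print([len(g) for g in detect_regions(field, threshold=2.0, wrap_lon=True)])  # [3, 2]
```

A real atmospheric river detector would add shape criteria (for example, length and width of each region) after this grouping step; those are outside this sketch.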

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

Jerry Chou, Kesheng Wu, Oliver Rübel, Mark Howison, Ji Qiang, Prabhat, Brian Austin, E. Wes Bethel, Rob D. Ryne, and Arie Shoshani, "Parallel Index and Query for Large Scale Data Analysis", In Proceedings of Supercomputing 2011, Seattle, WA, USA, 2011, 1-11, LBNL 5317E, doi: 10.1145/2063384.2063424

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Kesheng Wu, Surendra Byna, Doron Rotem, Arie Shoshani, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

Jinoh Kim, Hasan Abbasi, Luis Chacón, Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive", LDAV, 2011, 65--72, doi: 10.1109/LDAV.2011.6092319

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A General Indexing and Querying System for Scientific Data", SSDBM, 2011, 573--574, doi: 10.1007/978-3-642-22351-8_42

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A Parallel Indexing System for Scientific Data", Workshop on Interfaces and Abstractions for Scientific Data Storage (IASDS), IEEE Cluster, 2011, doi: 10.1109/CLUSTER.2011.86

Kamesh Madduri, Kesheng Wu, "Massive-Scale RDF Processing Using Compressed Bitmap", SSDBM, Springer, 2011, 470--479, doi: 10.1007/978-3-642-22351-8_30

D. Hasenkamp, A. Sim, M. Wehner and K. Wu, "Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis", Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science, Nov. 30-Dec. 3, 2010, Indianapolis, Indiana, 2010, LBNL 4218E,

Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark. D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu, "Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data", Procedia Computer Science, Proceedings of International Conference on Computational Science, ICCS 2010, June 2010, LBNL 3669E,

G. H. Weber, S. Ahern, E.W. Bethel, S. Borovikov, H.R. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K.I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen and K. Wu, "Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization", Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), 2010, 429:329-334, LBNL 3185E,

Kesheng Wu, Arie Shoshani, Kurt Stockinger, "Analyses of multi-level and multi-component compressed indexes", ACM Transactions on Database Systems, ACM, 2010, 35:1--52, doi: 10.1145/1670243.1670245

Kesheng Wu, Kamesh Madduri, Shane Cannon, "Multi-level bitmap indexes for flash memory storage", IDEAS, 2010, doi: 10.1145/1866480.1866497

M. Nagappan, Kesheng Wu, M. A. Vouk, "Efficiently Extracting Operational Profiles from Execution Logs Using Suffix Arrays", 20th International Symposium on Software Reliability Engineering (ISSRE '09), November 1, 2009, doi: 10.1109/ISSRE.2009.23

Operational profiles are an important software reliability engineering tool. In this paper we propose a cost-effective automated approach for creating second-generation operational profiles from the execution logs of a software product. Our algorithm parses the execution logs into sequences of events and produces an ordered list of all possible subsequences by constructing a suffix array of the events. The difficulty in using execution logs is that the amount of data to be analyzed is often extremely large (more than a million records per day in many applications). Our approach is very efficient: it requires O(N) space and time to discover all possible patterns in N events. We discuss a practical implementation of the algorithm in the context of the logs from a large cloud computing system.
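
A suffix array makes it easy to count how often any contiguous pattern of events occurs in a log. The sketch below builds one by plain sorting, which is far slower than the O(N) construction the abstract refers to, and then counts a pattern with two binary searches over the sorted suffixes. The event names and helper functions are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def suffix_array(events: Sequence[str]) -> List[int]:
    """Start positions of all suffixes of `events`, in sorted order.

    Plain sorting is used for clarity; it is much slower than a linear-time
    suffix array construction on long logs.
    """
    seq = list(events)
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def count_occurrences(events: Sequence[str], sa: List[int],
                      pattern: Sequence[str]) -> int:
    """Count contiguous occurrences of `pattern` using two binary searches.

    Suffixes that start with `pattern` form one consecutive block of `sa`.
    """
    seq, pat, m = list(events), list(pattern), len(pattern)

    def head(k: int) -> List[str]:       # first m events of the k-th sorted suffix
        return seq[sa[k]:sa[k] + m]

    lo, hi = 0, len(sa)
    while lo < hi:                        # first suffix whose head >= pattern
        mid = (lo + hi) // 2
        if head(mid) < pat:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                        # first suffix whose head > pattern
        mid = (lo + hi) // 2
        if head(mid) <= pat:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


log = ["login", "read", "write", "read", "write", "logout"]
sa = suffix_array(log)
print(count_occurrences(log, sa, ["read", "write"]))   # 2
print(count_occurrences(log, sa, ["login", "write"]))  # 0
```

Counting how often each event pattern appears is essentially what an operational profile records; a linear-time construction would replace `suffix_array` without changing the search step.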

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Luke Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Data Parallel Bin-based Indexing for Answering Queries on Multi-core Architecture", Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM), June 2009, 5566:110-129, LBNL 2211E,

E. Wes Bethel, Oliver Rübel, Prabhat, Kesheng Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy Meredith, and Sean Ahern, "Modern Scientific Visualization is More than Just Pretty Pictures", Numerical Modeling of Space Plasma Flows: Astronum-2008 (Astronomical Society of the Pacific Conference Series), St. Thomas, USVI, June 2009, 301-317, LBNL 1450E,

K Wu et al., "FastBit: Interactively Searching Massive Data", SciDAC 2009, 2009, LBNL 2164E, doi: 10.1088/1742-6596/180/1/012053

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

Rishi Rakesh Sinha, Marianne Winslett, Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Adaptive Bitmap Indexes for Space-Constrained", ICDE 2008, 2008, 1418--1420,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Breaking the Curse of Cardinality on Bitmap Indexes", SSDBM 08, Springer, 2008, 348--365, doi: 10.1007/978-3-540-69497-7_23

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Suffix Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein, "Enabling Real-Time Querying of Live and Historical Data", SSDBM 2007, 2007,

Luke Gosink, John Shalf, Kurt Stockinger, Kesheng Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press, 149--158,

F. Reiss, K. Stockinger, K. Wu, A. Shoshani, J. M. Hellerstein, "Efficient analysis of live and historical streaming and its application to cybersecurity", 2006,

Kesheng Wu, "FastBit: an efficient indexing technology for data-intensive science", Journal of Physics: Conference Series, IOP Publishing, 2005, 16:556--560, LBNL LBNL-2164E, doi: 10.1088/1742-6596/16/1/077

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Wei-Ming Zhang, "Grid Collector: Facilitating Efficient Selective from Data Grids", International Supercomputer Conference 2005, 2005,

Kesheng Wu, Ekow Otoo, Arie Shoshani, "Optimizing Connected Component Labeling Algorithms", Proceedings of SPIE Medical Imaging Conference 2005, San Diego, CA, 2005,

E. Wes Bethel, Scott Campbell, Eli Dart, Lee, Steven A. Smith, Kurt Stockinger, Tierney, Kesheng Wu, "Interactive Analysis of Large Network Data Collections Query-Driven Visualization", 2005,

Kurt Stockinger, John Shalf, Wes Bethel, Kesheng Wu, "Query-Driven Visualization of Large Data Sets", IEEE Visualization 2005, Minneapolis, MN, October 2005, 2005, 22, doi: 10.1109/VIS.2005.84

Kesheng Wu, Wei-Ming Zhang, Victor, Jerome Lauret, Arie Shoshani, "The Grid Collector: Using an Event Catalog to Speed up Analysis in Distributed Environment", Proceedings of Computing in High Energy and Nuclear Physics (CHEP) 2004, 2004,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File Management", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

Kesheng Wu, Horst Simon, "An Evaluation of Parallel Shift-and-Invert Lanczos", Proceedings of The 1999 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, Nevada, June 28 - July 1, 1999, 2913-19,

Kesheng Wu, Horst Simon, "Parallel Efficiency of the Lanczos method for problems", Berkeley, CA, 1999,

Book Chapters

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, edited by R. Arora, (Springer International: 2016) Pages: 139-161 doi: 10.1007/978-3-319-33742-5

F. Rusu, P. Nugent, K. Wu, "Implementing the Palomar Transient Factory Real-Time Pipeline in GLADE: Results and", Lecture Notes in Computer Science, ( 2014) Pages: 53--66

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Doron Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1-23

Kurt Stockinger, Kesheng Wu, "Bitmap Indices for Data Warehouses", Data Warehouses and OLAP: Concepts, Architectures and Solutions, (Idea Group, Inc.: 2006) Pages: 179-202

Reports

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", Quantitative Finance, 2015,

http://ssrn.com/abstract=2507040

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, C.S. Chang, S. Klasky, "Towards Real-Time Detection and Tracking of Blob-Filaments in Fusion Plasma Big Data", WM-CS-2015-01, Department of Computer Science, College of William and Mary, 2015,

William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", October 6, 2013, LBNL-6388E,

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Bin-Hash Indexing: A Parallel Method for Fast Query Processing", 2008, LBNL 729E,

I. Yamazaki, K. Wu, H. Simon, "nu-TRLan User Guide version 1.0", 2008, LBNL 1288E,

Kesheng Wu, "FastBit Reference Manual", 2007, LBNL PUB/3192,

K. Wu, K. Stockinger, A. Shoshani, Wes Bethel, "FastBit: Helps Finding the Proverbial Needle in a Haystack", 2006, LBNL-PUB/963,

K. Wu, E. Otoo, "A simpler proof of the average case complexity of union-find with path compression", 2005,

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Two Strategies to Speed up Connected Component Algorithms", 2005,

K. Wu, W. Zhang, A. Sim, J. Gu, A. Shoshani, "Grid Collector: an Event Catalog with Automated File Management", 2004, LBNL 55563,

Kesheng Wu, Ekow Otoo, Arie Shoshani, "An Efficient Compression Scheme For Bitmap Indices", 2002,

Kesheng Wu, Horst D. Simon, "Dynamic Restarting Schemes For Eigenvalue Problems", 1999,

Posters

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Expanding Statistical Similarity Based Data Reduction to Capture Diverse Patterns", Data Compression Conference (DCC 2017), 2017,

J. Wang, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Second place winner, 2016,

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016,

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015, 2015,

L. Wu, K. Wu, A. Sim, A. Stathopoulos, "Real-Time Outlier Detection Algorithm for Finding Blob-Filaments in Plasma", Supercomputing 2014, ACM SRC, 2014,

John Wu, Alex Sim, Lingfei Wu, Abraham Frankl, Scott Klasky, Jong Y Choi, CS Chang, Michael Churchill, "Exercising ICEE Framework with Fusion Blob Detection", DOE/ASCR NGNS PI meeting, 2014,

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

Prabhat, Suren Byna, Chris Paciorek, Gunther Weber, Kesheng Wu, Thomas Yopes, Michael Wehner, William Collins, George Ostrouchov, Richard Strelitz, E. Wes Bethel, "Pattern Detection and Extreme Value Analysis on Large Climate Data", DOE/BER Climate and Earth System Modeling PI Meeting, September 2011,

D. Hasenkamp, A. Sim, M. Wehner, K. Wu, "Finding Tropical Cyclones on Clouds", Supercomputing 2010, ACM SRC 3rd place, 2010,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Others

S. Shannigrahi, A. Barczuk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E., Named Data Networking in Climate Research and HEP, 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), Okinawa, Japan, 2015,

Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel, Testing VPIN on Big Data, Available at SSRN 2318259, 2013,

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, Performance of Multi-Level and Multi-Component Bitmap Indexes, 2007, doi: 10.1145/1670243.1670245

K. Wu, A. Shoshani, E. J. Otoo, Word aligned bitmap compression method, data and apparatus, US Patent 6,831,575, 2004,

Kesheng Wu, Horst Simon, TRLAN user guide, 1999,