Scientific Data Management Research

John Wu

Kesheng (John) Wu
Senior Computer Scientist
Scientific Networking Division
Berkeley Lab
One Cyclotron Road
MS5R3103
Berkeley, CA 94720 US

John Wu is currently working on indexing technology for searching large datasets. He primarily focuses on improving bitmap index technology with compression, encoding, and binning. He is the key developer of the FastBit bitmap indexing software, which has been used in a number of applications including high-energy physics, combustion, network security, and query-driven visualization. John has also worked on a number of scientific computing projects, including developing the Thick-Restart Lanczos (TRLan) algorithm for solving eigenvalue problems and devising statistical tests for deterministic effects in broad band time series. John received a Ph.D. in computer science from the University of Minnesota, an M.S. in physics from the University of Wisconsin-Milwaukee, and a B.S. in physics from Nanjing University, China.

LBNL researcher profile: https://profiles.lbl.gov/20161-john-wu/

Projects

FasTensor

FasTensor, formerly known as ArrayUDF, is a generic parallel programming model for big data analyses with any user-defined functions (UDFs). These functions may express data analysis operations ranging from traditional database (DB) systems to advanced machine learning pipelines. FasTensor exploits the structural locality in multidimensional arrays to automate file operations, data partitioning, communication, parallel execution, and common data management operations.

FastBit

FastBit [Publications]: an efficient compressed bitmap index technology for data-intensive sciences. This project addresses the challenges of efficiently searching the growing amounts of data collected or generated by various scientific applications, such as high-energy physics, combustion, astrophysics, and network traffic analysis. The FastBit software has received an R&D 100 Award; here is a photo from the award reception.

IDEALEM

A statistical compression technique based on the idea of locally exchangeable measures. Here is a list of publications on the topic of IDEALEM compression.

ICEE

The ICEE project aims to introduce in-transit analysis capability into a collaborative workflow system by leveraging the in-transit capability of ADIOS and the selective data access capability of FastBit.

Backtest Overfitting

Exploring the concept of backtest overfitting to demonstrate how too much computing could ruin some artificial intelligence tools.

ExaHDF5

ExaHDF5 aims to provide high-performance I/O middleware that makes effective use of computational platforms, by researching a number of optimization strategies and deploying them through the HDF5 software.

Connected Component Labeling

[Publications]: an efficient connected component labeling algorithm. This grows out of our work on feature tracking for combustion data analysis. The key new insight is that an implicit union-find data structure can be used to speed up connected component labeling algorithms, which in turn leads to faster algorithms for finding regions of interest. In particular, using compressed bitmaps as representations of points in the regions of interest, we can find the regions in time that is proportional to the number of points on the boundary of the regions. This is faster than the best iso-contouring algorithms and much faster than similar region-finding algorithms. This is also a basis of some of the work on visualization and visual analytics.
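As a rough illustration of how union-find supports connected component labeling, the sketch below implements the classical two-pass scheme on a small 2D binary grid with 4-connectivity. It is not the optimized algorithm from the publications: it uses an explicit array-based union-find rather than the implicit one described above, and it does not use compressed bitmaps. The names `find`, `union`, and `label_components` are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def find(parent: List[int], x: int) -> int:
    """Return the root label of x, compressing the path along the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent: List[int], a: int, b: int) -> None:
    """Merge the label sets containing a and b; the smaller root wins."""
    ra, rb = find(parent, a), find(parent, b)
    if ra < rb:
        parent[rb] = ra
    elif rb < ra:
        parent[ra] = rb


def label_components(grid: List[List[int]]) -> List[List[int]]:
    """Label 4-connected foreground regions (nonzero cells) with 1, 2, 3, ..."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # index 0 is reserved for the background

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new provisional label
                labels[r][c] = len(parent) - 1
            elif up and left:
                labels[r][c] = min(up, left)
                union(parent, up, left)
            else:
                labels[r][c] = up or left

    # Second pass: replace provisional labels by consecutive final labels.
    final: Dict[int, int] = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(parent, labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 0, 1, 1],
    ]
    for row in label_components(image):
        print(row)
```

In this sketch the union-find array is what lets the first pass record that two provisional labels belong to the same region without rescanning the image; the second pass only resolves each label to its root.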

Journal Articles

R. Frehner, K. Wu, A. Sim, J. Kim, K. Stockinger, "Detecting Anomalies in Time Series Using Kernel Density Approaches", IEEE Access, 2024, doi: 10.1109/ACCESS.2024.3371891

H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, "Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective", Systems, 2023, 11(6):314, doi: 10.3390/systems11060314

R. Shao, A. Sim, K. Wu, J. Kim, "Leveraging History to Predict Abnormal Transfers in Distributed Workflows", Sensors, 2023, 23(12):5485, doi: 10.3390/s23125485

S. Kim, A. Sim, K. Wu, S. Byna, Y. Son, H. Eom, "Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis", Journal of Big Data, 2023, 10(65), doi: 10.1186/s40537-023-00741-4

J. Wang, K. Wu, A. Sim, S. Hwangbo, "Locating Partial Discharges in Power Transformers with Convolutional Iterative Filtering", Sensors, 2023, 23, doi: 10.3390/s23041789

Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, "Design and implementation of dynamic I/O control scheme for large scale distributed file systems", Cluster Computing, 2022, 25(6):1--16, doi: 10.1007/s10586-022-03640-0

L. Jin, A. Lazar, C. Brown, V. Garikapati, B. Sun, S. Ravulaparthy, Q. Chen, A. Sim, K. Wu, T. Wenzel, T. Ho, C. A. Spurlock, "What Makes You Hold onto That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions", Frontiers in Future Transportation, Connected Mobility and Automation, 2022, 3:894654, doi: 10.3389/ffutr.2022.894654

B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", Digital Communications and Networks, Special Issue on Edge Computation and Intelligence, 2022, doi: 10.1016/j.dcan.2022.02.007

Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky, "Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization", IEEE Transactions on Parallel and Distributed Systems, 2022, 33:878-890, doi: 10.1109/TPDS.2021.3100784

B. Mohammed, M. Kiran, N. Krishnaswamy, K. Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2021, doi: 10.1504/IJBDI.2021.118742

A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu, "Network traffic performance analysis from passive measurements using gradient boosting machine learning", International Journal of Big Data Intelligence, 2021, 8:13-30, doi: 10.1504/IJBDI.2021.118741

Donghun Koo, Jaehwan Lee, Jialin Liu, Eun-Kyu Byun, Jae-Hyuck Kwak, Glenn K Lockwood, Soonwook Hwang, Katie Antypas, Kesheng Wu, Hyeonsang Eom, "An empirical study of I/O separation for burst buffers in HPC systems", Journal of Parallel and Distributed Computing, 2021, 148:96-108, doi: 10.1016/j.jpdc.2020.10.007

Ling Jin, Alina Lazar, James Sears, Annika Todd, Alex Sim, Kesheng Wu, Hung-Chai Yang, C. Anna Spurlock, "Clustering Life Course to Understand the Heterogeneous Effects of Life Events, Gender and Generation on Habitual Travel Modes", IEEE Access, 2020, 1-17, doi: 10.1109/ACCESS.2020.3032328

William F. Godoy, Norbert Podhorszki, Ruonan Wang, Chuck Atkins, Greg Eisenhauer, Junmin Gu, Philip Davis, Jong Choi, Kai Germaschewski, Kevin Huck, Axel Huebl, Mark Kim, James Kress, Tahsin Kurc, Qing Liu, Jeremy Logan, Kshitij Mehta, George Ostrouchov, Manish Parashar, Franz Poeschel, David Pugmire, Eric Suchyta, Keichi Takahashi, Nick Thompson, Seiji Tsutsumi, Lipeng Wan, Matthew Wolf, Kesheng Wu, Scott Klasky, "ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management", SoftwareX, 2020, 12,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, Annika Todd, "Evaluating the effects of missing values and mixed data types on social sequence clustering using t-SNE visualization", Journal of Data and Information Quality (JDIQ), 2019, 11:1--22,

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 1, 2019, 31:e5157,

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
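To make the idea of a membership query on a bitmap index concrete, here is a minimal sketch under simplifying assumptions: the index is uncompressed (each bitmap is a plain Python integer rather than a Word-Aligned Hybrid compressed bitmap), it runs serially, and the column is a small in-memory list. The function names `build_bitmap_index`, `membership_query`, and `rows_of` are hypothetical and are not part of FastBit or of the paper's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def build_bitmap_index(column: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct value to an integer whose bit i is set when row i holds it."""
    index: Dict[Hashable, int] = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index: Dict[Hashable, int], wanted: Iterable[Hashable]) -> int:
    """Return a bitmap of rows whose value is in `wanted` (bitwise OR of the bitmaps)."""
    result = 0
    for value in set(wanted):
        result |= index.get(value, 0)  # values absent from the column match no rows
    return result


def rows_of(bitmap: int) -> List[int]:
    """Decode a bitmap into the sorted list of row numbers it marks."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


if __name__ == "__main__":
    particle_type = ["electron", "muon", "photon", "muon", "pion", "electron"]
    index = build_bitmap_index(particle_type)
    hits = membership_query(index, {"muon", "pion", "kaon"})
    print(rows_of(hits))
```

The query touches only the bitmaps of the requested values and never scans the column itself, which is the property the abstract relies on; compression and parallel evaluation would be layered on top of this basic OR-of-bitmaps step.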

Weijie Zhao, Florin Rusu, Kesheng Wu, Peter Nugent, "Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE", International Journal of Computational Science and Engineering, 2018, 16:337--349,

Taehoon Kim, Jaesik Choi, Dongeun Lee, Alex Sim, C Anna Spurlock, Annika Todd, Kesheng Wu, "Predicting baseline for analysis of electricity pricing", International Journal of Big Data Intelligence, 2018, 5:3--20,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Alex Sim, Kesheng Wu, "Consensus ensemble system for traffic flow prediction", IEEE Transactions on Intelligent Transportation Systems, 2018, 19:3903--3914,

Deborah A Agarwal, Boris Faybishenko, Vicky L Freedman, Harinarayan Krishnan, Gary Kushner, Carina Lansing, Ellen Porter, Alexandru Romosan, Arie Shoshani, Haruko Wainwright, others, "A science data gateway for environmental management", Concurrency and Computation: Practice and Experience, 2016, 28:1994--2004,

Lingfei Wu, Kesheng John Wu, Alex Sim, Michael Churchill, Jong Y Choi, Andreas Stathopoulos, Choong-Seock Chang, Scott Klasky, "Towards real-time detection and tracking of spatio-temporal features: Blob-filaments in fusion plasma", IEEE Transactions on Big Data, 2016, 2:262--275,

Jung Heon Song, Kesheng Wu, Horst D Simon, "Parameter Analysis of the VPIN (Volume synchronized Probability of Informed Trading) Metric", Quantitative Financial Risk Management: Theory and, 2014,

Kesheng Wu, E Bethel, Ming Gu, David Leinweber, Oliver Rübel, "A big data approach to analyzing market volatility", Algorithmic Finance, 2013, 2:241--267, LBNL 6382E,

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that HPC resources and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140GB as text files. By (1) using a more efficient file format for storing the trading records, (2) using more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real-time -- an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.

E. W. Bethel, D. Leinweber, O. Rübel, K. Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", The Journal of Trading, 2012, 7:9-24, LBNL 5263E, doi: 10.3905/jot.2012.7.2.009

G. F. Lofstead, Q. Liu, J. Logan, Y. Tian, Abbasi, N. Podhorszki, J. Y. Choi, S., R. Tchoua, R. A. Oldfield, others, "Hello ADIOS: The Challenges and Lessons of Leadership Class I/O Frameworks", 2012,

Kesheng Wu, Rishi R Sinha, Chad Jones, Stephane Ethier, Scott Klasky, Kwan-Liu Ma, Arie Shoshani, Marianne Winslett, "Finding regions of interest on toroidal meshes", Computational Science & Discovery, 2011, 4:015003,

Ichitaro Yamazaki, Zhaojun Bai, Horst D. Simon, Lin-Wang Wang, Kesheng Wu, "Adaptive Projection Subspace Dimension for the Lanczos Method", ACM Transactions on Mathematical Software, 2010, 37, doi: 10.1145/1824801.1824805

Oliver Rübel, Cameron GR Geddes, Estelle Cormier-Michel, Kesheng Wu, Gunther H Weber, Daniela M Ushizima, Peter Messmer, Hans Hagen, Bernd Hamann, Wes Bethel, others, "Automatic beam path analysis of laser wakefield particle acceleration data", Computational Science & Discovery, January 2009, 2:015005, LBNL 2734E,

Lifeng He, Yuyan Chao, Kenji Suzuki, Kesheng Wu, "Fast connected-component labeling", Pattern Recognition, 2009, 42:1977--1987,

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Optimizing two-pass connected-component labeling", Pattern Analysis & Applications, 2009, 12:117--135,

Kesheng Wu, Ekow J Otoo, Arie Shoshani, "Optimizing bitmap indices with efficient compression", ACM Transactions on Database Systems (TODS), 2006, 31:1--38,

Kurt Stockinger, Kesheng Wu, Rene Brun, Canal, "Bitmap indices for fast end-user physics analysis in", Nuclear Instruments and Methods in Physics Research A: Accelerators, Spectrometers, Detectors and Equipment, 2006, 559:99--102,

Andreas Stathopoulos, Kesheng Wu, "A Block Orthogonalization Procedure with Constant Requirements", SIAM Journal on Scientific Computing, 2002, 23:2165--2182,

L Bernardo, H Nordberg, D Olson, A Shoshani, A Sim, A Vaniachine, D Zimmerman, B Gibbard, R Porter, T Wenaus, others, "New capabilities in the HENP grand challenge storage access system and its application at RHIC", Computer physics communications, 2001, 140:179--188,

Kesheng Wu, Horst Simon, "Thick-restart Lanczos method for large symmetric problems", SIAM J. Matrix Anal. Appl., 2000, 22:602--616,

Kesheng Wu, Horst Simon, "A Parallel Lanczos method for symmetric generalized problems", Computing and Visualization in Science, 1999, 2:37--46,

K Wu, A Canning, HD Simon, LW Wang, "Thick-Restart Lanczos Method for Electronic Structure Calculations", Journal of Computational Physics, 1999, 154:156--173,

Kesheng Wu, Robert Savit, William Brock, "Statistical tests for deterministic effects in broad band time series", Physica D, 1993, 69:172--188, doi: 10.1016/0167-2789(93)90188-7

Conference Papers

D.K. Sung, Y. Son, A. Sim, K. Wu, S. Byna, H. Tang, H. Eom, C. Kim, S. Kim, "A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis", 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS2024), 2024,

L. Zhou, Q. Lin, K. Chowdhury, S. Masood, A. Eichenberger, H. Min, A. Sim, J. Wang, Y. Wang, K. Wu, B. Yuan, J. Zou, "Serving Deep Learning Model in Relational Databases", 27th International Conference on Extending Database Technology (EDBT2024), 2024,

A. Sharma, X. Li, H. Guan, G. Sun, L. Zhang, L. Wang, K. Wu, L. Cao, E. Zhu, A. Sim, T. Wu, J. Zou, "Automatic Data Transformation Using Large Language Model – An Experimental Study on Building Energy Data", IEEE International Conference on Big Data (BigData), 2023,

C. M. Oguchi, D. Ghosal, A. Sim, K. Wu, "Counterfactual Analysis: A Case Study on Impact of External Events on Building Energy Consumption", International Workshop on Big Data Analytics for Sustainability (BDA4S), 2023,

J. Bellavita, C. Sim, K. Wu, A. Sim, S. Yoo, H. Ito, V. Garonne, E. Lancon, "Understanding Data Access Patterns for dCache System", 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023), 2023, doi: 10.1051/epjconf/202429501053

C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, D. Hazen, F. Würthwein, D. Davila, H. Newman, J. Balcas, "Predicting Resource Utilization Trends with Southern California Petabyte Scale Cache", 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023), 2023, doi: 10.1051/epjconf/202429501044

Z. Deng, A. Sim, K. Wu, C. Guok, I. Monga, F. Andrijauskas, F. Wuerthwein, D. Weitzel, "Analyzing Transatlantic Network Traffic Patterns with Scientific Data Caches", 6th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2023), 2023, doi: 10.1145/3589012.3594897

C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, "Effectiveness and predictability of in-network storage cache for Scientific Workflows", International Conference on Computing, Networking and Communication (ICNC 2023), 2023, doi: 10.1109/ICNC57223.2023.10074058

R. Shao, J. Kim, A. Sim, K. Wu, "Predicting Slow Connections in Scientific Computing", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534112

J. Bellavita, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, "Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534111

R. Han, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, J. Balcas, H. Newman, "Access Trends of In-network Cache for Scientific Data", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA), in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534110

K. Wang, S. Lee, J. Balewski, A. Sim, P. Nugent, A. Agrawal, A. Choudhary, K. Wu, W-K. Liao, "Using Multi-resolution Data to Accelerate Neural Network Training in Scientific Applications", 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022), 2022, doi: 10.1109/CCGrid54584.2022.00050

S. Lee, Q. Kang, K. Wang, J. Balewski, A. Sim, A. Agrawal, A. Choudhary, P. Nugent, K. Wu, W-K. Liao, "Asynchronous I/O Strategy for Large-Scale Deep Learning Applications", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00046

J. Bang, C. Kim, K. Wu, A. Sim, S. Byna, H. Sung, H. Eom, "An In-Depth I/O Pattern Analysis in HPC Systems", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00056

A. Lazar, L. Jin, C. Brown, C. A. Spurlock, A. Sim, K. Wu, "Performance of the Gold Standard and Machine Learning in Predicting Vehicle Transactions", the 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2021), 2021, doi: 10.1109/BigData52589.2021.9671286

E. Copps, H. Zhang, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, E. Fajardo, "Analyzing scientific data sharing patterns with in-network data caching", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464441

Y. Wang, K. Wu, A. Sim, S. Yoo, S. Misawa, "Access Patterns of Disk Cache for Large Scientific Archive", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464444

A. Lazar, A. Sim, K. Wu, "GPU-based Classification for Wireless Intrusion Detection", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464445

Y. Ma, F. Rusu, A. Sim, K. Wu, "Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures", Heterogeneity in Computing Workshop (HCW 2021), in conjunction with the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2021, doi: 10.1109/IPDPSW52791.2021.00012

B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", The 16th IEEE International Conference on Mobility, Sensing and Networking (IEEE MSN 2020), 2020, doi: 10.1109/MSN50589.2020.00045

B. Cho, T. Dayrit, Y. Gao, Z. Wang, T. Hong, A. Sim, K. Wu, "Effective Missing Value Imputation Methods for Building Monitoring Data", The 2nd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2020) in conjunction with IEEE International Conference on Big Data (IEEE BigData 2020), 2020, doi: 10.1109/BigData50022.2020.9378230

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning for Surface Wave Identification in Distributed Acoustic Sensing Data", IEEE BigData 2020, December 8, 2020,

J. Kim, A. Sim, J. Kim, K. Wu, "Botnets Detection Using Recurrent Variational Autoencoder", IEEE Global Communications Conference (Globecom 2020), 2020, doi: 10.1109/GLOBECOM42002.2020.9348169

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning on Real Geophysical Data: A Case Study for Distributed Acoustic Sensing Research", NeurIPS "Machine Learning and the Physical Sciences" workshop, 2020,

Bin Dong, Verónica Rodríguez Tribaldos, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection", 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 14, 2020, 254--263,

Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, Hyeonsang Eom, "Towards HPC I/O performance prediction through large-scale log analysis", Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 77--88, doi: 10.1145/3369583.3392678

Gaurav R Ghosal, Dipak Ghosal, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Deep Deterministic Policy Gradient Based Network Scheduler For Deadline-Driven Data Transfers", Proceedings of International Federation for Information Processing (IFIP) Networking Conference (NETWORKING 2020), 2020, 253--261,

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Jaegyoon Hahm, "Transfer Learning Approach for Botnet Detection Based on Recurrent Variational Autoencoder", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 41--47, doi: 10.1145/3391812.3396273

Jiwoo Bang, Chungyong Kim, Kesheng Wu, Alex Sim, Suren Byna, Sunggon Kim, Hyeonsang Eom, "HPC Workload Characterization Using Feature Selection and Clustering", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 33--40, doi: 10.1145/3391812.3396270

S. Bhandari, A. K. Kukreja, A. Lazar, A. Sim, K. Wu, "Feature Selection and Tree-based Classification for Wireless Intrusion Detection", the 3rd ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2020, in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2020, doi: 10.1145/3391812.3396274

Qiao Kang, Alex Sim, Peter Nugent, Sunwoo Lee, Wei-keng Liao, Ankit Agrawal, Alok Choudhary, Kesheng Wu, "Predicting Resource Requirement in Intermediate Palomar Transient Factory Workflow", 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), 2020, 619--628, doi: 10.1109/CCGrid49817.2020.00-31

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, "Life Course as a Contextual System to Investigate the Effects of Life Events, Gender, and Generation on Travel Mode Use", Transportation Research Board (TRB) 99th Annual Meeting, 2020,

A. Lazar, A. Ballow, L. Jin, C. A. Spurlock, A. Sim, K. Wu, "Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019, doi: 10.1109/BigData47090.2019.9006411

Junmin Gu, Burlen Loring, Kesheng Wu, E. Wes Bethel, "HDF5 as a vehicle for in transit data movement", The Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV'19), 2019, doi: 10.1145/3364228.3364237

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61--80,

Sambit Shukla, Dipak Ghosal, Kesheng Wu, Alex Sim, Matthew Farrens, "Co-optimizing Latency and Energy for IoT services using HMP servers in Fog Clusters", 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), 2019, 121--128,

Hanul Sung, Jiwoo Bang, Alexander Sim, Kesheng Wu, Hyeonsang Eom, "Understanding Parallel I/O Performance Trends Under Various HPC Configurations", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 29--36,

Mengtian Jin, Youkow Homma, Alex Sim, Wilko Kroeger, Kesheng Wu, "Performance prediction for data transfers in LCLS workflow", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 37--44,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Similarity-based Compression with Multidimensional Pattern Matching", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 19--24,

Astha Syal, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Automatic detection of network traffic anomalies and changes", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 3--10,

Bin Dong, Patrick Kilian, Xiaocan Li, Fan Guo, Suren Byna, Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", Proceedings of the 31st International Conference on Scientific and Statistical Database Management, January 1, 2019, 202--205,

Dipak Ghosal, Sambit Shukla, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Reinforcement Learning Based Network Scheduler For Deadline-Driven Data Transfers", 2019 IEEE Global Communications Conference (GLOBECOM), 2019, 1--6,

Qiao Kang, Ankit Agrawal, Alok Choudhary, Alex Sim, Kesheng Wu, Rajkumar Kettimuthu, Peter H Beckman, Zhengchun Liu, Wei-keng Liao, "Spatiotemporal Real-Time Anomaly Detection for Supercomputing Systems", 2019 IEEE International Conference on Big Data (Big Data), 2019, 4381--4389,

Jongbeen Han, Heemin Kim, Hyeonsang Eom, Jonathan Coignard, Kesheng Wu, Yongseok Son, "Enabling SQL-Query Processing for Ethereum-based Blockchain Systems", Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, 2019, 1--7,

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019,

Devarshi Ghoshal, Kesheng Wu, Eric Pouyoul, Erich Strohmaier, "Analysis and Prediction of Data Transfer Throughput for Data-Intensive Workloads", 2019 IEEE International Conference on Big Data (Big Data), 2019, 3648--3657,

Junmin Gu, Burlen Loring, Kesheng Wu, E Wes Bethel, "HDF5 as a vehicle for in transit data movement", Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, 2019, 39--43,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna YQ Ho, Peter Nugent, "Distributed Caching for Complex Querying of Raw Arrays", SSDBM, 2018,

Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, Prabhat, Kesheng Wu, Paul Brown, "ArrayBridge: Interweaving declarative array processing with imperative high-performance computing", 34th IEEE International Conference on Data Engineering (ICDE) 2018, April 17, 2018,

Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, M Prabhat, Kesheng Wu, Paul Brown, "ArrayBridge: Interweaving declarative array processing in SciDB with imperative HDF5-based programs", 2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018, 977--988,

Cecilia Dao, Xinyu Liu, Alex Sim, Craig Tull, Kesheng Wu, "Modeling data transfers: change point and anomaly detection", 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018, 1589--1594,

Junmin Gu, Scott Klasky, Norbert Podhorszki, Ji Qiang, Kesheng Wu, "Querying large scientific data sets with adaptable IO system ADIOS", Asian Conference on Supercomputing Frontiers, 2018, 51--69,

Rajkumar Kettimuthu, Zhengchun Liu, Ian Foster, Peter H Beckman, Alex Sim, Kesheng Wu, Wei-keng Liao, Qiao Kang, Ankit Agrawal, Alok Choudhary, "Towards autonomic science infrastructure: architecture, limitations, and open issues", Proceedings of the 1st International Workshop on Autonomous Infrastructure for Science, 2018, 1--9,

Mengying Yang, Xinyu Liu, Wilko Kroeger, Alex Sim, Kesheng Wu, "Identifying anomalous file transfer events in LCLS workflow", Proceedings of the 1st International Workshop on Autonomous Infrastructure for Science, 2018, 1--4,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna YQ Ho, Peter Nugent, "Distributed caching for processing raw arrays", Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018, 1--12,

Sowmya Balasubramanian, Dipak Ghosal, Kamala Narayanan Balasubramanian Sharath, Eric Pouyoul, Alex Sim, Kesheng Wu, Brian Tierney, "Auto-tuned publisher in a pub/sub system: Design and performance evaluation", 2018 IEEE International Conference on Autonomic Computing (ICAC), 2018, 21--30,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Feature Engineering and Classification Models for Partial Discharge in Power Transformers", Mij, 2018, 1001:60,

Tal Shachaf, Alexander Sim, Kesheng Wu, Wilko Kroeger, "Detecting Anomalies in the LCLS Workflow", 2018 IEEE International Conference on Big Data (Big Data), 2018, 3256--3260,

Xin Xing, Bin Dong, Jonathan Ajo-Franklin, Kesheng Wu, "Automated Parallel Data Processing Engine with Application to Large-Scale Feature Extraction", 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC), January 1, 2018, 37--46,

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna, "ARCHIE: Data analysis acceleration with array caching in hierarchical storage", 2018 IEEE International Conference on Big Data (Big Data), January 1, 2018, 211--220,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Kesheng Wu, "Efficient online hyperparameter learning for traffic flow prediction", 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, 164--169,

Kesheng Wu, Bin Dong, Surendra Byna, "Scientific Data Services Framework for Plasma Physics", APS, 2018, 2018:BM10--006,

Shashanka Ubaru, Kesheng Wu, Kristofer E. Bouchard, "UoI-NMF Cluster: A Robust Nonnegative Matrix Factorization Algorithm for Improved Parts-Based Decomposition and Reconstruction of Noisy Data", the 16th IEEE International Conference on Machine Learning and Applications (ICMLA 2017), 2017, 241-248, doi: 10.1109/ICMLA.2017.0-152

Ling Jin, Doris Lee, Alex Sim, Sam Borgeson, Kesheng Wu, C Anna Spurlock, Annika Todd, "Comparison of clustering techniques for residential energy behavior using smart meter data", 2017,

Jonathan Wang, Wucherl Yoo, Alex Sim, Peter Nugent, Kesheng Wu, "Parallel variable selection for effective performance prediction", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017, 208--217,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Improving statistical similarity based data reduction for non-stationary data", Proceedings of the 29th International Conference on Scientific and Statistical Database Management, 2017, 1--6,

Updated experiment version: https://sdm.lbl.gov/oapapers/ssdbm17-lee-upd.pdf
Original version: http://dl.acm.org/citation.cfm?doid=3085504.3085583

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent, "Incremental view maintenance over array data", Proceedings of the 2017 ACM International Conference on Management of Data, January 1, 2017, 139--154,

Bin Dong, Kesheng Wu, Surendra Byna, Jialin Liu, Weijie Zhao, Florin Rusu, "ArrayUDF: User-defined scientific data analysis on arrays", Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2017, 53--64,

Kesheng Wu, Dongeun Lee, Alex Sim, Jaesik Choi, "Statistical data reduction for streaming data", 2017 New York Scientific Data Summit (NYSDS), 2017, 1--6,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Convolutional Filtering for Accurate Signal Timing from Noisy Streaming Data", 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2017, 941--948,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, "Data quality challenges with missing values and mixed types in joint sequence analysis", 2017 IEEE International Conference on Big Data (Big Data), 2017, 2620--2627,

Tzuhsien Wu, Jerry Chou, Shyng Hao, Bin Dong, Scott Klasky, Kesheng Wu, "Optimizing the query performance of block index through data analysis and I/O modeling", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, January 1, 2017, 1--10,

Bin Dong, Surendra Byna, Kesheng Wu, "SDS-Sort: Scalable Dynamic Skew-aware Parallel", HPDC 16, New York, NY, USA, ACM, 2016, 57--68, doi: 10.1145/2907294.2907300

Houjun Tang, Suren Byna, Steve Harenberg, Xiaocheng Zou, Wenzhao Zhang, Kesheng Wu, Bin Dong, Oliver Rubel, Kristofer Bouchard, Scott Klasky, others, "Usage pattern-driven dynamic data layout reorganization", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 356--365,

Wenzhao Zhang, Houjun Tang, Steve Harenberg, Surendra Byna, Xiaocheng Zou, Dharshi Devendran, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, others, "Amrzone: A runtime amr data sharing framework for scientific applications", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 116--125,

Tzuhsien Wu, Hao Shyng, Jerry Chou, Bin Dong, Kesheng Wu, "Indexing blocks to reduce space and time requirements for searching large data files", 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), January 1, 2016, 398--402,

Bin Dong, Surendra Byna, Kesheng Wu, "Sds-sort: Scalable dynamic skew-aware parallel sorting", Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, January 1, 2016, 57--68,

Xiaocheng Zou, David A Boyuka II, Dhara Desai, Daniel F Martin, Suren Byna, Kesheng Wu, "AMR-aware in situ indexing and scalable querying", Proceedings of the 24th High Performance Computing Symposium, January 1, 2016, 26,

Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, "Similarity Join over Array Data", SIGMOD, January 1, 2016, 2007--2022,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Novel data reduction based on statistical similarity", Proceedings of the 28th International Conference on Scientific and Statistical Database Management, 2016, 1--12,

Wucherl Yoo, Alex Sim, Kesheng Wu, "Machine learning based job status prediction in scientific clusters", 2016 SAI Computing Conference (SAI), 2016, 44--53,

David Pugmire, James Kress, Jong Choi, Scott Klasky, Tahsin Kurc, Randy Michael Churchill, Matthew Wolf, Greg Eisenhower, Hank Childs, Kesheng Wu, others, "Visualization and analysis for near-real-time decision making in distributed workflows", 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, 1007--1013,

Bin Dong, Suren Byna, Kesheng Wu, Hans Johansen, Jeffrey N Johnson, Noel Keen, others, "Data elevator: Low-contention data movement in hierarchical storage system", 2016 IEEE 23rd international conference on high performance computing (HiPC), January 1, 2016, 152--161,

Houjun Tang, Suren Byna, Steve Harenberg, Wenzhao Zhang, Xiaocheng Zou, Daniel F Martin, Bin Dong, Dharshi Devendran, Kesheng Wu, David Trebotich, others, "In situ storage layout optimization for amr spatio-temporal read accesses", 2016 45th International Conference on Parallel Processing (ICPP), January 1, 2016, 406--415,

Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel F Martin, Kesheng Wu, Bin Dong, Scott Klasky, Nagiza F Samatova, "Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications", 2016 IEEE International Conference on Big Data (Big Data), January 1, 2016, 1359--1366,

Utkarsh Ayachit, Andrew Bauer, Earl PN Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, others, "Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures", SC 16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, 921--932, LBNL 1007264,

D. Pugmire, J. Kress, J. Choi, S. Klasky, Kurc, R. M. Churchill, M. Wolf, G., H. Childs, K. Wu, A. Sim, J. Gu, J. Low, "Visualization and Analysis for Near-Real-Time Decision in Distributed Workflows", 2016 IEEE International Parallel and Distributed Symposium Workshops (IPDPSW), 2016, 1007--1013, doi: 10.1109/IPDPSW.2016.175

Jinoh Kim, Bin Dong, Suren Byna, and Kesheng Wu, "Security for the Scientific Data Service Framework", 2nd International Workshop on Privacy and Security of Big Data (PSBD 2015), in conjunction with IEEE BigData 2015, 2015,

Xiaocheng Zou, Kesheng Wu, David A. Boyuka, Daniel F. Martin, Suren Byna, Houjun, Kushal Bansal, Terry J. Ligocki, Hans Johansen, and Nagiza F. Samatova, "Parallel In Situ Detection of Connected Components Adaptive Mesh Refinement Data", Proceedings of the Cluster, Cloud and Grid Computing (CCGrid) 2015, 2015,

S. Shannigrahi, A. J. Barczyk, C. Papadopoulos, A. Sim, I. Monga, H. Newman, K. Wu, E. Yeh, "Named Data Networking in Climate Research and HEP Applications", 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), 2015,

Bin Dong, Surendra Byna, Kesheng Wu, "Heavy-tailed distribution of parallel I/O system response time", Proceedings of the 10th Parallel Data Storage Workshop, 2015, 37--42,

Bin Dong, Surendra Byna, Kesheng Wu, "Spatially clustered join on heterogeneous scientific data sets", 2015 IEEE International Conference on Big Data (Big Data), 2015, 371--380,

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Patha: Performance analysis tool for hpc applications", 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), 2015, 1--8,

Taehoon Kim, Dongeun Lee, Jaesik Choi, Anna Spurlock, Alex Sim, Annika Todd, Kesheng Wu, "Extracting baseline electricity usage using gradient tree boosting", 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), 2015, 734--741,

Taehoon Kim, Dongeun Lee, Jaesik Choi, C. Anna Spurlock, Alex Sim, Annika Todd, Kesheng Wu, "Extracting Baseline Electricity Usage with Gradient Boosting", International Conference on Big Intelligence and Computing (DataCom 2015), 2015, doi: 10.1109/SmartCity.2015.156

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel Data Analysis Directly on Scientific File Formats", SIGMOD 14, 2014, 385--396, doi: 10.1145/2588555.2612185

Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Y Choi, Andreas Stathopoulos, CS Chang, Scott Klasky, "High-performance outlier detection algorithm for finding blob-filaments in plasma", Proc. of 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC-14), held in conjunction with ACM/IEEE SC14, 2014,

Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Surendra Byna, Kesheng Wu, "Simplifying index file structure to improve I/O performance of parallel indexing", 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2014, 576--583,

Bin Dong, Surendra Byna, Kesheng Wu, "Parallel query evaluation as a Scientific Data Service", 2014 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2014, 194--202,

Jialin Liu, S. Byna, Bin Dong, Kesheng Wu, Chen, "Model-Driven Data Layout Selection for Improving Read", Parallel Distributed Processing Symposium Workshops 2014 IEEE International, 2014, 1708--1716, doi: 10.1109/IPDPSW.2014.190

Jung Heon Song, Marcos López de Prado, Horst Simon, Kesheng Wu, "Exploring Irregular Time Series Through Non-uniform Fourier Transform", WHPCF 14, Piscataway, NJ, USA, IEEE Press, 2014, 37--44, doi: 10.1109/WHPCF.2014.8

Qian Sun, Fan Zhang, Tong Jin, Hoang Bui, Kesheng Wu, Arie Shoshani, Hemanth Kolla, Scott Klasky, Jacqueline Chen, Manish Parashar, "Scalable run-time data indexing and querying for scientific simulations", Big Data Analytics: Challenges and Opportunities (BDAC-14) Workshop at Supercomputing Conference, 2014,

Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani, "Parallel data analysis directly on scientific file formats", Proceedings of the 2014 ACM SIGMOD international conference on Management of data, January 1, 2014, 385--396,

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, CS Chang, S. Klasky, "High-Performance Outlier Detection Algorithm for Blob-Filaments in Plasma", 5th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC 14), 2014,

William Gu, Jaesik Choi, Ming Gu, Horst Simon, Kesheng Wu, "Fast Change Point Detection for Electricity Market Analysis", IEEE International Conference on Big Data, 2013, LBNL LBNL-6388E, doi: 10.1109/BigData.2013.6691733

E Wes Bethel, Prabhat Prabhat, Suren Byna, Oliver Rübel, K John Wu, Michael Wehner, "Why high performance visual data analytics is both relevant and difficult", Visualization and Data Analysis 2013, January 2013, 8654:86540B, LBNL LBNL-6063E,

Alex Romosan, Arie Shoshani, Kesheng Wu, Victor Markowitz, Kostas Mavrommatis, "Accelerating gene context analysis using bitmaps", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 1--12, LBNL 6397E,

Jong Y Choi, Kesheng Wu, Jacky C Wu, Alex Sim, Qing G Liu, Matthew Wolf, C Chang, Scott Klasky, "Icee: Wide-area in transit data processing framework for near real-time scientific applications", 4th SC Workshop on Petascale (Big) Data Analytics: Challenges and Opportunities in conjunction with SC13, 2013, 11,

Bin Dong, Surendra Byna, Kesheng Wu, "SDS: a framework for scientific data services", Proceedings of the 8th Parallel Data Storage Workshop, January 1, 2013, 27--32,

Bin Dong, Surendra Byna, Kesheng Wu, "Expediting scientific data analysis with reorganization of data", 2013 IEEE International Conference on Cluster Computing (CLUSTER), January 1, 2013, 1--8,

Kuan-Wu Lin, Surendra Byna, Jerry Chou, Wu, "Optimizing FastQuery performance on Lustre file", Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, 29,

W. Gu, J. Choi, M. Gu, H. D. Simon, K., "Fast Change Point Detection for electricity market", 2013 IEEE International Conference on Big Data, 2013, 50--57, doi: 10.1109/BigData.2013.6691733

Surendra Byna, Jerry Chou, Oliver Rubel, Homa Karimabadi, William S Daughton, Vadim Roytershteyn, E Wes Bethel, Mark Howison, Ke-Jou Hsu, Kuan-Wu Lin, others, "Parallel I/O, analysis, and visualization of a trillion particle simulation", SC 12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, January 2012, 1--12,

Allen R Sanderson, Brad Whitlock, H Childs, GH Weber, K Wu, others, "A system for query based analysis and visualization", January 2012, LBNL 5507E,

Oliver Rübel, Surendra Byna, Kesheng Wu, Fuyu Li, Michael Wehner, Wes Bethel, others, "Teca: A parallel toolkit for extreme climate analysis", Procedia Computer Science, Elsevier, January 2012, 9:866--876, LBNL 5352E,

We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

Elaheh Pourabbas, Arie Shoshani, Kesheng Wu, "Minimizing index size by reordering rows and columns", International Conference on Scientific and Statistical Database Management, January 2012, 467--484,

Benson Ma, Arie Shoshani, Alex Sim, Kesheng, Yong-Ik Byun, Jaegyoon Hahm, Min-Su Shin, "Efficient Attribute-Based Data Access in Astronomy", The 2nd International Workshop on Network-Aware Data Workshop (NDM2012), 2012, 562--571,

Ichitaro Yamazaki, Kesheng Wu, "A Communication-Avoiding Thick-Restart Lanczos Method a Distributed-Memory System", Lecture Notes in Computer Science, 2012, 7155:345--354, doi: 10.1007/978-3-642-29737-3_39

E. W. Bethel, Surendra Byna, Jerry Chou, Cormier-Michel, Cameron G. R. Geddes, Howison, Fuyu Li, Prabhat, Ji Qiang, Rübel, Rob D. Ryne, Michael Wehner, Wu, "Big Data Analysis and Visualization: What Do LINACS and Tropical Storms Have In Common?", 11th International Computational Accelerator Physics ICAP 2012, Germany, 2012,

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, "Federal Market Information Technology in the Post Flash Crash Era: Roles of Supercomputing", Workshop on High Performance Computational Finance at SC11, Seattle, WA, USA, November 2011, LBNL 5263E,

A. Shoshani, I. Altintas, J. Chen, G. Chin, A. Choudhary, D. Crawl, T. Critchlow, K. Gao, B. Grimm, H. Iyer, C. Kamath, A. Khan, S. Klasky, S. Koehler, S. Lang, R. Latham, J. W. Li, W. Liao, J. Ligon, Q. Liu, B. Ludaescher, P. Mouallem, M. Nagappan, N. Podhorszki, R. Ross, D. Rotem, N. Samatova, C. Silva, A. Sim, R. Tchoua, R. Thakur, M. Vouk, K. Wu, W. Yu, "The Scientific Data Management Center: Available Technologies and Highlights", SciDAC Conference, 2011,

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E Wes Bethel, Arie Shoshani, Oliver Rübel, Rob D Ryne, "Parallel index and query for large scale data analysis", Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, 2011, 1--11, LBNL 5317E,

Surendra Byna, Michael F Wehner, Kesheng John Wu, "Detecting atmospheric rivers in large climate datasets", Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities, 2011, 7--14,

Extreme precipitation events on the western coast of North America are often traced to an unusual weather phenomenon known as atmospheric rivers. Although these storms may provide a significant fraction of the total water to the highly managed western US hydrological system, the resulting intense weather poses severe risks to the human and natural infrastructure through severe flooding and wind damage. To aid the understanding of this phenomenon, we have developed an efficient detection algorithm suitable for analyzing large amounts of data. In addition to detecting actual events in the recent observed historical record, this detection algorithm can be applied to global climate model output providing a new model validation methodology. Comparing the statistical behavior of simulated atmospheric river events in models to observations will enhance confidence in projections of future extreme storms. Our detection algorithm is based on a thresholding condition on the total column integrated water vapor established by Ralph et al. (2004) followed by a connected component labeling procedure to group the mesh points into connected regions in space. We develop an efficient parallel implementation of the algorithm and demonstrate good weak and strong scaling. We process a 30-year simulation output on 10,000 cores in under 3 seconds.
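The two steps named in the abstract above, thresholding followed by connected component labeling, can be illustrated with a small serial sketch. This is not the authors' parallel implementation: the function name `label_regions`, the toy grid, the threshold value, and the choice of 4-connectivity are all assumptions made only for illustration.

```python
from collections import deque

def label_regions(iwv, threshold):
    """Label 4-connected regions of grid points whose value exceeds threshold.

    Returns (labels, count): labels holds 0 for background points and
    1..count for the connected regions found.
    """
    rows, cols = len(iwv), len(iwv[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip points below the threshold or already assigned to a region.
            if iwv[r][c] <= threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over neighbouring points above threshold.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and iwv[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Hypothetical water-vapor values on a tiny mesh (illustrative numbers only).
grid = [
    [0.5, 2.4, 2.6, 0.3],
    [0.4, 0.2, 2.5, 0.1],
    [2.2, 0.1, 0.3, 0.2],
]
labels, count = label_regions(grid, threshold=2.0)
print(count)
print(labels)
```

Each labeled region is a candidate event; a real detector would presumably apply further shape or size criteria before accepting a region, which this sketch omits.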

Kesheng Wu, Surendra Byna, Doron Rotem, Arie, "Scientific Data Services -- A High-Performance I/O with Array Semantics", HPCDB, IEEE, 2011, doi: 10.1145/2125636.2125640

J. Chou, K. Wu, O. Rübel, M. Howison, Qiang, Prabhat, B. Austin, E. W. Bethel, D. Ryne, A. Shoshani, "Parallel Index and Query for Large Scale Data", SC11, 2011, doi: 10.1145/2063384.2063424

Jinoh Kim, Hasan Abbasi, Luis Chacón, Docan, Scott Klasky, Qing Liu, Norbert, Arie Shoshani, Kesheng Wu, "Parallel In Situ Indexing for Data-intensive", LDAV, 2011, 65--72, doi: 10.1109/LDAV.2011.6092319

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A General Indexing and Querying System Scientific Data", SSDBM, 2011, 573--574, doi: 10.1007/978-3-642-22351-8_42

Jerry Chou, Kesheng Wu, Prabhat, "FastQuery: A Parallel Indexing System for Data", IASDS, IEEE, 2011, doi: 10.1109/CLUSTER.2011.86

Jerry Chou, Kesheng Wu, others, "Fastquery: A parallel indexing system for scientific data", 2011 IEEE International Conference on Cluster Computing, 2011, 455--464,

Kamesh Madduri, Kesheng Wu, "Massive-Scale RDF Processing Using Compressed Bitmap", SSDBM, Springer, 2011, 470--479, doi: 10.1007/978-3-642-22351-8_30

Oliver Rübel, Sean Ahern, E Wes Bethel, Mark D Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B Eisen, Charless C Fowlkes, Cameron GR Geddes, others, "Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data", Procedia computer science, Elsevier, January 2010, 1:1757--1764, LBNL 3669E,

Gunther Weber, "Recent advances in visit: Amr streamlines and query-driven visualization", 2010,

Daren Hasenkamp, Alexander Sim, Michael Wehner, Kesheng Wu, "Finding tropical cyclones on a cloud computing cluster: Using parallel virtualization for large-scale climate simulation analysis", 2010 IEEE Second International Conference on Cloud Computing Technology and Science, 2010, 201--208, LBNL 4218E,

Kesheng Wu, Arie Shoshani, Kurt Stockinger, "Analyses of multi-level and multi-component compressed indexes", ACM Transactions on Database Systems, ACM, 2010, 35:1--52, doi: 10.1145/1670243.1670245

Kesheng Wu, Kamesh Madduri, Shane Canon, "Multi-level bitmap indexes for flash memory storage", Proceedings of the Fourteenth International Database Engineering & Applications Symposium, 2010, 114--116,

E. W. Bethel, C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, M. Day, E. Deines, T. Fogal, C. Garth, C. G. R. Geddes, H. Hagen, B. Hamann, C. Hansen, J. Jacobsen, K. Joy, J. Kruger, J. Meredith, P. Messmer, G. Ostrouchov, V. Pascucci, K. Potter, Prabhat, D. Pugmire, O. Rubel, A. Sanderson, C. Silva, D. Ushizima, G. Weber, B. Whitlock, K. Wu, "Occam's Razor and Petascale Visual Data Analysis", SciDAC 2009, J. of Physics: Conference Series, San Diego, California, July 2009, LBNL 2210E,

Luke Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, Kenneth I. Joy, "Data Parallel Bin-based Indexing for Answering Queries on Multi-core Architecture", Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM), June 2009, 5566:110-129, LBNL 2211E,

K Wu, S Ahern, EW Bethel, J Chen, H Childs, C Geddes, J Gu, H Hagen, B Hamann, J Lauret, others, "FastBit: Interactively Searching Massive Data", Proc. of SciDAC 2009, 2009, LBNL 2164E,

E Bethel, "Modern Scientific Visualization is More than Just Pretty Pictures", January 2009, LBNL 1450E,

Luke J Gosink, Kesheng Wu, E Wes Bethel, John D Owens, Kenneth I Joy, "Data parallel bin-based indexing for answering queries on multi-core architectures", International Conference on Scientific and Statistical Database Management, 2009, 110--129,

Meiyappan Nagappan, Kesheng Wu, Mladen A Vouk, "Efficiently extracting operational profiles from execution logs using suffix arrays", 2009 20th International Symposium on Software Reliability Engineering, January 1, 2009, 41--50,

An important software reliability engineering tool is operational profiles. In this paper we propose a cost effective automated approach for creating second generation operational profiles using execution logs of a software product. Our algorithm parses the execution logs into sequences of events and produces an ordered list of all possible subsequences by constructing a suffix array of the events. The difficulty in using execution logs is that the amount of data that needs to be analyzed is often extremely large (more than a million records per day in many applications). Our approach is very efficient. We show that our approach requires O(N) in space and time to discover all possible patterns in N events. We discuss a practical implementation of the algorithm in the context of the logs from a large cloud computing system.
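The idea in the abstract above, sorting the suffixes of an event sequence so that repeated subsequences become adjacent, can be sketched briefly. The sketch below builds the suffix array naively by sorting slices, so it does not achieve the O(N) bound claimed in the paper; the function names and the toy log are assumptions for illustration only.

```python
def suffix_array(events):
    """Return suffix start positions, ordered by the suffix they begin (naive sort)."""
    return sorted(range(len(events)), key=lambda i: events[i:])

def common_prefix_len(a, b):
    """Length of the longest common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def repeated_patterns(events, min_len=2):
    """Map each repeated contiguous subsequence (length >= min_len) to its count.

    Suffixes sharing a prefix sit next to each other in the suffix array,
    so a pattern shared by m suffixes shows up in m - 1 adjacent pairs.
    """
    sa = suffix_array(events)
    counts = {}
    for prev, cur in zip(sa, sa[1:]):
        shared = common_prefix_len(events[prev:], events[cur:])
        for k in range(min_len, shared + 1):
            pattern = tuple(events[cur:cur + k])
            counts[pattern] = counts.get(pattern, 1) + 1
    return counts

# Hypothetical execution log reduced to a sequence of event names.
log = ["open", "read", "close", "open", "read", "write", "open", "read", "close"]
patterns = repeated_patterns(log)
for pattern, n in sorted(patterns.items()):
    print(n, " -> ".join(pattern))
```

For logs of realistic size, a linear-time suffix array construction and a precomputed longest-common-prefix array would replace the slice-based sorting used here.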

E Wes Bethel, Chris Johnson, Sean Ahern, John Bell, Peer-Timo Bremer, Hank Childs, Estelle Cormier-Michel, Marc Day, Eduard Deines, Tom Fogal, others, "Occam's razor and petascale visual data analysis", Journal of Physics: Conference Series, 2009, 180:012084,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "High Performance Multivariate Visual Data Exploration for Extremely Large Data", Supercomputing (SC), Austin, Texas, USA, November 2008, LBNL 716E,

Rishi Rakesh Sinha, Marianne Winslett, Kesheng, Kurt Stockinger, Arie Shoshani, "Adaptive Bitmap Indexes for Space-Constrained", ICDE 2008, 2008, 1418--1420,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, "Breaking the curse of cardinality on bitmap indexes", International Conference on Scientific and Statistical Database Management, 2008, 348--365,

Meiyappan Nagappan, Mladen A. Vouk, Kesheng Wu, Alex Sim, Arie Shoshani, "Efficient Operational Profiling of Systems Using Arrays on Execution Logs", ISSRE, 2008, 313--314, doi: 10.1109/ISSRE.2008.45

E. Wes Bethel, Oliver Rübel, Prabhat, Wu, Gunther H. Weber, Valerio Pascucci, Hank Childs, Ajith Mascarenhas, Jeremy, Sean Ahern, "Modern Scientific Visualization is More than Just Pictures", Numerical Modeling of Space Plasma Flows: (Astronomical Society of the Pacific Series), St. Thomas, USVI, 2008, 301--317,

Frederick Reiss, Kurt Stockinger, Kesheng Wu, Shoshani, Joseph M. Hellerstein, "Enabling Real-Time Querying of Live and Historical Data", SSDBM 2007, 2007,

Luke Gosink, John Shalf, Kurt Stockinger, Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press., 2006, 149--158,

F. Reiss, K. Stockinger, K. Wu, A. Shoshani, J. M. Hellerstein, "Efficient analysis of live and historical streaming and its application to cybersecurity", 2006,

Kesheng Wu, "FastBit: an efficient indexing technology for data-intensive science", Journal of Physics: Conference Series, IOP Publishing, 2005, 16:556--560, LBNL LBNL-2164E, doi: 10.1088/1742-6596/16/1/077

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur Poskanzer, Arie Shoshani, Alexander Sim, Zhang, "Grid Collector: Facilitating Efficient Selective from Data Grids", International Supercomputer Conference 2005, 2005,

Kesheng Wu, Ekow Otoo, Arie Shoshani, "Optimizing connected component labeling algorithms", Medical Imaging 2005: Image Processing, 2005, 5747:1965--1976,

E. Wes Bethel, Scott Campbell, Eli Dart, Lee, Steven A. Smith, Kurt Stockinger, Tierney, Kesheng Wu, "Interactive Analysis of Large Network Data Collections Query-Driven Visualization", 2005,

Kurt Stockinger, John Shalf, Kesheng Wu, E Wes Bethel, "Query-driven visualization of large data sets", VIS 05. IEEE Visualization, 2005., 2005, 167--174,

Kesheng Wu, Wei-Ming Zhang, Victor, Jerome Lauret, Arie Shoshani, "The Grid Collector: Using an Event Catalog to Speed up Analysis in Distributed Environment", Proceedings of Computing in High Energy and Nuclear (CHEP) 2004, 2004,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Gu, Arie Shoshani, "Grid Collector: An Event Catalog With Automated File", Proceedings of IEEE Nuclear Science Symposium 2003, 2003, doi: 10.1109/NSSMIC.2003.1351830

L. M. Bernardo, B. Gibbard, D. Malon, H. Nordberg, D. Olson, R. Porter, A. Shoshani, A. Sim, A. Vaniachine, T. Wenaus, K. Wu, D. Zimmerman, "New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC", Computing in High Energy Physics, 2000,

Kesheng Wu, Horst Simon, "An Evaluation of Parallel Shift-and-Invert Lanczos", Proceedings of The 1999 International Conference on and Distributed Processing Techniques and Las Vegas, Nevada, June 28 - July 1, 1999, 2913--19,

Kesheng Wu, Horst Simon, "Parallel Efficiency of the Lanczos method for problems", Berkeley, CA, 1999,

Kesheng Wu, Andrew Canning, Horst D. Simon, "A new Lanczos method for electronic structure", Proceedings of ACM/IEEE SC98 Conference, November 1998, in Orlando, FL, New York, NY, IEEE, 1998,

Book Chapters

E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Wolf, Kesheng Wu, "Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_20

E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_13

Antoine Bambade, Kesheng Wu, "An Assessment of the Prediction Quality of VPIN", Advanced Analytics and Artificial Intelligence Applications, (IntechOpen: 2019)

Wucherl Yoo, Michelle Koo, Yi Cao, Alex Sim, Peter Nugent, Kesheng Wu, "Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters", Conquering Big Data with High Performance Computing, (Springer, Cham: 2016) Pages: 139--161

F. Rusu, P. Nugent, K. Wu, "Implementing the Palomar Transient Factory Real-Time Pipeline in GLADE: Results and", Lecture Notes in Computer Science, (2014) Pages: 53--66

David H. Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, Kesheng Wu, "Statistical Overfitting and Backtest Performance", http://ssrn.com/abstract=2507040, (January 1, 2014)

ISBN 978-1-78548-008-9

Kurt Stockinger, John Cieslewicz, Kesheng Wu, Rotem, Arie Shoshani, "Using Bitmap Indexing Technology for Combined and Text Queries", Annals of Information Systems, (Springer: 2008) Pages: 1--23

Kurt Stockinger, Kesheng Wu, "Bitmap indices for data warehouses", Data Warehouses and OLAP: Concepts, Architectures and Solutions, (IGI Global: 2007) Pages: 157--178

Presentation/Talks

C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, Predicting Resource Usage Trends with Southern California Petabyte Scale Cache, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,

J. Bellavita, C. Sim, K. Wu, A. Sim, S. Yoo, H. Ito, V. Garonne, E. Lancon, Understanding Data Access Patterns for dCache System, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,

H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective, Transportation Research Board 102nd Annual Meeting, 2023,

John Wu, Bin Dong, Alex Sim, Automating Data Management Through Unified Runtime Systems, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500

John Wu, Ben Brown, Paolo Calafiura, Quincey Koziol, Dongeun Lee, Alex Sim, Devesh Tiwari, Support for In-Flight Data Analyses in Scientific Workflows, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, Life course as a contextual system to investigate the effects of life events, gender and generation on travel mode usage, The Behavior, Energy & Climate Change Conference (BECC), 2019,

Reports

C. A. Spurlock, A. Gopal, J. Auld, P. Leiby, C. Sheppard, T. Wenzel, S. Belal, A. Duvall, A. Enam, S. Fujita, A. Henao, L. Jin, E. Kontou, A. Lazar, Z. Needell, C. Rames, T. Rashidi, J. Sears, A. Sim, M. Stinson, M. Taylor, A. Todd-Blick, O. Verbas, V. Walker, J. Ward, G. Wong-Parodi, K. Wu, H.-C. Yang, "SMART Mobility, Mobility Decision Science Capstone Report", Vehicle Technologies Office (VTO), Office of Energy Efficiency and Renewable Energy (EERE), US Department of Energy, 2020,

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Kesheng Wu, "Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction", arXiv preprint arXiv:1811.00620, 2018,

Kesheng Wu, Horst D Simon, "High-Performance Computational Intelligence and Forecasting Technologies", 2018,

David H Bailey, Stephanie Ger, Marcos López de Prado, Alexander Sim, "Statistical overfitting and backtest performance", Risk-Based and Factor Investing, 2015,

http://ssrn.com/abstract=2507040

L. Wu, K. Wu, A. Sim, M. Churchill, J. Y. Choi, A. Stathopoulos, C.S. Chang, S. Klasky, "Towards Real-Time Detection and Tracking of Blob-Filaments in Fusion Plasma Big Data", WM-CS-2015-01, Department of Computer Science, College of William and Mary, 2015, doi: 10.48550/arXiv.1505.03532

Luke J Gosink, "Bin-hash indexing: A parallel method for fast query processing", 2008, LBNL 729E,

I. Yamazaki, K. Wu, H. Simon, "nu-TRLan User Guide version 1.0", 2008, LBNL 1288E,

Kesheng Wu, "FastBit reference manual", 2007, LBNL PUB/3192,

K. Wu, K. Stockinger, A. Shoshani, Wes, "FastBit--Helps Finding the Proverbial Needle in a", 2006, LBNL-PUB/963,

K. Wu, E. Otoo, "A simpler proof of the average case complexity of with path compression", 2005,

Kesheng Wu, Ekow Otoo, Kenji Suzuki, "Two Strategies to Speed up Connected Component Algorithms", 2005,

Kesheng Wu, Ekow J Otoo, Arie Shoshani, "An efficient compression scheme for bitmap indices", 2004,

Kesheng Wu, Wei-Ming Zhang, Alexander Sim, Junmin Gu, Arie Shoshani, "Grid collector: An event catalog with automated file management", 2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No. 03CH37515), 2003, LBNL 55563,

Kesheng Wu, Horst D Simon, "Dynamic restarting schemes for eigenvalue problems", 1999,

Posters

J. W. Chung, A. Sim, B. Quiter, Y. Wu, W. Zhao, K. Wu, "Preparing Spectral Data for Machine Learning: A Study of Geological Classification from Aerial Surveys", Machine Learning and the Physical Sciences Workshop (ML4PS), 2023,

R. Monga, A. Sim (advisor), K. Wu (advisor), "Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’23), ACM Student Research Competition (SRC), First place winner, 2023,

Julian Bellavita, Alex Sim (advisor), John Wu (advisor), "Predicting Scientific Dataset Popularity Using dCache Logs", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), Second place winner, 2022,

Poster (PDF)

The dCache installation is a storage management system that acts as a disk cache for high-energy physics (HEP) data. Storage space on dCache is limited relative to persistent storage devices; therefore, a heuristic is needed to determine what data should be kept in the cache. A good cache policy would keep frequently accessed data in the cache, but this requires knowledge of future dataset popularity. We present methods for forecasting the number of times a dataset stored on dCache will be accessed in the future. We present a deep neural network that can predict future dataset accesses accurately, reporting a final normalized loss of 4.6e-8. We also present a set of algorithms that can forecast future dataset accesses given an access sequence. Included are two novel algorithms, Backup Predictor and Last N Successors, that outperform other file prediction algorithms. Findings suggest that it is possible to anticipate dataset popularity in advance.

C. Sim, C. Guok (advisor), A. Sim (advisor), K. Wu (advisor), "Data Throughput Performance Trends of Regional Scientific Data Cache", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), 2022,

A. Pereira, A. Sim, K. Wu, S. Yoo, H. Ito, "Data access pattern analysis for dCache storage system", International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022), 2022,

J. Cheung, A. Sim, J. Kim, K. Wu, "Performance Prediction of Large Data Transfers", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), ACM Student Research Competition (SRC), 2021,

E. Copps, A. Sim (Advisor), K. Wu (Advisor), "Analyzing scientific data sharing patterns with in-network data caching", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2021), ACM Student Research Competition (SRC), 2021,

Brett Weinger, Alex Sim (Advisor), John Wu (Advisor), Jinoh Kim (Advisor), "Enhancing IoT Anomaly Detection Performance for Federated Learning", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’20), ACM Student Research Competition (SRC), 2020,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Alexandra Ballow, Alina Lazar (Advisor), Alex Sim (Advisor), Kesheng Wu (Advisor), "Handling Missing Values in Joint Sequence Analysis", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2019), ACM Student Research Competition (SRC), First place winner, Pages: 19 2019,

Alexandra Ballow, Alina Lazar, Alex Sim, Kesheng Wu, "Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Tyler Leibengood, Alina Lazar, Alex Sim, Kesheng Wu, "Network Traffic Performance Prediction with Multivariate Clusters in Time Windows", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Multidimensional Compression with Pattern Matching", 2019 Data Compression Conference (DCC), Pages: 567--567 2019,

Burak Cetin, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Federated Wireless Network Intrusion Detection", 2019 IEEE International Conference on Big Data (Big Data), Pages: 6004--6006 2019,

Karen Tu, Alex Sim (Advisor), John Wu (Advisor), "Identification of Network Data Transfer Bottlenecks in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’18), ACM Student Research Competition (SRC), 2018,

Alina Lazar, Kesheng Wu, Alex Sim, "Predicting Network Traffic Using TCP Anomalies", 2018 IEEE International Conference on Big Data (Big Data), Pages: 5369--5371 2018,

Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu, "Expanding statistical similarity based data reduction to capture diverse patterns", 2017 Data Compression Conference (DCC), Pages: 445--445 2017,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Feature Engineering and Classification Models for Partial Discharge Events in Power Transformers", Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Pages: 269--270 2017,

Peter Harrington, Wucherl Yoo, Alexander Sim, Kesheng Wu, "Diagnosing parallel I/O bottlenecks in HPC applications", International Conference for High Performance Computing, Networking, Storage and Analysis (SC17), ACM Student Research Competition (SRC), 2017,

Jonathan Wang, Kesheng Wu, Alex Sim, Seongwook Hwangbo, "Accurate signal timing from high frequency streaming data", 2017 IEEE International Conference on Big Data (Big Data), Pages: 4852--4854 2017,

M. Bae, W. Yoo (Advisor), A. Sim (Advisor), K. Wu (Advisor), "Discovering Energy Resource Usage Patterns on Scientific Clusters", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), Third place winner, 2016,

M. Bryson, S. Byna (Advisor), A. Sim (Advisor), K. Wu (Advisor), "The Search for Missing Parallel IO Performance on the Cori Supercomputer", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16), ACM Student Research Competition (SRC), 2016,

Jonathan Wang, Wucherl Yoo, Alex Sim, K John Wu, "Analysis of Variable Selection Methods on Scientific Cluster Measurement Data", 2016,

Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu, "Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling", SciDAC PI Meeting, July 2015,

John Wu, Alex Sim, Lingfei Wu, Abraham Frankl, Scott Klasky, Jong Y Choi, CS Chang, Michael Churchill, "Exercising ICEE Framework with Fusion Blob Detection", DOE/ASCR NGNS PI meeting, 2014,

Lingfei Wu, Kesheng Wu, Alex Sim, Andreas Stathopoulos, "Real-time outlier detection algorithm for finding blob-filaments in plasma", ACM/IEEE SC14 ACM SRC Poster, 2014,

R. Ryne, B. Austin, J. Byrd, J. Corlett, E. Esarey, C. G. R. Geddes, W. Leemans, X. Li, Prabhat, J. Qiang, O. Rübel, J.-L. Vay, M. Venturini, K. Wu, B. Carlsten, D. Higdon and N. Yampolsky, "High Performance Computing in Accelerator Science: Past Successes, Future Challenges", Workshop on Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery, October 2011,

M Prabhat, S Byna, C Paciorek, G Weber, K Wu, T Yopes, MF Wehner, G Ostrouchov, D Pugmire, R Strelitz, others, "Pattern Detection and Extreme Value Analysis on Large Climate Data", AGUFM, Pages: IN41C--03 January 2011,

D. Hasenkamp, A. Sim, M. Wehner, K. Wu, "Finding Tropical Cyclones on Clouds", Supercomputing 2010, ACM SRC 3rd place, 2010,

O. Rübel, Prabhat, K. Wu, H. Childs, J. Meredith, C.G.R. Geddes, E. Cormier-Michel, S. Ahern, G.H. Weber, P. Messmer, H. Hagen, B. Hamann and E.W. Bethel, "Application of High-performance Visual Analysis Methods to Laser Wakefield Particle Acceleration Data", IEEE Visualization 2008, October 2008,

Others

Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim, 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Pages: 1088--1097 2022, doi: 10.1109/IPDPSW55747.2022.00177

Ling Jin, Alina Lazar, Caitlin Brown, Bingrong Sun, Venu Garikapati, Srinath Ravulaparthy, Qianmiao Chen, Alexander Sim, Kesheng Wu, Tin Ho, Thomas Wenzel, C. Anna Spurlock, What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions, Transportation Research Board 101st Annual Meeting, 2022,

Y. Ma, F. Rusu, K. Wu, A. Sim, Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers, arXiv preprint arXiv:2110.07029, 2021,

J. Kim, A. Sim, J. Kim, K. Wu, J. Hahm, Improving Botnet Detection with Recurrent Neural Network and Transfer Learning, arXiv preprint arXiv:2104.12602, 2021,

Veronica Rodríguez Tribaldos, Nathaniel J Lindsey, Shan Dou, Craig Ulrich, Michelle Robertson, Bin Dong, Vincent Dumont, Kesheng Wu, Inder Monga, Chris Tracy, others, Combining Ambient Noise and Distributed Acoustic Sensing (DAS) Deployed on Dark Fiber Networks for High-resolution Imaging at the Basin Scale, AGU Fall Meeting 2020, 2020,

Jonathan Blair Ajo-Franklin, Verónica Rodríguez Tribaldos, Avinash Nayak, Nathaniel J Lindsey, Feng Cheng, Benxin Chi, Bin Dong, Kesheng Wu, Inder Monga, Distributed Acoustic Sensing (DAS) at the Plot to Basin Scale: Connecting Near-Surface Sensing and Seismology with a Common Observational Tool, AGU Fall Meeting 2020, 2020,

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Botnet Detection Using Recurrent Variational Autoencoder, arXiv preprint arXiv:2004.00234, 2020,

Jung Heon Song, Marcos López de Prado, Horst D Simon, Kesheng Wu, Extracting Signals from High-Frequency Trading with Digital Signal Processing Tools, The Journal of Financial Data Science, Pages: 124--138 2019,

Kesheng Wu, Alex Sim, Jonathan Wang, Seongwook Hwangbo, Methods, systems, and devices for accurate signal timing of power component events, 2019,

US Patent app no. 20190138371, “Methods, systems, and devices for accurate signal timing of power component events”

Payton Linton, William Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Gilberto Pastorello, Lavanya Ramakrishnan, Kesheng Wu, Understanding Data Similarity in Large-Scale Scientific Datasets, 2019 IEEE International Conference on Big Data (Big Data), Pages: 4525--4531 2019,

Kesheng Wu, Surendra Byna, Bin Dong, others, VPIC IO utilities, 2018,

Kesheng Wu, Elizabeth N Coviello, SM Flanagan, Martin Greenwald, Xia Lee, Alex Romosan, P Schissel, Arie Shoshani, Josh Stillerman, John Wright, MPO: A System to Document and Analyze Distributed Workflows, International Provenance and Annotation Workshop, Pages: 166--170 2016, doi: 10.1007/978-3-319-40593-3_14

W. Yoo, M. Koo, Y. Cao, A. Sim, P. Nugent, K. Wu, PATHA: Performance Analysis Tool for HPC, 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), Pages: 1--8 2015, doi: 10.1109/PCCC.2015.7410313

Jung Heon Song, Marcos Lopez de Prado, Horst D, Kesheng Wu, Understanding Natural Gas Futures Trading Through Data, Available at SSRN 2657224, 2015,

Gili Rosenberg, Poya Haghnegahdar, Phil Goddard, Peter Carr, Kesheng Wu, Marcos López de, Solving the optimal trading trajectory problem using a annealer, Proceedings of the 8th Workshop on High Performance Finance, Pages: 7 2015,

Bin Dong, S. Byna, Kesheng Wu, Parallel query evaluation as a Scientific Data, Cluster Computing (CLUSTER), 2014 IEEE International Conference on, Pages: 194--202 2014, doi: 10.1109/CLUSTER.2014.6968765

Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Byna, Kesheng Wu, Simplifying Index File Structure to Improve I/O of Parallel Indexing, The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014), 2014,

Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel, Testing VPIN on Big Data, Available at SSRN 2318259, 2013,

Bin Dong, S. Byna, Kesheng Wu, Expediting scientific data analysis with of data, Cluster Computing (CLUSTER), 2013 IEEE International Conference on, Pages: 1--8 2013, doi: 10.1109/CLUSTER.2013.6702675

Jong Y. Choi, Kesheng Wu, Jacky C. Wu, Alex, Qing G. Liu, Matthew Wolf, CS Chang, Klasky, ICEE: Wide-area In Transit Data Processing Framework Near Real-Time Scientific Applications, PDAC workshop, SC13, 2013,

E. Wes Bethel, David Leinweber, Oliver Rübel, Kesheng Wu, Federal Market Information Technology in the Crash Era: Roles for Supercomputing, The Journal of Trading, Pages: 9--25 2012, doi: 10.3905/jot.2012.7.2.009

Weikuan Yu, Kesheng Wu, Wei-Shinn Ku, Cong Xu, Juan Gao, BMF: Bitmapped Mass Fingerprinting for Fast Protein, CLUSTER, 2011, doi: 10.1109/CLUSTER.2011.11

C. G. R. Geddes, E. Cormier-Michel, E. H. Esarey, C. B. Schroeder, J.-L. Vay, W. P. Leemans, D. L. Bruhwiler, J. R. Cary, B. Cowan, M. Durant, P. Hamill, P. Messmer, P. Mullowney, C. Nieter, K. Paul, S. Shasharina, S. Veitzer, G. Weber, O. Rübel, D. Ushizima, Prabhat, E. W. Bethel, K. Wu, Large Fields for Smaller Facility Sources, SciDAC Review, Pages: 13-21, 2009,

Ekow Otoo, Kesheng Wu, Accelerating queries on very large datasets, 2009,

Meiyappan Nagappan, Kesheng Wu, Mladen A. Vouk, Efficiently Extracting Operational Profiles from Logs Using Suffix Arrays, ISSRE, Pages: 41--50 2009, doi: 10.1109/ISSRE.2009.23

Kamesh Madduri, Kesheng Wu, Efficient joins with compressed bitmap indexes, Proceedings of the 18th ACM conference on Information and knowledge management, Pages: 1017--1026 2009,

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, D. Owens, Kenneth I. Joy, Bin-Hash Indexing: A Parallel Method For Fast Processing, 2008,

Oliver Rübel, Prabhat, Kesheng Wu, Hank, Jeremy Meredith, Cameron G. R. Geddes, Cormier-Michel, Sean Ahern, Gunther H., Peter Messmer, Hans Hagen, Bernd Hamann, E. Wes Bethel, Application of High-performance Visual Analysis to Laser Wakefield Particle Acceleration Data, IEEE Visualization 2008, 2008,

Oliver Rübel, Prabhat, Kesheng Wu, Hank, Jeremy Meredith, Cameron G. R. Geddes, Cormier-Michel, Sean Ahern, Gunther H., Peter Messmer, Hans Hagen, Bernd Hamann, E. Wes Bethel, High Performance Multivariate Visual Data Exploration Extremely Large Data, SuperComputing 2008 (SC08), Pages: 51 2008,

Kesheng Wu, Kurt Stockinger, Arie Shoshani, Performance of Multi-Level and Multi-Component Bitmap Indexes, 2007, doi: 10.1145/1670243.1670245

Elizabeth O'Neil, Patrick O'Neil, Kesheng Wu, Bitmap Index Design Choices and Their Performance, IDEAS 2007, Pages: 72--84 2007,

Luke Gosink, John Shalf, Kurt Stockinger, Kesheng Wu, Wes Bethel, HDF5-FastQuery: Accelerating complex queries on HDF datasets using fast bitmap indices, 18th International Conference on Scientific and Statistical Database Management (SSDBM 06), Pages: 149--158 2006,

E. Wes Bethel, Scott Campbell, Eli Dart, Kurt Stockinger, Kesheng Wu, Accelerating Network Traffic Analysis Using Visualization, Symposium on Visual Analytics Science and Technology Baltimore, Maryland, USA, October 31 - November 2006, Pages: 115--122 2006,

E. Wes Bethel, Scott Campbell, Eli Dart, John Shalf, Kurt Stockinger, Kesheng Wu, High Performance Visualization using Query-Driven and Analytics, 2006,

Kurt Stockinger, E. Wes Bethel, Scott Campbell, Eli Dart, Kesheng Wu, Detecting distributed scans using high-performance visualization, SC 06, Pages: 82 2006,

Doron Rotem, Kurt Stockinger, Kesheng Wu, Minimizing I/O Costs of Multi-Dimensional Queries Bitmap Indices, SSDBM 2006, Vienna, Austria, July 2006,

Kurt Stockinger, John Shalf, Wes Bethel, Kesheng Wu, DEX: Increasing the Capability of Scientific Data Analysis by Using Efficient Bitmap Indices to Accelerate Scientific Visualization, SSDBM, Pages: 35-44 2005,

Kurt Stockinger, Kesheng Wu, Scott Campbell, Lau, Mike Fisk, Eugene Gavrilov, Alex, Christopher E. Davis, Rick Olinger, Rob, Jim Prewett, Paul Weber, Thomas P., E. Wes Bethel, Steve Smith, Network Traffic Analysis With Query Driven, SC 2005, 2005,

Doron Rotem, Kurt Stockinger, Kesheng Wu, Optimizing I/O Costs of Multi-dimensional Queries Bitmap Indices, DEXA, Pages: 220--229 2005,

K. Wu, A. Shoshani, E. J. Otoo, Word aligned bitmap compression method, data and apparatus, US Patent 6,831,575, 2004,

Kurt Stockinger, Kesheng Wu, Arie Shoshani, Evaluation Strategies for Bitmap Indices with, International Conference on Database and Expert Systems Applications (DEXA 2004), Zaragoza, Spain, 2004,

Kesheng Wu, Ekow Otoo, Arie Shoshani, Compressing Bitmap Indexes for Faster Search, Proceedings of SSDBM 02, Pages: 99--108 2002,

Kurt Stockinger, Kesheng Wu, Arie Shoshani, Strategies for processing ad hoc queries on large data, Proceedings of DOLAP 02, Pages: 72--79 2002,

Kesheng Wu, Ekow J Otoo, Arie Shoshani, A performance comparison of bitmap indexes, Proceedings of the tenth international conference on Information and knowledge management, Pages: 559--561 2001,

Kesheng Wu, Horst Simon, TRLAN user guide, 1999,

Kesheng Wu, Yousef Saad, Andreas Stathopoulos, Inexact Newton Preconditioning Techniques for Problems, Electronic Transactions on Numerical Analysis, Pages: 202--214 1998,

Kesheng Wu, Horst D. Simon, Thick-restart Lanczos method for symmetric problems, Lecture Notes in Computer Science, Pages: 43--55 1998,

Kesheng Wu, Stability of midpoint methods on second order ODEs, 1992,