Kesheng "John" Wu

武克胜

Telephone: 1(510)486-6609
E-mail: kwu@lbl.gov or john.wu@acm.org
Mailstop 50B-3238, 1 Cyclotron Road, Berkeley, CA 94720, USA

Research Interests
* Analysis and management of large datasets
* Design and implementation of parallel and distributed software systems
* Component architecture for scientific software
* Performance tuning for large distributed software

Projects
Make It A Bit Faster with FastBit
* FastBit [Publications]: an efficient compressed bitmap index technology for data-intensive sciences. This project addresses the challenge of efficiently searching the growing volumes of data collected or generated by scientific applications such as high-energy physics, combustion, astrophysics, and network traffic analysis. The FastBit software received an R&D 100 Award; here is a photo from the award reception.
* Connected Component Labeling [Publications]: an efficient connected component labeling algorithm. This work grew out of our feature tracking for combustion data analysis. The key new insight is that an implicit union-find data structure can speed up connected component labeling algorithms, which in turn leads to faster algorithms for finding regions of interest. In particular, by using compressed bitmaps to represent the points in the regions of interest, we can find the regions in time proportional to the number of points on the boundaries of the regions. This is faster than the best iso-contouring algorithms and much faster than similar region-finding algorithms. It also forms the basis of some of our work on visualization and visual analytics.
* DEX [Publications]: a query-based visualization tool. This project provides a new visualization capability built on the FastBit technology and the fast connected component labeling technology. The combination was first demonstrated in a project analyzing combustion simulation data. It is documented extensively in our paper at IEEE Vis 2005 and also appeared in a SciDAC review report on the Scientific Data Management Center.
* TRLan [Publications]: Thick-restart Lanczos method for symmetric eigenvalue problems. An implementation in Fortran 90 is available under a BSD-like license.
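
The FastBit entry above rests on the idea of a bitmap index. The following is a minimal sketch, assuming a simple equality-encoded index with one bitmap per distinct value. Python integers stand in for uncompressed bit vectors, so the compression FastBit applies to its bitmaps is not shown. The function names (`build_bitmap_index`, `range_query`, `rows`) and the sample columns are hypothetical and are not part of the FastBit API.

```python
def build_bitmap_index(column):
    """Equality-encoded bitmap index: one bit vector per distinct value.

    Bit i of index[v] is 1 exactly when column[i] == v. Python ints act as
    uncompressed bit vectors here; no compression is applied.
    """
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def range_query(index, low, high):
    """Bit vector of the rows whose value v satisfies low <= v < high."""
    result = 0
    for value, bits in index.items():
        if low <= value < high:
            result |= bits  # OR together the bitmaps of qualifying values
    return result


def rows(bits):
    """Decode a bit vector into a sorted list of row numbers."""
    out, row = [], 0
    while bits:
        if bits & 1:
            out.append(row)
        bits >>= 1
        row += 1
    return out


if __name__ == "__main__":
    # Hypothetical sample columns, for illustration only.
    temperature = [3, 1, 4, 1, 5, 9, 2, 6]
    pressure = [10, 20, 10, 30, 20, 10, 20, 30]
    t_idx = build_bitmap_index(temperature)
    p_idx = build_bitmap_index(pressure)
    # A two-attribute condition is answered with one bitwise AND.
    hits = range_query(t_idx, 2, 5) & range_query(p_idx, 15, 25)
    print(rows(hits))
```

Here `range_query(t_idx, 2, 5)` selects rows 0, 2, and 6, the pressure condition selects rows 1, 4, and 6, and their AND leaves row 6, so the script prints `[6]`.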
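
The connected component labeling work above builds on the classic two-pass scheme. A first scan assigns provisional labels and records which labels touch, and a second scan replaces each provisional label with a representative of its equivalence class. The sketch below is a generic version under stated assumptions: a binary image given as a list of 0/1 rows, 4-connectivity, and a plain parent array indexed by provisional label as the union-find structure. It is not the optimized algorithm from the papers listed here, and `label_components` is a hypothetical name.

```python
def label_components(image):
    """Two-pass connected-component labeling with 4-connectivity.

    image is a list of equal-length rows of 0/1 values. Returns
    (labels, count), where labels uses 1..count for foreground pixels
    and 0 for background.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if image else 0
    labels = [[0] * n_cols for _ in range(n_rows)]
    parent = [0]  # union-find array over provisional labels; 0 = background

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:  # path compression
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        lo, hi = min(ra, rb), max(ra, rb)
        parent[hi] = lo  # the smaller label always becomes the root
        return lo

    # First pass: provisional labels plus equivalences between them.
    for r in range(n_rows):
        for c in range(n_cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = union(up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))  # new label is its own root
                labels[r][c] = len(parent) - 1

    # Second pass: map each provisional label to a consecutive final label.
    final = [0] * len(parent)
    count = 0
    for lab in range(1, len(parent)):
        root = find(lab)
        if root == lab:
            count += 1
            final[lab] = count
        else:
            final[lab] = final[root]  # root < lab, so it is already set
    for r in range(n_rows):
        for c in range(n_cols):
            labels[r][c] = final[labels[r][c]]
    return labels, count
```

On a U-shaped image such as `[[1, 0, 1], [1, 0, 1], [1, 1, 1]]`, the two arms first receive different provisional labels and are merged when the bottom row joins them, so the function reports a single component. Linking the larger root under the smaller one keeps every parent pointer aimed at a smaller label, which lets the second pass assign consecutive final labels in one sweep.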
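
TRLan is built on the Lanczos recurrence, which reduces a symmetric matrix to a small tridiagonal matrix whose eigenvalues (Ritz values) approximate those of the original matrix. The sketch below shows only plain Lanczos with full reorthogonalization, followed by a Sturm-sequence bisection for the largest Ritz value. It does not implement thick restarting and it is not the TRLan interface; the matrix is supplied as a hypothetical `matvec` function, and the starting vector is random.

```python
import math
import random


def lanczos(matvec, n, k, seed=0):
    """Run up to k Lanczos steps on a symmetric n x n operator.

    Returns (alpha, beta): the diagonal and off-diagonal entries of the
    tridiagonal matrix T. Each new vector is explicitly reorthogonalized
    against all earlier ones to control round-off.
    """
    rng = random.Random(seed)
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in q))
    basis = [[x / norm for x in q]]
    alpha, beta = [], []
    for j in range(k):
        w = matvec(basis[j])
        alpha.append(sum(wi * qi for wi, qi in zip(w, basis[j])))
        # Gram-Schmidt against every Lanczos vector; in exact arithmetic this
        # removes the same alpha and beta terms as the three-term recurrence.
        for v in basis:
            c = sum(wi * vi for wi, vi in zip(w, v))
            w = [wi - c * vi for wi, vi in zip(w, v)]
        if j == k - 1:
            break
        b = math.sqrt(sum(x * x for x in w))
        if b < 1e-12:  # the Krylov subspace is invariant; stop early
            break
        beta.append(b)
        basis.append([x / b for x in w])
    return alpha, beta


def count_below(alpha, beta, x):
    """Number of eigenvalues of T smaller than x (Sturm sequence count)."""
    count, d = 0, 1.0
    for i, a in enumerate(alpha):
        d = a - x - (beta[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = 1e-300  # avoid dividing by an exact zero pivot
        if d < 0.0:
            count += 1
    return count


def largest_ritz_value(alpha, beta, tol=1e-10):
    """Largest eigenvalue of T, by bisection inside its Gershgorin bounds."""
    m = len(alpha)
    radius = [0.0] * m
    for i, b in enumerate(beta):
        radius[i] += abs(b)
        radius[i + 1] += abs(b)
    lo = min(a - r for a, r in zip(alpha, radius))
    hi = max(a + r for a, r in zip(alpha, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) == m:
            hi = mid  # all eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For the diagonal matrix diag(1, ..., 10), running `lanczos` for 10 steps and passing the result to `largest_ritz_value` recovers the eigenvalue 10 up to rounding; with fewer steps the result is a lower bound on it. Thick restart, the technique TRLan is named for, instead restarts the recurrence while retaining selected Ritz vectors; that step is omitted here.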

Selected Publications
* Kesheng Wu, Ekow Otoo, and Kenji Suzuki. Optimizing two-pass connected-component labeling algorithms. Pattern Analysis & Applications, v 11, 2008. DOI 10.1007/s10044-008-0109-y.
[Abstract] [Preprint as LBNL-59102]
* Kesheng Wu, Ekow Otoo, and Arie Shoshani. Optimizing bitmap indices with efficient compression. ACM Transactions on Database Systems, v 31, pages 1-38, 2006.
[Abstract] [Preprint as LBNL-49626]
* Kesheng Wu, Ekow Otoo, and Arie Shoshani. On the performance of bitmap indices for high cardinality attributes. VLDB 2004, pages 24-35.
[Abstract] [Preprint as LBNL-54673]
* Kesheng Wu and Horst Simon. Thick-restart Lanczos method for large symmetric eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, v 22, no 2, pages 602-616, 2001.
[Abstract] [Preprint as LBNL-41412]

* Publications listed elsewhere on the web: Google Scholar, DBLP, CiteSeer, and Microsoft Libra.

---

Work environment
* University of California
* Lawrence Berkeley National Laboratory (LBNL on YouTube)
* Computational Research Division
* High Performance Computing Research Department
* Scientific Data Management Group

Earlier work
* Scientific computing work at the University of Minnesota

Related sites on database research
* University of Minnesota Database group
* UC Berkeley Database group
* Stanford InfoLab
* DBLife

Related sites on scientific computing (eigenvalues in particular)
* PRIMME
* ARPACK
* PRISM

http://lbl.gov/~kwu