Berkeley Lab Scientific Computing Seminar

Date:
Friday, April 28, 2006
Time:
1:00pm-2:00pm
Location:
Building 50A-5132
Seminar Speaker:
Yitzhak Mandelbaum
Princeton University
http://www.cs.princeton.edu/~yitzhakm/
Title:
The Theory and Practice of Data Description
Abstract:
Massive amounts of useful data are stored and processed in non-standard or ad hoc formats, for which critical tools like parsers and formatters do not exist. Traditional databases and XML systems provide rich infrastructure for processing well-behaved data, but are of little help when dealing with data in ad hoc formats.

I will discuss my attempts to address the challenges of ad hoc data through my work on the PADS project. I will present an introduction to PADS/ML, a declarative data description language that permits analysts to describe the physical layout of their data and its semantic properties. From a description, the PADS compiler can automatically generate a collection of useful data-processing tools for the data source described, including parsing routines, statistical profiling tools, and translators to standard formats like XML. I will discuss the formal semantics of the PADS language and two of its essential properties. Finally, I will describe support for querying ad hoc data with the PADS tool PADX. I will discuss PADX from the user's perspective and review the main challenges encountered in implementing PADX, along with their solutions.

Sponsor of Seminar:
Arie Shoshani
Scientific Computing

Contact: Esmond G. Ng, EGNg@lbl.gov