CRD’s pyGlobus Tools Proving Popular
January 10, 2006
Software from CRD’s Distributed Systems Department (DSD), which has led the development of the de facto standard tools for building Grid services, applications and portals in the Python programming language, proved a popular draw at the LBNL booth at the SC05 conference in Seattle.
Python is a high-level interpreted language that supports a rapid application development cycle. Its minimal syntax makes it well suited to users who are not computer scientists. It also makes it easy to bind together C/C++ and Fortran codes and expose them through a thin Python interface. By hiding computer science details, the Python Grid tools let scientists focus on their science.
Keith Jackson led DSD’s development of the Python Commodity Grid (CoG) Kit, or pyGlobus, to provide access to the original Globus Toolkit developed at Argonne National Lab and USC’s Information Sciences Institute. The Python CoG Kit has been an important part of Grid development for a number of projects, including the DOE-funded Access Grid project and the NSF-funded Laser Interferometer Gravitational Wave Observatory (LIGO).
The Python CoG Kit provides a thin veneer of Python over the underlying Globus Toolkit C code. The result is a simple, easy-to-use, object-oriented interface to Globus that retains the full power and performance of the underlying C code.
The department also developed pyGridWare, a Python implementation of the next generation of Grid standards based on Web services, i.e., the Web Service Resource Framework (WSRF) and Web Service Notification (WS-N) specifications. In addition to building lower-level toolkits, DSD has developed a Visual Composition Environment, or ViCE (see sidebar), to support the collaborative development and execution of complex scientific workflows.
The pyGridWare Toolkit provides a vital set of tools for current Python Grid projects that are transitioning to the new Web service-based Grid architecture. The most recent Grid standards, WSRF and WS-N, are based on industry-standard Web services. A Web service is simply any service that describes its interface in a standard XML-based format and is accessible via standard Web protocols. The use of standard protocols enables the scientific world to leverage the significant corporate investment in Web service infrastructure, and allows multiple interoperable implementations to be developed. pyGridWare is interoperable with the Java and C implementations from Argonne and is mostly compatible with the .NET implementation from the University of Virginia.
In addition to support for developing WSRF applications from scratch, DSD has developed a tool to automatically wrap an existing command-line application to expose it as a WSRF service. This allows a scientist to take an existing application that is run locally and expose it as a Grid service that is accessible over the network.
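The wrapping idea can be sketched in a few lines of Python. The sketch below is not the DSD tool and does not implement WSRF; it only illustrates, using the standard library, how a program that normally runs locally might be made callable over the network. The names `CommandService`, `run` and `serve`, the port number, and the use of XML-RPC as a stand-in protocol are all assumptions made for illustration.

```python
import subprocess
import sys
from xmlrpc.server import SimpleXMLRPCServer


class CommandService:
    """Hypothetical wrapper that presents one command-line program as a callable service."""

    def __init__(self, argv_prefix, timeout=60):
        # argv_prefix is the fixed part of the command line (program name plus fixed flags).
        self.argv_prefix = list(argv_prefix)
        self.timeout = timeout

    def run(self, args):
        """Run the wrapped program with extra arguments and return its result as a dict."""
        completed = subprocess.run(
            self.argv_prefix + [str(a) for a in args],
            capture_output=True,
            text=True,
            timeout=self.timeout,
        )
        return {
            "returncode": completed.returncode,
            "stdout": completed.stdout,
            "stderr": completed.stderr,
        }


def serve(service, host="localhost", port=8000):
    """Expose service.run over XML-RPC (a stand-in for WSRF, chosen only for brevity)."""
    server = SimpleXMLRPCServer((host, port), allow_none=True)
    server.register_function(service.run, "run")
    server.serve_forever()


if __name__ == "__main__":
    # Use the Python interpreter itself as the "existing application" so the
    # example runs anywhere; a real deployment would name a scientific code here.
    echo = CommandService(
        [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
    )
    print(echo.run(["hello", "grid"]))
```

Calling `serve(echo)` would let a remote client invoke the program with `xmlrpc.client.ServerProxy("http://localhost:8000").run(["hello", "grid"])`. A production Grid service would also need authentication, authorization and input validation, none of which this sketch attempts.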
Making existing applications easy to convert into Grid services is intended to leverage the significant investment DOE has made in high performance codes, such as those developed under the SciDAC Integrated Software Infrastructure Centers, by exposing these applications as part of the emerging national middleware infrastructure.
ViCE
DSD’s work in higher-level interfaces to the Grid has led to the development of a visual programming tool, ViCE, that is used to collaboratively develop and execute complex scientific workflows.
Scientific projects today are frequently large collaborations among geographically and organizationally distributed teams. ViCE is designed to support scientists collaborating over a visual description of their workflow. A workflow is represented as a set of nodes and links. The nodes represent actions, such as querying a protein sequence database, and links represent the data transfer between nodes. By dragging and dropping a series of domain-specific nodes onto a palette, a scientist can construct a complete workflow.
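The node-and-link model described above can be illustrated with a short Python sketch. This is not ViCE code; the `Workflow` class, its method names, and the toy database searches are hypothetical, and the "databases" are plain functions that return made-up scores. Each node is an action, each link passes one node's result to the next, and a node runs once all of its upstream nodes have finished.

```python
from collections import defaultdict, deque


class Workflow:
    """Hypothetical in-memory workflow: nodes are actions, links carry data between them."""

    def __init__(self):
        self.actions = {}                  # node name -> callable
        self.inputs = defaultdict(list)    # node name -> upstream node names, in link order
        self.outputs = defaultdict(list)   # node name -> downstream node names

    def add_node(self, name, action):
        self.actions[name] = action

    def add_link(self, source, target):
        # A link means: the result of `source` is passed as an argument to `target`.
        for name in (source, target):
            if name not in self.actions:
                raise KeyError(f"unknown node: {name}")
        self.inputs[target].append(source)
        self.outputs[source].append(target)

    def run(self):
        """Execute each node after all of its upstream nodes have produced results."""
        pending = {name: len(self.inputs[name]) for name in self.actions}
        ready = deque(name for name, count in pending.items() if count == 0)
        results = {}
        while ready:
            name = ready.popleft()
            args = [results[src] for src in self.inputs[name]]
            results[name] = self.actions[name](*args)
            for target in self.outputs[name]:
                pending[target] -= 1
                if pending[target] == 0:
                    ready.append(target)
        if len(results) != len(self.actions):
            raise ValueError("workflow contains a cycle")
        return results


# Toy example: send one sequence to two stand-in databases and keep the best hit.
wf = Workflow()
wf.add_node("sequence", lambda: "MKTAYIAK")
wf.add_node("search_db_a", lambda seq: [("db_a_hit", len(seq))])
wf.add_node("search_db_b", lambda seq: [("db_b_hit", len(seq) - 1)])
wf.add_node("best_match", lambda a, b: max(a + b, key=lambda hit: hit[1]))
wf.add_link("sequence", "search_db_a")
wf.add_link("sequence", "search_db_b")
wf.add_link("search_db_a", "best_match")
wf.add_link("search_db_b", "best_match")
results = wf.run()
print(results["best_match"])
```

Keeping the graph separate from the actions means the same description could be drawn, shared and edited visually while the execution order is derived from the links alone.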
The accompanying figure shows a typical visual workflow description from biology. The biologists are searching several protein sequence databases, looking for a likely match to a newly sequenced protein.
ViCE supports collaboration by allowing multiple groups to have the same view of the changing workflow description. They can use integrated chat tools to discuss the workflow. Future versions will support the collaborative editing of the visual workflow description.