What is the Parallel Problems Server?


The Parallel Problems Server

The Parallel Problems Server (PPServer) project bridges the gap between scientific computing in the workstation and supercomputer worlds. It is a collaborative project between the MIT Artificial Intelligence Laboratory and the MIT Laboratory for Computer Science. The Parallel Problems Server gives interactive clients access to powerful functionality, and gives users of parallel machines access to interactive environments in which they can manipulate and visualize large data sets.

Simply put, the Parallel Problems Server is a linear algebra compute server for large matrices. It contains functions for creating and removing distributed dense and sparse matrices, performing elementary matrix operations, and loading and storing matrices from/to disk using a portable format. Because matrices are created on the PPServer itself, functions are also provided for transferring matrix sections to and from a client.

PPServer matrices are two-dimensional and single precision. Dense matrices can be distributed by row or by column. Sparse matrices are distributed by column. Replicated dense matrices are also provided, though very few operations use them.
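To make the row and column distributions concrete, here is a minimal Python sketch of splitting a small dense matrix across a number of processes. It assumes a contiguous block layout in which piece sizes differ by at most one; the PPServer's actual layout is not specified here, and the names `block_ranges` and `distribute` are hypothetical.

```python
def block_ranges(n, p):
    """Split n indices into p contiguous blocks whose sizes differ by at most one.

    Assumption: a simple block layout, chosen only for illustration.
    """
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        # The first `extra` ranks each take one additional index.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def distribute(matrix, p, by="row"):
    """Return the p local pieces of a dense matrix given as a list of rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if by == "row":
        # Each process owns a contiguous band of complete rows.
        return [matrix[lo:hi] for lo, hi in block_ranges(n_rows, p)]
    if by == "column":
        # Each process owns a contiguous band of complete columns.
        return [[row[lo:hi] for row in matrix]
                for lo, hi in block_ranges(n_cols, p)]
    raise ValueError("by must be 'row' or 'column'")


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    print(distribute(m, 2, by="row"))
    print(distribute(m, 2, by="column"))
```

A replicated matrix, by contrast, would simply give every process a full copy.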

The PPServer communicates with clients using a simple request-response protocol. A client requests that an action be performed by issuing a command with the appropriate arguments, the server executes that command, and then notifies the client that the action is complete.
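The request-response cycle described above can be sketched in a few lines of Python. This is an in-process toy, not the PPServer wire protocol: the message fields (`command`, `args`, `status`) and the class names `ToyServer` and `ToyClient` are assumptions made for illustration only.

```python
class ToyServer:
    """Executes named commands and reports completion to the caller."""

    def __init__(self):
        self._commands = {}

    def register(self, name, func):
        self._commands[name] = func

    def handle(self, request):
        # Look up the requested command; unknown names yield an error response.
        name = request["command"]
        func = self._commands.get(name)
        if func is None:
            return {"status": "error", "message": "unknown command: " + name}
        try:
            result = func(*request.get("args", []))
        except Exception as exc:
            return {"status": "error", "message": str(exc)}
        # Notify the client that the action is complete.
        return {"status": "ok", "result": result}


class ToyClient:
    """Issues one request at a time and waits for the matching response."""

    def __init__(self, server):
        self._server = server

    def call(self, command, *args):
        response = self._server.handle({"command": command, "args": list(args)})
        if response["status"] != "ok":
            raise RuntimeError(response["message"])
        return response["result"]


if __name__ == "__main__":
    server = ToyServer()
    server.register("add", lambda a, b: a + b)
    client = ToyClient(server)
    print(client.call("add", 2, 3))
```

In a real deployment the call to `handle` would be replaced by a network round trip, but the shape of the exchange (command plus arguments out, completion notice back) stays the same.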

The PPServer is directly extensible via compiled libraries called packages. The PPServer implements a robust protocol for communicating with packages. Clients (and other packages) can load and remove packages on the fly, as well as execute commands within packages.

The PPServer provides a library of calls that gives package programmers direct access to information about the PPServer and its matrices. Programmers can thus write MPI code that operates directly on the PPServer matrices. Each package represents its own namespace, defining a set of functions and visible function names. This not only supports data encapsulation, but also allows users to hide a subset of the functions in one package by loading another package that defines the same function names. Finally, packages support common parallel idioms (like applying a function to every element of a matrix), making it easier to add common functionality.
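The namespace behavior and the element-wise idiom can be illustrated with a small Python sketch. It assumes that an unqualified function name resolves to the most recently loaded package that defines it; the text above does not spell out the lookup rule, and `PackageRegistry` and `map_elements` are hypothetical names, not part of the PPServer library.

```python
class PackageRegistry:
    """Tracks loaded packages, each with its own table of visible functions."""

    def __init__(self):
        # A dict preserves insertion order, which serves as load order here.
        self._packages = {}

    def load(self, name, functions):
        # Reloading a package moves it to the most recent position.
        self._packages.pop(name, None)
        self._packages[name] = dict(functions)

    def remove(self, name):
        del self._packages[name]

    def resolve(self, func_name):
        # Assumption: the most recently loaded package hides earlier ones.
        for pkg_name in reversed(list(self._packages)):
            functions = self._packages[pkg_name]
            if func_name in functions:
                return pkg_name, functions[func_name]
        raise KeyError(func_name)


def map_elements(local_block, func):
    """Apply func to every element of the locally owned piece of a matrix."""
    return [[func(x) for x in row] for row in local_block]


if __name__ == "__main__":
    registry = PackageRegistry()
    registry.load("basic", {"scale": lambda x: 2 * x, "neg": lambda x: -x})
    registry.load("fast", {"scale": lambda x: 3 * x})
    owner, scale = registry.resolve("scale")
    print(owner, map_elements([[1, 2], [3, 4]], scale))
```

Because each process applies the function only to the block it owns, an element-wise operation of this kind needs no communication between processes.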

All but a few PPServer commands are implemented as packages, including basic matrix operations. Many highly optimized public libraries have been realized as packages using appropriate wrapper functions. These packages include ScaLAPACK, S3L (Sun's optimized version of ScaLAPACK), PARPACK, and PETSc.

To demonstrate the flexibility of a client-server model of computation, we have developed a Matlab 5 front end to the Parallel Problems Server's computational engine. By using Matlab 5's object-oriented programming features, most server operations are completely transparent to the user. Combining the Matlab environment with the PPServer, we have been able to build applications for information retrieval, machine learning, and scientific computing.
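The idea of a transparent front end can be sketched with operator overloading. The example below is written in Python rather than Matlab and is only an analogy: `ToyMatrixServer` stands in for the remote engine, and `RemoteMatrix` is a hypothetical proxy class, neither of which reflects the actual Matlab front end.

```python
class ToyMatrixServer:
    """Stand-in for a remote engine; stores matrices under integer handles."""

    def __init__(self):
        self._store = {}
        self._next_handle = 0

    def create(self, rows):
        handle = self._next_handle
        self._next_handle += 1
        self._store[handle] = [list(r) for r in rows]
        return handle

    def add(self, a, b):
        A, B = self._store[a], self._store[b]
        return self.create([[x + y for x, y in zip(ra, rb)]
                            for ra, rb in zip(A, B)])

    def matmul(self, a, b):
        A, B = self._store[a], self._store[b]
        cols = list(zip(*B))  # columns of B
        return self.create([[sum(x * y for x, y in zip(row, col))
                             for col in cols] for row in A])

    def fetch(self, handle):
        return [list(r) for r in self._store[handle]]


class RemoteMatrix:
    """Client-side proxy: holds only a handle, never the matrix data."""

    def __init__(self, server, handle):
        self._server = server
        self._handle = handle

    def __add__(self, other):
        # The operator becomes a server command; the result stays on the server.
        return RemoteMatrix(self._server,
                            self._server.add(self._handle, other._handle))

    def __matmul__(self, other):
        return RemoteMatrix(self._server,
                            self._server.matmul(self._handle, other._handle))

    def to_local(self):
        # Explicit transfer of the data back to the client.
        return self._server.fetch(self._handle)


if __name__ == "__main__":
    server = ToyMatrixServer()
    a = RemoteMatrix(server, server.create([[1, 2], [3, 4]]))
    b = RemoteMatrix(server, server.create([[5, 6], [7, 8]]))
    print((a + b).to_local())
    print((a @ b).to_local())
```

The user writes ordinary matrix expressions, while the data stays on the server until it is explicitly fetched.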