Careers | Phone Book | A - Z Index

CORVETTE: Correctness Verification and Testing of Parallel Programs

Research Topics

The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. As intranode programming is likely to be the most disrupted by the transition to Exascale, we will emphasize support for a large spectrum of programming and execution models such as dynamic tasking, directive based programming, and data parallelism. For inter-node programming we plan to handle both one-sided (PGAS) and two-sided (MPI) communication paradigms.

We aim to provide tools that identify sources of non-determinism in program executions and make concurrency bugs (data races, atomicity violations, deadlocks) and floating-point behavior reproducible. In order to increase the adoption of automatic program bug finding and exploration tools, novel techniques to increase precision and scalability are required. Precision implies that false alarms/positives are filtered and only the real problems are reported to users. During our research we will explore state-of- the-art methods that use dynamic program analysis. Since dynamic analysis monitors the program execution the resulting tools are precise, at the expense of scalability. Current approaches exhibit 10X- 100X runtime overhead: it is our goal to provide precise tools with no more than 2X runtime overhead at large scale. We will also research techniques to maximize the tool efficacy on a time budget, e.g. no more than 10% overhead. These principles have been incorporated into the UPC-Thrille data race detection tool for the Unified Parallel C programming language.

We will also research novel approaches to assist with program debugging using the minimal amount of concurrency needed to reproduce a bug and to support two-level debugging of high-level programming abstractions (DSLs). Furthermore, we plan to apply the combination of techniques developed for bug finding to provide an environment for exploratory programming. We will develop tools that allow developers to specify their intentions (not implementation) for code transformations and that are able to provide feedback about correctness. Besides code transformations, we plan to allow for automatic algorithmic tuning, i.e. transparently choosing at runtime the best implementation with respect to a metric from a collection of algorithms. As an initial case study we have developed PRECIMONIOUS, a tool  that determines program phases where double floating-point precision can be replaced by single precision.

Progress 2015 

This document summarizes the project progress and status in 2015.

Software

UPC-Thrille

Precimonious

ReproBLAS

Researchers

2015

Cindy Rubio-Gonz ́alez, Cuong Nguyen, James Demmel, William Kahan, Koushik Sen, Wim Lavrijsen, Costin Iancu, "Floating Point Precision Tuning Using Blame Analysis", LBNL TR, April 17, 2015,

Xuehai Qian, Koushik Sen, Paul Hargrove, Costin Iancu, "OPR: Partial Deterministic Record and Replay for One-Sided Communication", LBNL TR, April 17, 2015,

James Demmel, Costin Iancu, Kouhsik Sen, "Corvette Progress Report 2015", April 1, 2015,

Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John Mellor Crummey, Costin Iancu, "Barrier Elision for Production Parallel Programs", PPOPP 2015, February 5, 2015,

2014

James Demmel, Hong-Diep Nguyen, "Parallel Reproducible Summation", To appear in IEEE Transactions on Computers, Special Section on Computer Arithmetic 2014, April 30, 2014,

2013

Cindy Rubio-Gonzalez, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, David Hough, "Precimonious: Tuning Assistant for Floating-Point Precision", Supercomputing 2013, July 19, 2013,

Chang-Seo Park, Koushik Sen, Costin Iancu, "Scaling Data Race Detection for Partitioned Global Address Space Programs", International Supercomputing Conference (ICS) 2013, 2013,

James Demmel, Hong-Diep Nguyen, "Fast Reproducible Floating-Point Summation", Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), April 10, 2013,

James Demmel, Hong-Diep Nguyen, "Numerical Accuracy and Reproducibility at ExaScale", Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), April 10, 2013,

Chang-Seo Park, Koushik Sen, Costin Iancu, "Scaling Data Race Detection for Partitioned Global Address Space Programs", Principles and Practice of Parallel Programming (PPoPP 2013), March 4, 2013,

2012

"Covette Handout - XStack Kick-Off 2012", 2012,

"Corvette Project Description - XStack Kick-Off", 2012,