# Publications

## Journal Article

### Daniel T. Graves, Phillip Colella, David Modiano, Jeffrey Johnson, Bjorn Sjogreen, Xinfeng Gao,"A Cartesian Grid Embedded Boundary Method for the Compressible Navier Stokes Equations",Communications in Applied Mathematics and Computational Science,December 23, 2013,

In this paper, we present an unsplit method for the time-dependent
compressible Navier-Stokes equations in two and three dimensions.
We use a a conservative, second-order Godunov algorithm.
We use a Cartesian grid, embedded boundary method to resolve complex
boundaries.  We solve for viscous and conductive terms with a
second-order semi-implicit algorithm.  We demonstrate second-order
accuracy in solutions of smooth problems in smooth geometries and
demonstrate robust behavior for strongly discontinuous initial
conditions in complex geometries.

### Katherine Yelick, Paul Hilfinger, Susan Graham, Dan Bonachea, Jimmy Su, Amir Kamil, Kaushik Datta, Phillip Colella, and Tong Wen,"Parallel Languages and Compilers: Perspective from the Titanium Experience",The International Journal Of High Performance Computing Applications,June 16, 2006,21,

We describe the rationale behind the design of key features of Titanium—an explicitly parallel dialect of JavaTM for high-performance scientific programming—and our experiences in building applications with the language. Specifically, we address Titanium’s Partitioned Global Address Space model, SPMD parallelism support, multi-dimensional arrays and array-index calculus, memory management, immutable classes (class-like types that are value types rather than reference types), operator overloading, and generic programming. We provide an overview of the Titanium compiler implementation, covering various parallel analyses and optimizations, Titanium runtime technology and the GASNet network communication layer. We summarize results and lessons learned from implementing the NAS parallel benchmarks, elliptic and hyperbolic solvers using Adaptive Mesh Refinement, and several applications of the Immersed Boundary method.

## Conference Paper

### Katherine Yelick, Dan Bonachea, Wei-Yu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L. Graham, Paul Hargrove, Paul Hilfinger, Parry Husbands, Costin Iancu, Amir Kamil, Rajesh Nishtala, Jimmy Su, Michael Welcome, Tong Wen,"Productivity and Performance Using Partitioned Global Address Space Languages",Parallel Symbolic Computation (PASCO'07),July 2007,

Partitioned Global Address Space (PGAS) languages combine the programming convenience of shared memory with the locality and performance control of message passing. One such language, Unified Parallel C (UPC) is an extension of ISO C defined by a consortium that boasts multiple proprietary and open source compilers. Another PGAS language, Titanium, is a dialect of Java T M designed for high performance scientific computation. In this paper we describe some of the highlights of two related projects, the Titanium project centered at U.C. Berkeley and the UPC project centered at Lawrence Berkeley National Laboratory. Both compilers use a source-to-source strategy that translates the parallel languages to C with calls to a communication layer called GASNet. The result is portable highperformance compilers that run on a large variety of shared and distributed memory multiprocessors. Both projects combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages to these languages.

## Report

### Adrian Tate, Amir Kamil, Anshu Dubey, Armin Größlinger, Brad Chamberlain, Brice Goglin, Carter Edwards, Chris J. Newburn, David Padua, Didem Unat, Emmanuel Jeannot, Frank Hannig, Gysi Tobias, Hatem Ltaief, James Sexton, Jesus Labarta, John Shalf, Karl Fuerlinger, Kathryn O’Brien, Leonidas Linardakis, Maciej Besta, Marie-Christine Sawley, Mark Abraham, Mauro Bianco, Miquel Pericàs, Naoya Maruyama, Paul Kelly, Peter Messmer, Robert B. Ross, Romain Cledat, Satoshi Matsuoka, Thomas Schulthess, Torsten Hoefler, Vitus Leung,"Programming Abstractions for Data Locality",2014 Workshop on Programming Abstractions for Data Locality,April 29, 2014,

The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, we are rapidly moving to an era that computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor their applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on the hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume all the processing elements are equidistant to each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data centric and allow to describe how to decompose and how to layout data in the memory.
Fortunately, there are many emerging concepts such as constructs for tiling, data layout, array views, task and thread affinity, and topology aware communication libraries for managing data locality. There is an opportunity to identify commonalities in strategy to enable us to combine the best of these concepts to develop a comprehen- sive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to en- able performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages to achieve this goal.

The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, we are rapidly moving to an era that computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor their applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on the hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume all the processing elements are equidistant to each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data centric and allow to describe how to decompose and how to layout data in the memory.

Fortunately, there are many emerging concepts such as constructs for tiling, data layout, array views, task and thread affinity, and topology aware communication libraries for managing data locality. There is an opportunity to identify commonalities in strategy to enable us to combine the best of these concepts to develop a comprehensive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages to achieve this goal.

### Brian Van Straalen, David Trebotich, Terry Ligocki, Daniel T. Graves, Phillip Colella, Michael Barad,"An Adaptive Cartesian Grid Embedded Boundary Method for the Incompressible Navier Stokes Equations in Complex Geometry",LBNL Report Number: LBNL-1003767,2012,LBNL LBNL Report Numb,

We present a second-order accurate projection method to solve the
incompressible Navier-Stokes equations on irregular domains in two
and three dimensions.  We use a finite-volume discretization
obtained from intersecting the irregular domain boundary with a
Cartesian grid.  We address the small-cell stability problem
associated with such methods by hybridizing a conservative
discretization of the advective terms with a stable, nonconservative
discretization at irregular control volumes, and redistributing the
difference to nearby cells.  Our projection is based upon a
finite-volume discretization of Poisson's equation.  We use a
second-order, $L^\infty$-stable algorithm to advance in time.  Block
structured local refinement is applied in space.  The resulting
method is second-order accurate in $L^1$ for smooth problems.  We
demonstrate the method on benchmark problems for flow past a
cylinder in 2D and a sphere in 3D as well as flows in 3D geometries
obtained from image data.