
Eric Roman

Computer Systems Engineer
Phone: +1 510 486 6420
Fax: +1 510 486 6900

Eric Roman is a computer systems engineer at Lawrence Berkeley National Laboratory. He joined LBNL in 1999. From 2004 to 2006 he took leave to pursue a Ph.D. in physics at the University of California at Berkeley, where he performed ab initio simulations of nonlinear optical properties of semiconductors, spin transport in metals, and the anomalous Hall effect. He completed his doctoral dissertation, entitled “Orientation Dependence of the Anomalous Hall Effect in 3d Ferromagnets,” in 2010.

His research at LBNL focuses on operating systems for high-performance computing. He has participated in the development of Berkeley Lab's Checkpoint/Restart (BLCR) since the start of the project in 2001. He wrote the initial requirements and implementation surveys before moving on to implement multithreaded checkpoints and restarts. He later implemented BLCR's support for files, pipes, on-the-fly compression of checkpoint files, and direct I/O. In 2008, he worked with Cluster Resources Inc. to add BLCR support to the Torque batch system. In 2003, he led a seminar on the Linux kernel attended by NERSC and Sandia-Livermore. In 2004, he organized the FastOS project “High-End Computing with K42.” He is currently working to optimize file I/O operations in BLCR and participates in LBNL's collaboration with the Berkeley ParLab.

Conference Papers

Khaled Z. Ibrahim, S. Hofmeyr, Eric Roman, "Optimized Pre-Copy Live Migration for Memory Intensive Applications", The International Conference for High Performance Computing, Networking, Storage, and Analysis, 2011

Paul Hargrove, Jason Duell, Eric Roman, "Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters", Proceedings of SciDAC 2006, June 27, 2006

S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, E. Roman, "The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing", Los Alamos Computer Science Institute Symposium Proceedings (LACSI'03), Santa Fe, NM, October 2003

Presentations/Talks

Paul Hargrove, Jason Duell, Eric Roman, Berkeley Lab Checkpoint/Restart (BLCR): Status and Future Plans, Dagstuhl Seminar: Fault Tolerance in High-Performance Computing and Grids, May 2009

Paul Hargrove, Jason Duell, Eric Roman, System-level Checkpoint/Restart with BLCR, TeraGrid 2009 Fault Tolerance Workshop, March 19, 2009

Paul Hargrove, Jason Duell, Eric Roman, System-level Checkpoint/Restart with BLCR, Los Alamos Computer Science Symposium (LACSS08), October 15, 2008

Paul Hargrove, Jason Duell, Eric Roman, Advanced Checkpoint Fault Tolerance Solutions for HPC, Workshop on Trends, Technologies and Collaborative Opportunities in High Performance and Grid Computing, Bangkok and Phuket, Thailand, June 9, 2008

Paul Hargrove, Jason Duell, Eric Roman, An Overview of Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters, Presentation to ParLab group at UC Berkeley, March 18, 2008

Paul Hargrove, Eric Roman, Jason Duell, Job Preemption with BLCR, Urgent Computing Workshop, April 25, 2007

J. Duell, P. Hargrove, E. Roman, An Overview of Berkeley Lab's Linux Checkpoint/Restart, Presentation at LLNL, January 2004

Reports

J. Duell, P. Hargrove, E. Roman, "The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart", LBNL Technical Report, December 2002, LBNL 54941

J. Duell, P. Hargrove, E. Roman, "Requirements for Linux Checkpoint/Restart", LBNL Technical Report, May 2002, LBNL 49659

Posters

Alex Druinsky, Brian Austin, Sherry Li, Osni Marques, Eric Roman, Samuel Williams, "A Roofline Performance Analysis of an Algebraic Multigrid Solver", Supercomputing (SC), November 2014