


DFT Beyond Moore’s Law: Extreme Hardware Specialization for the Future of HPC

The project goal is to demonstrate the performance potential of purpose-built architectures as a possible future for HPC applications in the absence of Moore’s Law. Our approach is to reformulate the LS3DF algorithm to make it amenable to specialized hardware and to develop a custom accelerator for Density Functional Theory. The initial design/prototype will target an FPGA, and results will also be projected to an ASIC. Later, we intend to generalize our results to broader implications for DOE… Read More »


IARPA SuperTools

The Intelligence Community (IC) is well known to be a major consumer of high performance computing, but is increasingly finding itself frustrated by limitations in overall power consumption and clock speed. The amazing successes of semiconductor technology embodied in Moore’s Law give the impression that computing power might continue on its exponential growth curve indefinitely. However, there are limits to miniaturization and switching speeds imposed by physics as applied to semiconductors,… Read More »

Traffic Graph


Overview: Transportation systems are becoming increasingly complex with the evolution of emerging technologies, including deeper connectivity and automation, which will require more advanced control mechanisms for efficient operation (in terms of energy, mobility, and productivity). Stakeholders, including government agencies, industry, and local populations, all have an interest in efficient outcomes, yet there are few tools for developing a holistic understanding of urban… Read More »


Mota Mapper

Mota is a library that provides several heuristics for AMR task placement. It is multi-objective in the sense that it simultaneously balances the computational load on each rank and the communication traffic between the boxes. We are investigating a variety of approaches to task placement and are using modeling and simulation tools to evaluate them. The heuristics used for mapping include algorithms such as greedy list assignment and space-filling… Read More »
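
To make the idea of greedy list assignment concrete, here is a minimal sketch in Python. It is not Mota's API: the function names, the box identifiers, and the load values are hypothetical, and the sketch balances only computational load per rank, ignoring the communication traffic that Mota also takes into account.

```python
import heapq


def greedy_list_assignment(box_loads, num_ranks):
    """Place each box on the currently least-loaded rank, heaviest boxes first.

    box_loads: dict mapping a (hypothetical) box id to its estimated compute load.
    Returns a dict mapping box id -> rank.
    """
    # Min-heap of (accumulated load, rank id); ties go to the lower rank id.
    heap = [(0.0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)

    placement = {}
    # Visit boxes in order of decreasing load; break ties by box id for determinism.
    for box, load in sorted(box_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        rank_load, rank = heapq.heappop(heap)
        placement[box] = rank
        heapq.heappush(heap, (rank_load + load, rank))
    return placement


def rank_loads(box_loads, placement, num_ranks):
    """Sum the load assigned to each rank under a given placement."""
    totals = [0.0] * num_ranks
    for box, rank in placement.items():
        totals[rank] += box_loads[box]
    return totals


if __name__ == "__main__":
    loads = {"b0": 5, "b1": 4, "b2": 3, "b3": 3, "b4": 1}  # illustrative values only
    placement = greedy_list_assignment(loads, num_ranks=2)
    print(placement)
    print(rank_loads(loads, placement, num_ranks=2))
```

A multi-objective variant would presumably add a communication term to the rank-selection criterion, but how Mota weighs the two objectives is not stated here.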



A cache-coherent memory subsystem plays an important role in complex digital computing systems. It maintains memory consistency across on-chip caches that hide the memory latency to improve computational performance. Being managed by hardware, the cache subsystem facilitates multi-core system programming and allows developers to focus on other crucial aspects. However, due to extensive protocol-related traffic and lack of explicit data movement management, cache memory scalability becomes a big… Read More »


OpenSoC Fabric

Abstract: Recent advancements in technology scaling have shown a trend towards greater integration with large-scale chips containing thousands of processors connected to memories and other I/O devices using non-trivial network topologies. Software simulation proves insufficient to study the tradeoffs in such complex systems due to slow execution time, whereas hardware RTL development is too time-consuming. We present OpenSoC Fabric, an on-chip network generation infrastructure which aims to… Read More »



An increasing number of technologies are being proposed to preserve digital computing performance scaling as the benefits of lithographic scaling begin to wane. PARADISE is an open-source comprehensive methodology to evaluate emerging technologies with a vertical simulation flow from the individual device level all the way up to the architectural level. PARADISE can be extended to incorporate new technologies for which a compact model exists. In addition, PARADISE is modular with well-defined… Read More »



The cost and complexity of existing interconnects prevent designing datacenter racks tailored to emerging applications such as machine learning. The PINE interconnect (Photonically Interconnected datacenter Elements) allows compute, memory or storage modules to be flexibly combined through one-model-fits-all embedded photonic connectivity and better utilize distant resources. In addition, PINE allows system-level bandwidth to be reconfigured to better match application demands via bandwidth… Read More »


Continuing the Scaling of Digital Computing Post Moore’s Law

Traditional CMOS technology scaling, which up until now has followed Moore’s Law, is coming to an end in the next decade. However, the DOE has come to depend on the rapid, predictable, and cheap scaling of computing performance to meet mission needs for scientific theory, large scale experiments, and national security. Moving forward, performance scaling of digital computing will need to originate from energy and cost reductions that are a result of novel architectures, devices,… Read More »



In order to model the behavior of AMR solvers that run in an asynchronous fashion, we have developed a tool that builds a skeleton task dependency graph for a variety of AMR algorithms. The task dependency graph generated contains critical performance information, such as compute time estimates and required communication traffic volume. The task graph exposes the true data dependencies of the constituent tasks and removes false dependencies that are often introduced as a byproduct of… Read More »


Project 38

Project 38 is a set of vendor-agnostic architectural explorations involving DOD, the DOE Office of Science, and NNSA (the latter two organizations are referred to below as “DOE”). These explorations are expected to accomplish the following: Near-term goal: Quantify the performance value and identify the potential costs of specific architectural concepts against a limited set of applications of interest to both the DOE and DOD. Long-term goal: Develop an enduring capability for DOE and DOD… Read More »



Sitting between the QUASAR Ice core and the Quantum Controller Hardware, the Quantum Controller Firmware (or qFirm) provides the glue logic between the two. This layer converts the users’ code into digital signals that will be sent to the QPU. It also provides a supervisor mode, which supports advanced use cases not handled by the core, as well as system administration (e.g., calibration). Project Participants: Anastasiia Butko, Farzad… Read More »



To pave the way toward the adoption of future quantum accelerators, we propose to define several abstraction levels throughout the entire control hardware stack, starting with a comprehensive software-hardware interface: the quantum instruction set architecture (QUASAR). By extending RV32/64, QUASAR supports single- and dual-qubit gates (serial or parallel application), controlled measurement, bit manipulation, arbitrary phase rotation, and advanced pulse shaping. The first extended processor (QUASAR… Read More »


VTE is a library that allows fast and simple generation of C++ testers for modules written in Chisel and builds an efficient interface to existing C++ based simulators. It contains a Scala-based interface and a C++ testbench class. The Scala interface interacts with the FIRRTL Interpreter and generates a basic set of C++ testbench files. These files contain the list of the Device Under Test (DUT) input-output ports as well as their parameters. The C++ testbench class provides functionality similar to the Chisel… Read More »