Computer Architecture Group




In this project, we propose to build a post-Moore HPC (High-Performance Computing) system simulation framework to enable large-scale simulations of post-Moore architectures built using emerging devices and technologies. With HPC system performance reaching exaflops and transistor scaling reaching saturation, HPC systems built for the post-Moore era are evolving into extremely heterogeneous systems. For the Beyond Moore era, new computing, memory, interconnect and storage models… Read More »


Superconducting Race Logic Accelerators

The aim of this project is to make computation in superconducting circuits, which operate at temperatures around 4 K and have close to zero resistance, as efficient as possible. Many approaches today try to re-use computing architectures (the design of logic gates and circuits) from traditional technologies in superconducting logic. This is not efficient because superconducting circuits are different: accessing memory is very expensive but the circuit itself can operate at… Read More »


DFT Beyond Moore’s Law: Extreme Hardware Specialization for the Future of HPC

The project goal is to demonstrate the performance potential of purpose-built architectures as a potential future for HPC applications in the absence of Moore’s Law. Our approach is to reformulate the LS3DF algorithm to make it amenable to specialized hardware and to develop a custom accelerator for Density Functional Theory. The initial design/prototype will target an FPGA, and results will also be projected to an ASIC. Later, we intend to generalize our results to broader implications for DOE… Read More »


IARPA SuperTools

The Intelligence Community (IC) is well known to be a major consumer of high-performance computing, but is increasingly finding itself frustrated by limitations in overall power consumption and clock speed. The amazing successes of semiconductor technology embodied in Moore’s Law give the impression that computing power might continue on its exponential growth curve indefinitely. However, there are limits to miniaturization and switching speeds imposed by physics as applied to semiconductors,… Read More »

Traffic Graph


Overview: Transportation systems are becoming increasingly complex with the evolution of emerging technologies, including deeper connectivity and automation, which will require more advanced control mechanisms for efficient operation (in terms of energy, mobility, and productivity). Stakeholders, including government agencies, industry, and local populations, all have an interest in efficient outcomes, yet there are few tools for developing a holistic understanding of urban… Read More »


Mota Mapper

Mota is a library that provides several heuristics for AMR (adaptive mesh refinement) task placement. It is multi-objective in the sense that it simultaneously balances the computational load on each rank and the communication traffic between the boxes. We are investigating a variety of approaches to task placement and using modeling and simulation tools to evaluate these approaches. The heuristics used for mapping include algorithms such as greedy list assignment and space-filling… Read More »
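As an illustration of one of the heuristics named above, the following is a minimal Python sketch of greedy list assignment, assuming each box has a single scalar compute cost. It is not Mota's API: the function names, the box identifiers, and the cost values are hypothetical, and the communication-traffic objective described above is deliberately left out to keep the sketch short.

```python
import heapq


def greedy_list_assignment(box_costs, num_ranks):
    """Place the heaviest boxes first, each on the currently least-loaded rank.

    box_costs: dict mapping a (hypothetical) box id to its estimated compute cost.
    Returns a dict mapping box id -> rank.
    """
    # Min-heap of (current load, rank id); the least-loaded rank pops first.
    heap = [(0.0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    placement = {}
    # Visit boxes from most to least expensive; ties are broken by box id.
    for box, cost in sorted(box_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, rank = heapq.heappop(heap)
        placement[box] = rank
        heapq.heappush(heap, (load + cost, rank))
    return placement


def rank_loads(box_costs, placement, num_ranks):
    """Sum the compute cost assigned to each rank."""
    loads = [0.0] * num_ranks
    for box, rank in placement.items():
        loads[rank] += box_costs[box]
    return loads


# Hypothetical example: five boxes placed on two ranks.
costs = {"b0": 8.0, "b1": 7.0, "b2": 6.0, "b3": 5.0, "b4": 4.0}
placement = greedy_list_assignment(costs, 2)
loads = rank_loads(costs, placement, 2)
```

A multi-objective variant would also have to weigh the traffic between boxes placed on different ranks, which this load-only sketch ignores.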



A cache-coherent memory subsystem plays an important role in complex digital computing systems. It maintains memory consistency across on-chip caches that hide the memory latency to improve computational performance. Being managed by hardware, the cache subsystem facilitates multi-core system programming and allows developers to focus on other crucial aspects. However, due to extensive protocol-related traffic and lack of explicit data movement management, cache memory scalability becomes a big… Read More »


OpenSoC Fabric

Abstract: Recent advancements in technology scaling have shown a trend towards greater integration with large-scale chips containing thousands of processors connected to memories and other I/O devices using non-trivial network topologies. Software simulation proves insufficient to study the tradeoffs in such complex systems due to slow execution time, whereas hardware RTL development is too time-consuming. We present OpenSoC Fabric, an on-chip network generation infrastructure which aims to… Read More »



The cost and complexity of existing interconnects prevent designing datacenter racks tailored to emerging applications such as machine learning. The PINE interconnect (Photonically Interconnected datacenter Elements) allows compute, memory or storage modules to be flexibly combined through one-model-fits-all embedded photonic connectivity and better utilize distant resources. In addition, PINE allows system-level bandwidth to be reconfigured to better match application demands via bandwidth… Read More »


Continuing the Scaling of Digital Computing Post Moore’s Law

With the impending end of Moore’s Law, it is imperative for the Office of Advanced Scientific Computing Research (ASCR) to develop a balanced research agenda to assess the viability of novel semiconductor technologies and navigate the ensuing challenges. Read More »



To model the behavior of AMR solvers that run in an asynchronous fashion, we have developed a tool that builds a skeleton task dependency graph for a variety of AMR algorithms. The generated task dependency graph contains critical performance information, such as compute time estimates and required communication traffic volume. The task graph exposes the true data dependencies of the constituent tasks and removes false dependencies that are often introduced as a byproduct of… Read More »
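To make the idea of a skeleton task dependency graph concrete, here is a small Python sketch, not the tool itself. It assumes each task carries a compute-time estimate and each dependency edge carries a communication volume in bytes, then computes the critical-path length that the true data dependencies impose. The task names, timings, byte counts, and the `seconds_per_byte` cost model are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical skeleton tasks: name -> estimated compute time in seconds.
compute_time = {
    "fill_ghost": 1.0,
    "advance_coarse": 4.0,
    "advance_fine": 6.0,
    "reflux": 2.0,
}

# Hypothetical true data dependencies: task -> {predecessor: bytes communicated}.
deps = {
    "advance_coarse": {"fill_ghost": 1024},
    "advance_fine": {"fill_ghost": 2048},
    "reflux": {"advance_coarse": 512, "advance_fine": 512},
}


def critical_path_length(compute_time, deps, seconds_per_byte=0.0):
    """Earliest possible finish time of the whole graph on unlimited resources."""
    graph = {task: set(deps.get(task, {})) for task in compute_time}
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        # A task is ready once every predecessor has finished and its data has arrived.
        ready = max(
            (finish[pred] + nbytes * seconds_per_byte
             for pred, nbytes in deps.get(task, {}).items()),
            default=0.0,
        )
        finish[task] = ready + compute_time[task]
    return max(finish.values())


length = critical_path_length(compute_time, deps)
```

In this example the two advance tasks depend only on `fill_ghost`, so they can overlap; a false dependency that serialized them would lengthen the path from 9.0 to 13.0 seconds.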


Project 38

Project 38 is a set of vendor-agnostic architectural explorations involving DOD, the DOE Office of Science, and NNSA (these latter two organizations are referred to below as “DOE”). These explorations are expected to accomplish the following: Near-term goal: Quantify the performance value and identify the potential costs of specific architectural concepts against a limited set of applications of interest to both the DOE and DOD. Long-term goal: Develop an enduring capability for DOE and DOD… Read More »



To pave the way towards the adoption of future quantum accelerators, we propose to define several abstraction levels throughout the entire control hardware stack, starting with a comprehensive software-hardware interface: a quantum instruction set architecture (QUASAR). By extending RV32/64, QUASAR supports single- and dual-qubit gates (serial or parallel application), controlled measurement, bit manipulation, arbitrary phase rotation, and advanced pulse shaping. The first extended processor (QUASAR… Read More »


VTE is a library that allows fast and simple generation of C++ testers for modules written in Chisel and builds an efficient interface to existing C++-based simulators. It contains a Scala-based interface and a C++ testbench class. The Scala interface interacts with the FIRRTL Interpreter and generates a basic set of C++ testbench files. These files contain the list of the Device Under Test (DUT) input-output ports as well as their parameters. The C++ testbench provides functionality similar to the Chisel… Read More »