Koushik Sen

Faculty Scientist, UC Berkeley

ksen@cs.berkeley.edu

Phone: +1 510 495 2517 | +1 510 642 2420

Web: http://srl.cs.berkeley.edu/~ksen/doku.php

Conference Papers

Marc G. Davis, Ethan Smith, Ana Tudor, Koushik Sen, Irfan Siddiqi, Costin Iancu, "Towards Optimal Topology Aware Quantum Circuit Synthesis", 2020 IEEE International Conference on Quantum Computing and Engineering (QCE), Denver, CO, USA, IEEE, October 12, 2020, doi: 10.1109/QCE49297.2020.00036

We present an algorithm for compiling arbitrary unitaries into a sequence of gates native to a quantum processor. As CNOT gates are error-prone for the foreseeable Noisy-Intermediate-Scale Quantum devices era, our A* inspired algorithm minimizes their count while accounting for connectivity. We discuss the formulation of synthesis as a search problem as well as an algorithm to find solutions. For a workload of circuits with complexity appropriate for the NISQ era, we produce solutions well within the best upper bounds published in literature and match or exceed hand tuned implementations, as well as other existing synthesis alternatives. In particular, when comparing against state-of-the-art available synthesis packages we show 2.4× average (up to 5.3×) reduction in CNOT count. We also show how to re-target the algorithm for a different chip topology and native gate set while obtaining similar quality results. We believe that tools like ours can facilitate algorithmic exploration and guide gate set discovery for quantum processor designers, as well as being useful for optimization in the quantum compilation tool-chain.

Marc Grau Davis, Ethan Smith, Ana Tudor, Koushik Sen, Irfan Siddiqi, Costin Iancu, "Heuristics for Quantum Compiling with a Continuous Gate Set", 3rd International Workshop on Quantum Compilation as part of the International Conference On Computer Aided Design 2019, December 5, 2019,

We present an algorithm for compiling arbitrary unitaries into a sequence of gates native to a quantum processor. As accurate CNOT gates are hard for the foreseeable Noisy- Intermediate-Scale Quantum devices era, our A* inspired algorithm attempts to minimize their count, while accounting for connectivity. We discuss the search strategy together with metrics to expand the solution frontier. For a workload of circuits with complexity appropriate for the NISQ era, we produce solutions well within the best upper bounds published in literature and match or exceed hand tuned implementations, as well as other existing synthesis alternatives. In particular, when comparing against state-of-the-art available synthesis packages we show 2.4x average (up to 5.3x) reduction in CNOT count. We also show how to re-target the algorithm for a different chip topology and native gate set, while obtaining similar quality results. We believe that empirical tools like ours can facilitate algorithmic exploration, gate set discovery for quantum processor designers, as well as providing useful optimization blocks within the quantum compilation tool-chain.

Cindy Rubio-Gonzalez, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, David Hough, "Precimonious: Tuning Assistant for Floating-Point Precision", Supercomputing 2013, November 17, 2013, doi: 10.1145/2503210.2503296

Download File: sc13.pdf (pdf: 555 KB)

Given the variety of numerical errors that can occur, floating-point programs are difficult to write, test and debug. One common practice employed by developers without an advanced background in numerical analysis is using the highest available precision. While more robust, this can degrade program performance significantly. In this paper we present Precimonious, a dynamic program analysis tool to assist developers in tuning the precision of floating-point programs. Precimonious performs a search on the types of the floating-point program variables trying to lower their precision subject to accuracy constraints and performance goals. Our tool recommends a type instantiation that uses lower precision while producing an accurate enough answer without causing exceptions. We evaluate Precimonious on several widely used functions from the GNU Scientific Library, two NAS Parallel Benchmarks, and three other numerical programs. For most of the programs analyzed, Precimonious reduces precision, which results in performance improvements as high as 41%.

Chang-Seo Park, Koushik Sen, Costin Iancu, "Scaling Data Race Detection for Partitioned Global Address Space Programs", International Supercomputing Conference (ICS) 2013, 2013, doi: 10.1145/2464996.2465000

Download File: thrille-exp4.pdf (pdf: 744 KB)

Contemporary and future programming languages for HPC promote hybrid parallelism and shared memory abstractions using a global address space. In this programming style, data races occur easily and are notoriously hard to find. Existing state-of-the-art data race detectors exhibit 10X-100X performance degradation and do not handle hybrid parallelism. In this paper we present the first complete implementation of data race detection at scale for UPC programs. Our implementation tracks local and global memory references in the program and it uses two techniques to reduce the overhead: 1) hierarchical function and instruction level sampling; and 2) exploiting the runtime persistence of aliasing and locality specific to Partitioned Global Address Space applications. The results indicate that both techniques are required in practice: well optimized instruction sampling introduces overheads as high as 6500% (65X slowdown), while each technique in separation is able to reduce it only to 1000% (10X slowdown). When applying the optimizations in conjunction our tool finds all previously known data races in our benchmark programs with at most 50% overhead when running on 2048 cores. Furthermore, while previous results illustrate the benefits of function level sampling, our experiences show that this technique does not work for scientific programs: instruction sampling or a hybrid approach is required.

Reports

Xuehai Qian, Koushik Sen, Paul Hargrove, Costin Iancu, "OPR: Partial Deterministic Record and Replay for One-Sided Communication", LBNL TR, April 17, 2015,

Download File: main3.pdf (pdf: 295 KB)

Others

Ed Younis, Koushik Sen, Katherine Yelick, Costin Iancu, QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space, Bulletin of the American Physical Society, March 2021,

We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.