# All Publications

## Journal Article

### Francois P. Hamon, Martin Schreiber, Michael L. Minion,"Parallel-in-Time Multi-Level Integration of the Shallow-Water Equations on the Rotating Sphere",April 12, 2019,

Submitted to Journal of Computational Physics

### Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani,"Parallel membership queries on very large scientific data sets using bitmap indexes",Concurrency and Computation: Practice and Experience,January 28, 2019,31, doi: https://doi.org/10.1002/cpe.5157

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.

In submission
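As a rough illustration of the membership query described in the abstract above, the sketch below builds one bitmap per distinct attribute value and answers a query by OR-ing the bitmaps of the queried values. It is a minimal, uncompressed, single-process sketch, not the authors' implementation: the function names `build_bitmap_index` and `membership_query` and the sample `particle_type` column are assumptions made for illustration, and Word-Aligned Hybrid compression and parallelization are omitted.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when column[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index, wanted):
    """Return the sorted row ids whose value is in `wanted`, by OR-ing per-value bitmaps."""
    bits = 0
    for value in wanted:
        bits |= index.get(value, 0)  # values absent from the column contribute no rows
    rows = []
    row = 0
    while bits:
        if bits & 1:
            rows.append(row)
        bits >>= 1
        row += 1
    return rows


# Hypothetical classification attribute with three distinct values.
particle_type = ["e", "mu", "e", "pi", "mu", "e"]
index = build_bitmap_index(particle_type)
hits = membership_query(index, {"e", "pi"})
print(hits)
```

Because the query touches only the bitmaps of the requested values, its cost depends on the number of queried values and the bitmap sizes rather than on scanning the raw column.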

### Daniel F. Martin, Stephen L. Cornford, Antony J. Payne,"Millennial‐scale Vulnerability of the Antarctic Ice Sheet to Regional Ice Shelf Collapse",Geophysical Research Letters,January 9, 2019,doi: 10.1029/2018gl081229

Abstract:

The Antarctic Ice Sheet (AIS) remains the largest uncertainty in projections of future sea level rise. A likely climate‐driven vulnerability of the AIS is thinning of floating ice shelves resulting from surface‐melt‐driven hydrofracture or incursion of relatively warm water into subshelf ocean cavities. The resulting melting, weakening, and potential ice‐shelf collapse reduces shelf buttressing effects. Upstream ice flow accelerates, causing thinning, grounding‐line retreat, and potential ice sheet collapse. While high‐resolution projections have been performed for localized Antarctic regions, full‐continent simulations have typically been limited to low‐resolution models. Here we quantify the vulnerability of the entire present‐day AIS to regional ice‐shelf collapse on millennial timescales treating relevant ice flow dynamics at the necessary ∼1km resolution. Collapse of any of the ice shelves dynamically connected to the West Antarctic Ice Sheet (WAIS) is sufficient to trigger ice sheet collapse in marine‐grounded portions of the WAIS. Vulnerability elsewhere appears limited to localized responses.

Plain Language Summary:

The biggest uncertainty in near‐future sea level rise (SLR) comes from the Antarctic Ice Sheet. Antarctic ice flows in relatively fast‐moving ice streams. At the ocean, ice flows into enormous floating ice shelves which push back on their feeder ice streams, buttressing them and slowing their flow. Melting and loss of ice shelves due to climate changes can result in faster‐flowing, thinning and retreating ice leading to accelerated rates of global sea level rise. To learn where Antarctica is vulnerable to ice‐shelf loss, we divided it into 14 sectors, applied extreme melting to each sector's floating ice shelves in turn, then ran our ice flow model 1000 years into the future for each case. We found three levels of vulnerability. The greatest vulnerability came from attacking any of the three ice shelves connected to West Antarctica, where much of the ice sits on bedrock lying below sea level. Those dramatic responses contributed around 2m of sea level rise. The second level came from four other sectors, each with a contribution between 0.5‐1m. The remaining sectors produced little to no contribution. We examined combinations of sectors, determining that sectors behave independently of each other for at least a century.

### E. Vecharynski, J. Brabec, M. Shao, N. Govind, C. Yang,"Efficient Block Preconditioned Eigensolvers for Linear Response Time-dependent Density Functional Theory",Computer Physics Communications,2017,221:42-52,doi: https://doi.org/10.1016/j.cpc.2017.07.017

We present two efficient iterative algorithms for solving the linear response eigenvalue problem arising from the time dependent density functional theory. Although the matrix to be diagonalized is nonsymmetric, it has a special structure that can be exploited to save both memory and floating point operations. In particular, the nonsymmetric eigenvalue problem can be transformed into a product eigenvalue problem that is self-adjoint with respect to a K-inner product. This product eigenvalue problem can be solved efficiently by a modified Davidson algorithm and a modified locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm that make use of the K-inner product. The solution of the product eigenvalue problem yields one component of the eigenvector associated with the original eigenvalue problem. However, the other component of the eigenvector can be easily recovered in a postprocessing procedure. Therefore, the algorithms we present here are more efficient than existing algorithms that try to approximate both components of the eigenvectors simultaneously. The efficiency of the new algorithms is demonstrated by numerical examples.

### Dai Wang, Junyu Gao, Pan Li, Bin Wang, Cong Zhang, Samveg Saxena,"Modeling of plug-in electric vehicle travel patterns and charging load based on trip chain generation",Journal of Power Sources,May 13, 2017,359:468 - 479,doi: 10.1016/j.jpowsour.2017.05.036

Modeling PEV travel and charging behavior is the key to estimate the charging demand and further explore the potential of providing grid services. This paper presents a stochastic simulation methodology to generate itineraries and charging load profiles for a population of PEVs based on real-world vehicle driving data. In order to describe the sequence of daily travel activities, we use the trip chain model which contains the detailed information of each trip, namely start time, end time, trip distance, start location and end location. A trip chain generation method is developed based on the Naive Bayes model to generate a large number of trips which are temporally and spatially coupled. We apply the proposed methodology to investigate the multi-location charging loads in three different scenarios. Simulation results show that home charging can meet the energy demand of the majority of PEVs in an average condition. In addition, we calculate the lower bound of charging load peak on the premise of lowest charging cost. The results are instructive for the design and construction of charging facilities to avoid excessive infrastructure.

### Yubo Wang, Wenbo Shi, Bin Wang, Chi-Cheng Chu, Rajit Gadh,"Optimal operation of stationary and mobile batteries in distribution grids",Applied Energy,January 28, 2017,190:1289 - 130,doi: 10.1016/j.apenergy.2016.12.139

The trending integrations of Battery Energy Storage System (BESS, stationary battery) and Electric Vehicles (EV, mobile battery) to distribution grids call for advanced Demand Side Management (DSM) technique that addresses the scalability concerns of the system and stochastic availabilities of EVs. Towards this goal, a stochastic DSM is proposed to capture the uncertainties in EVs. Numerical approximation is then used to make the problem tractable. To accelerate the computational speed, the proposed DSM is tightly relaxed to a convex form using second-order cone programming. Furthermore, in light of the continuous increasing problem size, a distributed method with a guaranteed convergence is applied to shift the centralized computational burden to distributed controllers. To verify the proposed DSM, real-life EV data collected on UCLA campus is used to test the proposed DSM in an IEEE benchmark test system. Numerical results demonstrate the correctness and merits of the proposed approach.

### E. Vecharynski, A. Knyazev,"Preconditioned steepest descent-like methods for symmetric indefinite systems",Linear Algebra and its Applications, Vol. 511, pp. 274–295,2016,

We construct preconditioned steepest descent (PSD)-like methods for iterative solution of symmetric indefinite linear systems using symmetric and positive definite (SPD) preconditioners. Our construction is based on a locally optimal residual minimization over two-dimensional subspaces, mathematically equivalent in exact arithmetic to preconditioned MINRES (PMINRES) restarted after every two steps. A convergence bound is derived. If certain information on the spectrum of the preconditioned system is available, we present a simpler PSD-like algorithm that performs only one-dimensional residual minimization. Search direction randomization for accelerating this algorithm is discussed. Our primary goal is to bridge the theoretical gap between the optimal (PMINRES) and PSD-like methods for solving symmetric indefinite systems. We also demonstrate situations where the suggested PSD-like schemes can be preferable to the optimal PMINRES iteration.

### S.V. Venkatakrishnan, Jeffrey Donatelli, Dinesh Kumar, Abhinav Sarje, Sunil K. Sinha, Xiaoye S. Li, Alexander Hexemer,"A Multi-slice Simulation Algorithm for Grazing-Incidence Small-Angle X-ray Scattering",Journal of Applied Crystallography,December 2016,49-6, doi: 10.1107/S1600576716013273

Grazing-incidence small-angle X-ray scattering (GISAXS) is an important technique in the characterization of samples at the nanometre scale. A key aspect of GISAXS data analysis is the accurate simulation of samples to match the measurement. The distorted-wave Born approximation (DWBA) is a widely used model for the simulation of GISAXS patterns. For certain classes of sample such as nanostructures embedded in thin films, where the electric field intensity variation is significant relative to the size of the structures, a multi-slice DWBA theory is more accurate than the conventional DWBA method. However, simulating complex structures in the multi-slice setting is challenging and the algorithms typically used are designed on a case-by-case basis depending on the structure to be simulated. In this paper, an accurate algorithm for GISAXS simulations based on the multi-slice DWBA theory is presented. In particular, fundamental properties of the Fourier transform have been utilized to develop an algorithm that accurately computes the average refractive index profile as a function of depth and the Fourier transform of the portion of the sample within a given slice, which are key quantities required for the multi-slice DWBA simulation. The results from this method are compared with the traditionally used approximations, demonstrating that the proposed algorithm can produce more accurate results. Furthermore, this algorithm is general with respect to the sample structure, and does not require any sample-specific approximations to perform the simulations.

### Bin Wang, Yubo Wang, Hamidreza Nazaripouya, Charlie Qiu, Chi-Cheng Chu, Rajit Gadh,"Predictive Scheduling Framework for Electric Vehicles with Uncertainties of User Behaviors",IEEE Internet of Things Journal,October 13, 2016,4:52 - 63,doi: 10.1109/JIOT.2016.2617314

The randomness of user behaviors plays a significant role in electric vehicle (EV) scheduling problems, especially when the power supply for EV supply equipment (EVSE) is limited. Existing EV scheduling methods do not consider this limitation and assume charging session parameters, such as stay duration and energy demand values, are perfectly known, which is not realistic in practice. In this paper, based on real-world implementations of networked EVSEs on University of California at Los Angeles campus, we developed a predictive scheduling framework, including a predictive control paradigm and a kernel-based session parameter estimator. Specifically, the scheduling service periodically computes for cost-efficient solutions, considering the predicted session parameters, by the adaptive kernel-based estimator with improved estimation accuracies. We also consider the power sharing strategy of existing EVSEs and formulate the virtual load constraint to handle the future EV arrivals with unexpected energy demand. To validate the proposed framework, 20-fold cross validation is performed on the historical dataset of charging behaviors for over one-year period. The simulation results demonstrate that average unit energy cost per kWh can be reduced by 29.42% with the proposed scheduling framework and 66.71% by further integrating solar generations with the given capacity, after the initial infrastructure investment. The effectiveness of kernel-based estimator, virtual load constraint, and event-based control scheme are also discussed in detail.

### R. Li, Y. Xi, E. Vecharynski, C. Yang, and Y. Saad,"A Thick-Restart Lanczos algorithm with polynomial filtering for Hermitian eigenvalue problems",SIAM Journal on Scientific Computing, Vol. 38, Issue 4, pp. A2512–A2534,2016,doi: 10.1137/15M1054493

Polynomial filtering can provide a highly effective means of computing all eigenvalues of a real symmetric (or complex Hermitian) matrix that are located in a given interval, anywhere in the spectrum. This paper describes a technique for tackling this problem by combining a Thick-Restart version of the Lanczos algorithm with deflation ('locking') and a new type of polynomial filters obtained from a least-squares technique. The resulting algorithm can be utilized in a 'spectrum-slicing' approach whereby a very large number of eigenvalues and associated eigenvectors of the matrix are computed by extracting eigenpairs located in different sub-intervals independently from one another.

### Rafael Garibotti, Anastasiia Butko, Luciano Ost, Abdoulaye Gamatié, Gilles Sassatelli, Chris Adeniyi-Jones,"Efficient Embedded Software Migration towards Clusterized Distributed-Memory Architectures",IEEE Transactions on Computers,August 1, 2016,doi: 10.1109/TC.2015.2485202

A large portion of existing multithreaded embedded software has been programmed according to symmetric shared memory platforms where a monolithic memory block is shared by all cores. Such platforms accommodate popular parallel programming models such as POSIX threads and OpenMP. However with the growing number of cores in modern manycore embedded architectures, they present a bottleneck related to their centralized memory accesses. This paper proposes a solution tailored for an efficient execution of applications defined with shared-memory programming models onto on-chip distributed-memory multicore architectures. It shows how performance, area and energy consumption are significantly improved thanks to the scalability of these architectures. This is illustrated in an open-source realistic design framework, including tools from ASIC to microkernel.

### Nils E. R. Zimmermann, Maciej Haranczyk,"History and Utility of Zeolite Framework-Type Discovery from a Data-Science Perspective",Crystal Growth & Design,May 2, 2016,16:3043-3048,

Mature applications such as fluid catalytic cracking and hydrocracking rely critically on early zeolite structures. With a data-driven approach, we find that the discovery of exceptional zeolite framework types around the new millennium was spurred by exciting new utilization routes. The promising processes have not yet been successfully implemented (“valley of death” effect), mainly because of the lack of thermal stability of the crystals. This foreshadows limited deployability of recent zeolite discoveries that were achieved by novel crystal synthesis routes.

### J. R. Jones, F.-H. Rouet, K. V. Lawler, E. Vecharynski, K. Z. Ibrahim, S. Williams, B. Abeln, C. Yang, C. W. McCurdy, D. J. Haxton, X. S. Li, T. N. Rescigno,"An efficient basis set representation for calculating electrons in molecules",Journal of Molecular Physics,2016,doi: 10.1080/00268976.2016.1176262

The method of McCurdy, Baertschy, and Rescigno, J. Phys. B, 37, R137 (2004) is generalized to obtain a straightforward, surprisingly accurate, and scalable numerical representation for calculating the electronic wave functions of molecules. It uses a basis set of product sinc functions arrayed on a Cartesian grid, and yields 1 kcal/mol precision for valence transition energies with a grid resolution of approximately 0.1 bohr. The Coulomb matrix elements are replaced with matrix elements obtained from the kinetic energy operator. A resolution-of-the-identity approximation renders the primitive one- and two-electron matrix elements diagonal; in other words, the Coulomb operator is local with respect to the grid indices. The calculation of contracted two-electron matrix elements among orbitals requires only O(N log(N)) multiplication operations, not O(N^4), where N is the number of basis functions; N = n^3 on cubic grids. The representation not only is numerically expedient, but also produces energies and properties superior to those calculated variationally. Absolute energies, absorption cross sections, transition energies, and ionization potentials are reported for one- (He^+, H_2^+ ), two- (H_2, He), ten- (CH_4) and 56-electron (C_8H_8) systems.

### E. Vecharynski, C. Yang, and F. Xue,"Generalized preconditioned locally harmonic residual method for non-Hermitian eigenproblems",SIAM Journal on Scientific Computing, Vol. 38, No. 1, pp. A500–A527,2016,doi: 10.1137/15M1027413

We introduce the Generalized Preconditioned Locally Harmonic Residual (GPLHR) method for solving standard and generalized non-Hermitian eigenproblems. The method is particularly useful for computing a subset of eigenvalues, and their eigen- or Schur vectors, closest to a given shift. The proposed method is based on block iterations and can take advantage of a preconditioner if it is available. It does not need to perform exact shift-and-invert transformation. Standard and generalized eigenproblems are handled in a unified framework. Our numerical experiments demonstrate that GPLHR is generally more robust and efficient than existing methods, especially if the available memory is limited.

### E. Vecharynski,"A generalization of Saad's bound on harmonic Ritz vectors of Hermitian matrices",Linear Algebra and its Applications, Vol. 494, pp. 219-235,2016,doi: 10.1016/j.laa.2016.01.013

We prove a Saad's type bound for harmonic Ritz vectors of a Hermitian matrix. The new bound reveals a dependence of the harmonic Rayleigh-Ritz procedure on the condition number of a shifted problem operator. Several practical implications are discussed. In particular, the bound motivates incorporation of preconditioning into the harmonic Rayleigh-Ritz scheme.

### Yubo Wang, Bin Wang, Chi-Cheng Chu, Hemanshu Pota, Rajit Gadh,"Energy management for a commercial building microgrid with stationary and mobile battery storage",Energy and Buildings,December 30, 2015,116:141 - 150,doi: 10.1016/j.enbuild.2015.12.055

This paper investigates the Demand Side Management (DSM) in a commercial building microgrid with solar generation, stationary Battery Energy Storage System (BESS) and gridable (V2G) Electric Vehicle (EV) integration. Taking a comprehensive pricing model into consideration, we first formulate a deterministic DSM as a mixed integer linear programming problem, assuming perfect knowledge of the uncertainties in the system. A two-stage stochastic DSM is further developed that addresses the stochastic nature in solar generation, loads, EV availabilities and EV energy demands. The proposed DSMs are validated with real solar generation, loads, BESS and EV data using sample average approximation. Detailed case studies show that the stochastic DSM outperforms its deterministic counterpart in cost saving for a wide range of prices, though at the expense of higher computational time. Computational results also demonstrate that a moderate number of EVs helps to cut down the overall operation cost, which sheds light on the benefit of future large scale EV integration to smart buildings.

### A. Roy, A. Klinefelter, F. B. Yahya, X. Chen, L. P. Gonzalez-Guerrero, C. J. Lukas, D. A. Kamakshi, J. Boley, K. Craig, M. Faisal, S. Oh, N. E. Roberts, Y. Shakhsheer, A. Shrivastava, D. P. Vasudevan, D. D. Wentzloff, B. H. Calhoun,"A 6.45 μW Self-Powered SoC With Integrated Energy-Harvesting Power Management and ULP Asymmetric Radios for Portable Biomedical Systems",IEEE Transactions on Biomedical Circuits and Systems,December 28, 2015,9:862 - 874,doi: 10.1109/TBCAS.2015.2498643

This paper presents a batteryless system-on-chip (SoC) that operates off energy harvested from indoor solar cells and/or thermoelectric generators (TEGs) on the body. Fabricated in a commercial 0.13 μm process, this SoC sensing platform consists of an integrated energy harvesting and power management unit (EH-PMU) with maximum power point tracking, multiple sensing modalities, programmable core and a low power microcontroller with several hardware accelerators to enable energy-efficient digital signal processing, ultra-low-power (ULP) asymmetric radios for wireless transmission, and a 100 nW wake-up radio. The EH-PMU achieves a peak end-to-end efficiency of 75% delivering power to a 100 μA load. In an example motion detection application, the SoC reads data from an accelerometer through SPI, processes it, and sends it over the radio. The SPI and digital processing consume only 2.27 μW, while the integrated radio consumes 4.18 μW when transmitting at 187.5 kbps for a total of 6.45 μW.

### D. B. Szyld, E. Vecharynski, and F. Xue,"Preconditioned eigensolvers for large-scale nonlinear Hermitian eigenproblems with variational characterizations. II. Interior eigenvalues.",SIAM Journal on Scientific Computing, Vol. 37, Issue 6, pp. A2969-A2997,2015,

We consider the solution of large-scale nonlinear algebraic Hermitian eigenproblems of the form $T(\lambda)v=0$ that admit a variational characterization of eigenvalues. These problems arise in a variety of applications and are generalizations of linear Hermitian eigenproblems $Av\!=\!\lambda Bv$. In this paper, we propose a Preconditioned Locally Minimal Residual (PLMR) method for efficiently computing interior eigenvalues of problems of this type. We discuss the development of search subspaces, preconditioning, and eigenpair extraction procedure based on the refined Rayleigh-Ritz projection. Extension to the block methods is presented, and a moving-window style soft deflation is described. Numerical experiments demonstrate that PLMR methods provide a rapid and robust convergence towards interior eigenvalues. The approach is also shown to be efficient and reliable for computing a large number of extreme eigenvalues, dramatically outperforming standard preconditioned conjugate gradient methods.

### Andrew A. Chien, Tung Thanh-Hoang, Dilip Vasudevan, Yuanwei Fang, Amirali Shambayati,"10x10: A Case Study in Highly-Programmable and Energy-Efficient Heterogeneous Federated Architecture",SIGARCH Comput. Archit. News,December 2015,43:2 - 9,doi: 10.1145/2856113.2856115

Customized architecture is widely recognized as an important approach for improved performance and energy efficiency. To balance generality and customization benefit, researchers have proposed to federate heterogeneous micro-engines. Using the 10x10 architecture and an integrated image and vision benchmark as a case study, we explore the performance and energy benefits achievable. Results for current 32nm technology and DDR3 memory show 10x10 architecture benefits of 140x performance and 72x energy overall. Adding 3D-stacked DRAM increases benefits to 171x (performance) and 100x (energy). Finally, considering future 7nm transistor process, benefits as large as 597x (performance) and 137x (energy) are observed.

### E. Vecharynski, A. Knyazev,"Preconditioned Locally Harmonic Residual Method for computing interior eigenpairs of certain classes of Hermitian matrices",SIAM Journal on Scientific Computing, Vol. 37, Issue 5, pp. S3–S29,2015,

We propose a Preconditioned Locally Harmonic Residual (PLHR) method for computing several interior eigenpairs of a generalized Hermitian eigenvalue problem, without traditional spectral transformations, matrix factorizations, or inversions. PLHR is based on a short-term recurrence, easily extended to a block form, computing eigenpairs simultaneously. PLHR can take advantage of Hermitian positive definite preconditioning, e.g., based on an approximate inverse of an absolute value of a shifted matrix, introduced in [SISC, 35 (2013), pp. A696–A718]. Our numerical experiments demonstrate that PLHR is efficient and robust for certain classes of large-scale interior eigenvalue problems, involving Laplacian and Hamiltonian operators, especially if memory requirements are tight.

### Tobias Titze, Alexander Lauerer, Lars Heinke, Christian Chmelik, Nils E. R. Zimmermann, Frerich J. Keil, Douglas M. Ruthven, Jörg Kärger,"Transport in Nanoporous Materials Including MOFs: The Applicability of Fick’s Laws",Angew. Chem. Int. Ed.,2015,doi: 10.1002/anie.201506954

Diffusion in nanoporous host–guest systems is often considered to be too complicated to comply with such “simple” relationships as Fick’s first and second law of diffusion. However, it is shown herein that the microscopic techniques of diffusion measurement, notably the pulsed field gradient (PFG) technique of NMR spectroscopy and microimaging by interference microscopy (IFM) and IR microscopy (IRM), provide direct experimental evidence of the applicability of Fick’s laws to such systems. This remains true in many situations, even when the detailed mechanism is complex. The limitations of the diffusion model are also discussed with reference to the extensive literature on this subject.

### Nils E. R. Zimmermann, Bart Vorselaars, David Quigley, Baron Peters,"Nucleation of NaCl from Aqueous Solution: Critical Sizes, Ion-Attachment Kinetics, and Rates",J. Am. Chem. Soc.,2015,doi: 10.1021/jacs.5b08098

Nucleation and crystal growth are important in material synthesis, climate modeling, biomineralization, and pharmaceutical formulation. Despite tremendous efforts, the mechanisms and kinetics of nucleation remain elusive to both theory and experiment. Here we investigate sodium chloride (NaCl) nucleation from supersaturated brines using seeded atomistic simulations, polymorph-specific order parameters, and elements of classical nucleation theory. We find that NaCl nucleates via the common rock salt structure. Ion desolvation—not diffusion—is identified as the limiting resistance to attachment. Two different analyses give approximately consistent attachment kinetics: diffusion along the nucleus size coordinate and reaction-diffusion analysis of approach-to-coexistence simulation data from Aragones et al. (J. Chem. Phys. 2012, 136, 244508). Our simulations were performed at realistic supersaturations to enable the first direct comparison to experimental nucleation rates for this system. The computed and measured rates converge to a common upper limit at extremely high supersaturation. However, our rate predictions are between 15 and 30 orders of magnitude too fast. We comment on possible origins of the large discrepancy.

### Nathan Hanford, Vishal Ahuja, Mehmet Balman, Matthew Farrens, Dipak Ghosal, Eric Pouyoul, Brian Tierney,"Improving Network Performance on Multicore Systems: Impact of Core Affinities on High Throughput Flows",The International Journal of eScience, Elsevier,2015,doi: 10.1016/j.future.2015.09.012

Network throughput is scaling up to higher data rates while end-system processors are scaling out to multiple cores. In order to optimize high-speed data transfer into multicore end-systems, techniques such as network adaptor offloads and performance tuning have received a great deal of attention. Furthermore, several methods of multi-threading the network receive process have been proposed. However, thus far attention has been focused on how to set the tuning parameters and which offloads to select for higher performance, and little has been done to understand why the various parameter settings do (or do not) work. In this paper, we build on previous research to track down the sources of the end-system bottleneck for high-speed TCP flows. We define protocol processing efficiency to be the amount of system resources (such as CPU and cache) used per unit of achieved throughput (in Gbps). The amounts of the various system resources consumed are measured using low-level system event counters. In a multicore end-system, affinitization, or core binding, is the decision regarding how the various tasks of the network receive process, including interrupt, network, and application processing, are assigned to the different processor cores. We conclude that affinitization has a significant impact on protocol processing efficiency, and that the performance bottleneck of the network receive process changes significantly with different affinitization.
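The efficiency metric defined in this abstract is a simple ratio, which the following minimal sketch illustrates. The function name and the counter values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def processing_efficiency(resource_used, throughput_gbps):
    """Resources consumed per Gbps of achieved throughput (lower is better)."""
    if throughput_gbps <= 0:
        raise ValueError("throughput must be positive")
    return resource_used / throughput_gbps


# Hypothetical CPU-cycle counts for two core-binding (affinitization) choices.
same_core = processing_efficiency(resource_used=8.0e9, throughput_gbps=20.0)
other_core = processing_efficiency(resource_used=6.0e9, throughput_gbps=30.0)

print(f"same core:  {same_core:.2e} cycles per Gbps")
print(f"other core: {other_core:.2e} cycles per Gbps")
```

Normalizing by throughput lets two configurations be compared even when they reach different data rates.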

### Štěpán Timr, Jiří Brabec, Alexey Bondar, Tomáš Ryba, Miloš Železný, Josef Lazar, Pavel Jungwirth,"Non-Linear Optical Properties of Fluorescent Dyes Allow for Accurate Determination of Their Molecular Orientations in Phospholipid Membranes",The Journal of Physical Chemistry,July 6, 2015,

Several methods based on single- and two-photon fluorescence detected linear dichroism have recently been used to determine the orientational distributions of fluorescent dyes in lipid membranes. However, these determinations relied on simplified descriptions of non-linear anisotropic properties of the dye molecules, using a transition dipole moment-like vector instead of an absorptivity tensor. To investigate the validity of the vector approximation, we have now carried out a combination of computer simulations and polarization microscopy experiments on two representative fluorescent dyes (DiI and F2N12S) embedded in aqueous phosphatidylcholine bilayers. Our results indicate that a simplified vector-like treatment of the two-photon transition tensor is applicable for molecular geometries sampled in the membrane at ambient conditions. Furthermore, our results allow evaluation of several distinct polarization microscopy techniques. In combination, our results point to a robust and accurate experimental and computational treatment of orientational distributions of DiI, F2N12S and related dyes (including Cy3, Cy5, and others), with implications to monitoring physiologically relevant processes in cellular membranes in a novel way.

### E. Vecharynski, C. Yang, J. E. Pask,"A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a Hermitian matrix",Journal of Computational Physics, Vol. 290, pp. 73–89,2015,

We present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimal block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.

### Wei Hu, Lin Lin and Chao Yang,"Edge reconstruction in armchair phosphorene nanoribbons revealed by discontinuous Galerkin density functional theory",Phys. Chem. Chem. Phys., 2015, Advance Article,February 11, 2015,doi: 10.1039/C5CP00333D

With the help of our recently developed massively parallel DGDFT (Discontinuous Galerkin Density Functional Theory) methodology, we perform large-scale Kohn–Sham density functional theory calculations on phosphorene nanoribbons with armchair edges (ACPNRs) containing a few thousand to ten thousand atoms. The use of DGDFT allows us to systematically achieve a conventional plane wave basis set type of accuracy, but with a much smaller number (about 15) of adaptive local basis (ALB) functions per atom for this system. The relatively small number of degrees of freedom required to represent the Kohn–Sham Hamiltonian, together with the use of the pole expansion and selected inversion (PEXSI) technique that circumvents the need to diagonalize the Hamiltonian, results in a highly efficient and scalable computational scheme for analyzing the electronic structures of ACPNRs as well as their dynamics. The total wall clock time for calculating the electronic structures of large-scale ACPNRs containing 1080–10 800 atoms is only 10–25 s per self-consistent field (SCF) iteration, with accuracy fully comparable to that obtained from conventional planewave DFT calculations. For the ACPNR system, we observe that the DGDFT methodology can scale to 5000–50 000 processors. We use DGDFT based ab initio molecular dynamics (AIMD) calculations to study the thermodynamic stability of ACPNRs. Our calculations reveal that a 2 × 1 edge reconstruction appears in ACPNRs at room temperature.

### Thorsten Kurth, Andrew Pochinsky, Abhinav Sarje, Sergey Syritsyn, Andre Walker-Loud,"High-Performance I/O: HDF5 for Lattice QCD",arXiv:1501.06992,January 2015,

Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design and implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.

### D. Zuev, E. Vecharynski, C. Yang, N. Orms, and A.I. Krylov,"New algorithms for iterative matrix-free eigensolvers in quantum chemistry",Journal of Computational Chemistry, Vol. 36, Issue 5, pp. 273–284,2015,

New algorithms for iterative diagonalization procedures that solve for a small set of eigen-states of a large matrix are described. The performance of the algorithms is illustrated by calculations of low and high-lying ionized and electronically excited states using equation-of-motion coupled-cluster methods with single and double substitutions (EOM-IP-CCSD and EOM-EE-CCSD). We present two algorithms suitable for calculating excited states that are close to a specified energy shift (interior eigenvalues). One solver is based on the Davidson algorithm, a diagonalization procedure commonly used in quantum-chemical calculations. The second is a recently developed solver, called the “Generalized Preconditioned Locally Harmonic Residual (GPLHR) method.” We also present a modification of the Davidson procedure that allows one to solve for a specific transition. The details of the algorithms, their computational scaling, and memory requirements are described. The new algorithms are implemented within the EOM-CC suite of methods in the Q-Chem electronic structure program.

### Wei Hu, Lin Lin, Chao Yang and Jinlong Yang,"Electronic structure and aromaticity of large-scale hexagonal graphene nanoflakes",J. Chem. Phys. 141, 214704 (2014),December 2, 2014,141:214704,doi: 10.1063/1.4902806

With the help of the recently developed SIESTA-PEXSI method [L. Lin, A. García, G. Huhs, and C. Yang, J. Phys.: Condens. Matter 26, 305503 (2014)], we perform Kohn-Sham density functional theory calculations to study the stability and electronic structure of hydrogen passivated hexagonal graphene nanoflakes (GNFs) with up to 11 700 atoms. We find the electronic properties of GNFs, including their cohesive energy, edge formation energy, highest occupied molecular orbital-lowest unoccupied molecular orbital energy gap, edge states, and aromaticity, depend sensitively on the type of edges (armchair graphene nanoflakes (ACGNFs) and zigzag graphene nanoflakes (ZZGNFs)), size and the number of electrons. We observe that, due to the edge-induced strain effect in ACGNFs, large-scale ACGNFs’ edge formation energy decreases as their size increases. This trend does not hold for ZZGNFs due to the presence of many edge states in ZZGNFs. We find that the energy gaps E_g of GNFs all decay with respect to 1/L, where L is the size of the GNF, in a linear fashion. But as their size increases, ZZGNFs exhibit more localized edge states. We believe the presence of these states makes their gap decrease more rapidly. In particular, when L is larger than 6.40 nm, we find that ZZGNFs exhibit metallic characteristics. Furthermore, we find that the aromatic structures of GNFs appear to depend only on whether the system has 4N or 4N + 2 electrons, where N is an integer.

### Wenqi Xia, Wei Hu, Zhenyu Li and Jinlong Yang,"A first-principles study of gas adsorption on germanene",Phys. Chem. Chem. Phys., 2014,16, 22495-22498,August 29, 2014,doi: 10.1039/C4CP03292F

The adsorption of common gas molecules (N2, CO, CO2, H2O, NH3, NO, NO2, and O2) on germanene is studied with density functional theory. The results show that N2, CO, CO2, and H2O are physisorbed on germanene via van der Waals interactions, while NH3, NO, NO2, and O2 are chemisorbed on germanene via strong covalent (Ge–N or Ge–O) bonds. The chemisorption of gas molecules on germanene opens a band gap at the Dirac point of germanene. NO2 chemisorption on germanene shows strong hole doping in germanene. O2 is easily dissociated on germanene at room temperature. Different adsorption behaviors of common gas molecules on germanene provide a feasible way to exploit chemically modified germanene.

### David H. Bailey, Jonathan M. Borwein, Marcos Lopez de Prado, Qiji Jim Zhu,"Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance",Notices of the American Mathematical Society,May 1, 2014,458-471,

Recent computational advances allow investment managers to search for profitable investment strategies. In many instances, that search involves a pseudo-mathematical argument, which is spuriously validated through a simulation of its historical performance (also called backtest).

We prove that high performance is easily achievable after backtesting a relatively small number of alternative strategy configurations, a practice we denote “backtest overfitting”. The higher the number of configurations tried, the greater is the probability that the backtest is overfit. Because financial analysts rarely report the number of configurations tried for a given backtest, investors cannot evaluate the degree of overfitting in most investment proposals.

The implication is that investors can be easily misled into allocating capital to strategies that appear to be mathematically sound and empirically supported by an outstanding backtest. This practice is particularly pernicious because, due to the nature of financial time series, backtest overfitting has a detrimental effect on the strategy’s future performance.
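The selection effect described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' proof or code: it assumes every strategy configuration produces independent standard normal returns with zero true mean, and the sample sizes and function names are arbitrary choices made for illustration.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio: mean return divided by sample standard deviation."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def best_in_sample_sharpe(n_configs, n_obs, rng):
    """Best backtest Sharpe among n_configs strategies that have no real skill."""
    return max(
        sharpe([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        for _ in range(n_configs)
    )


def average_best(n_configs, rng, trials=50, n_obs=60):
    """Average the best in-sample Sharpe over repeated experiments."""
    return statistics.mean(
        best_in_sample_sharpe(n_configs, n_obs, rng) for _ in range(trials)
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
few = average_best(1, rng)
many = average_best(100, rng)
print(f"1 configuration tried:    average best Sharpe = {few:.3f}")
print(f"100 configurations tried: average best Sharpe = {many:.3f}")
```

Because every simulated strategy has zero expected return, any gap between the two averages comes purely from picking the best of many trials, which is the information investors lack when the number of configurations tried is not reported.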

### E. Vecharynski and Y. Saad,"Fast updating algorithms for latent semantic indexing",SIAM Journal on Matrix Analysis and Applications, Vol. 35, Issue 3, pp. 1105–1131,2014,

This paper discusses a few algorithms for updating the approximate singular value decomposition (SVD) in the context of information retrieval by latent semantic indexing (LSI) methods. A unifying framework is considered which is based on Rayleigh–Ritz projection methods. First, a Rayleigh–Ritz approach for the SVD is discussed and it is then used to interpret the Zha and Simon algorithms [SIAM J. Sci. Comput., 21 (1999), pp. 782–791]. This viewpoint leads to a few alternatives whose goal is to reduce computational cost and storage requirement by projection techniques that utilize subspaces of much smaller dimension. Numerical experiments show that the proposed algorithms yield accuracies comparable to those obtained from standard ones at a much lower computational cost.

### Richard L. Martin, Cory M. Simon, Berend Smit, Maciej Haranczyk,"In-silico design of porous polymer networks: high-throughput screening for methane storage materials",Journal of the American Chemical Society,March 10, 2014,

Porous polymer networks (PPNs) are a class of advanced porous materials that combine the advantages of cheap and stable polymers with the high surface areas and tunable chemistry of metal-organic frameworks. They are of particular interest for gas separation or storage applications, for instance as methane adsorbents for a vehicular natural gas tank or other portable applications.

### Richard L. Martin, Maciej Haranczyk,"Construction and Characterization of Structure Models of Crystalline Porous Polymers",Crystal Growth & Design,March 6, 2014,

Metal-organic frameworks (MOFs) and covalent organic frameworks (COFs) are examples of advanced porous polymeric materials that have emerged in recent years. Their crystalline structure and modular synthesis offer unmatched versatility in their design. By exchanging chemical building blocks, one can both explore the unlimited space of possible structural chemistry within an isoreticular (same crystal topology) series, as well as achieve a wide range of alternative topologies.

### Lev Sarkisov, Richard L. Martin, Maciej Haranczyk, Berend Smit,"On the Flexibility of Metal-Organic Frameworks",Journal of the American Chemical Society,January 24, 2014,

Occasional, large amplitude flexibility in metal-organic frameworks (MOFs) is one of the most intriguing recent discoveries in chemistry and material science. Yet, there is at present no theoretical framework that permits the identification of flexible structures in the rapidly expanding universe of MOFs. Here, we propose a simple method to predict whether a MOF is flexible, based on treating it as a system of rigid elements, connected by hinges. This proposition is correct in application to MOFs based on rigid carboxylate linkers.

### Wei Hu, Nan Xia, Xiaojun Wu, Zhenyu Li and Jinlong Yang,"Silicene as a highly sensitive molecule sensor for NH3, NO and NO2",Phys. Chem. Chem. Phys., 2014,16, 6957-6962,January 23, 2014,doi: 10.1039/C3CP55250K

On the basis of first-principles calculations, we demonstrate the potential application of silicene as a highly sensitive molecule sensor for NH3, NO, and NO2 molecules. NH3, NO and NO2 molecules chemically adsorb on silicene via strong chemical bonds. With distinct charge transfer from silicene to molecules, silicene and chemisorbed molecules form charge-transfer complexes. The adsorption energy and charge transfer in NO2-adsorbed silicene are larger than those of NH3- and NO-adsorbed silicene. Depending on the adsorbate types and concentrations, the silicene-based charge-transfer complexes exhibit versatile electronic properties with tunable band gap opening at the Dirac point of silicene. The calculated charge carrier concentrations of NO2-chemisorbed silicene are 3 orders of magnitude larger than the intrinsic charge carrier concentration of graphene at room temperature. The results present a great potential of silicene for application as a highly sensitive molecule sensor.

### J.A. Sobota, S.-L. Yang, D. Leuenberger, A.F. Kemper, J.G. Analytis, I.R. Fisher, P.S. Kirchmann, T.P. Devereaux, Z.-X. Shen,"Ultrafast electron dynamics in the topological insulator Bi2Se3 studied by time-resolved photoemission spectroscopy",Journal of Electron Spectroscopy and Related Phenomena,January 22, 2014,

We characterize the topological insulator Bi2Se3 using time- and angle-resolved photoemission spectroscopy. By employing two-photon photoemission, a complete picture of the unoccupied electronic structure from the Fermi level up to the vacuum level is obtained. We demonstrate that the unoccupied states host a second Dirac surface state which can be resonantly excited by 1.5 eV photons. We then study the ultrafast relaxation processes following optical excitation. We find that they culminate in a persistent non-equilibrium population of the first Dirac surface state, which is maintained by a meta-stable population of the bulk conduction band. Finally, we perform a temperature-dependent study of the electron–phonon scattering processes in the conduction band, and find the unexpected result that their rates decrease with increasing sample temperature. We develop a model of phonon emission and absorption from a population of electrons, and show that this counter-intuitive trend is the natural consequence of fundamental electron–phonon scattering processes. This analysis serves as an important reminder that the decay rates extracted by time-resolved photoemission are not in general equal to single electron scattering rates, but include contributions from filling and emptying processes from a continuum of states.

### M.A. Sentef, M. Claassen, A.F. Kemper, B. Moritz, T. Oka, J.K. Freericks, T.P. Devereaux,"Theory of pump-probe photoemission in graphene and the generation of light-induced Haldane multilayers",arXiv pre-print,January 20, 2014,

The combination of time-reversal and inversion symmetry protects massless Dirac fermions in graphene and on the surface of topological insulators. In a milestone paper, Haldane envisioned that breaking either or both of these symmetries would open a gap at the Dirac points, allowing one to tune between a trivial insulator and a Chern insulator. While equilibrium band gap engineering has become a major theme since the first synthesis of monolayer graphene, it was only recently proposed that circularly polarized laser light could turn trivial equilibrium bands into topological nonequilibrium bands. Here we observe ultrafast band gap openings and paradoxical gap closings at a critical field strength. Importantly, the gap openings are accompanied by nontrivial changes of the band topology, realizing a photo-induced Haldane multilayer system. We show that pump-probe photoemission spectroscopy can track these transitions in real time via energy gaps exceeding 100 meV. The analogy with Haldane multilayers is revealed by nontrivial pseudospin textures, going from a monolayer p-wave to a bilayer d-wave symmetry at the critical field strength. We thus predict a nonequilibrium realization of a tunable Haldane multilayer model with a Berry curvature that can be tipped optically by small changes in external fields on femtosecond time scales. Since we are focused on the physics of chiral Dirac fermions, these results apply equally to all systems possessing Dirac points, such as surface states of topological insulators.

### E. Vecharynski, Y. Saad, and M. Sosonkina,"Graph partitioning using matrix values for preconditioning symmetric positive definite systems",SIAM Journal on Scientific Computing Vol. 36, Issue 1, pp. A63-A87,2014,

Prior to the parallel solution of a large linear system, it is required to perform a partitioning of its equations/unknowns. Standard partitioning algorithms are designed using the considerations of the efficiency of the parallel matrix-vector multiplication, and typically disregard the information on the coefficients of the matrix. This information, however, may have a significant impact on the quality of the preconditioning procedure used within the chosen iterative scheme. In the present paper, we suggest a spectral partitioning algorithm, which takes into account the information on the matrix coefficients and constructs partitions with respect to the objective of enhancing the quality of the nonoverlapping additive Schwarz (block Jacobi) preconditioning for symmetric positive definite linear systems. For a set of test problems with large variations in magnitudes of matrix coefficients, our numerical experiments demonstrate a noticeable improvement in the convergence of the resulting solution scheme when using the new partitioning approach.

### Michael Sentef, Alexander F. Kemper, Brian Moritz, James K. Freericks, Zhi-Xun Shen, and Thomas P. Devereaux,"Examining Electron-Boson Coupling Using Time-Resolved Spectroscopy",Phys. Rev. X 3, 041033 (2013),December 26, 2013,

Nonequilibrium pump-probe time-domain spectroscopies can become an important tool to disentangle degrees of freedom whose coupling leads to broad structures in the frequency domain. Here, using the time-resolved solution of a model photoexcited electron-phonon system, we show that the relaxational dynamics are directly governed by the equilibrium self-energy so that the phonon frequency sets a window for “slow” versus “fast” recovery. The overall temporal structure of this relaxation spectroscopy allows for a reliable and quantitative extraction of the electron-phonon coupling strength without requiring an effective temperature model or making strong assumptions about the underlying bare electronic band dispersion.

### Daniel T. Graves, Phillip Colella, David Modiano, Jeffrey Johnson, Bjorn Sjogreen, Xinfeng Gao,"A Cartesian Grid Embedded Boundary Method for the Compressible Navier Stokes Equations",Communications in Applied Mathematics and Computational Science,December 23, 2013,

In this paper, we present an unsplit method for the time-dependent compressible Navier-Stokes equations in two and three dimensions. We use a conservative, second-order Godunov algorithm. We use a Cartesian grid, embedded boundary method to resolve complex boundaries. We solve for viscous and conductive terms with a second-order semi-implicit algorithm. We demonstrate second-order accuracy in solutions of smooth problems in smooth geometries and demonstrate robust behavior for strongly discontinuous initial conditions in complex geometries.

### Cory M. Simon, Jihan Kim, Li-Chiang Lin, Richard L. Martin, Maciej Haranczyk, Berend Smit,"Optimizing nanoporous materials for gas storage",Physical Chemistry Chemical Physics,December 4, 2013,

Natural gas, mostly methane, is an attractive replacement for petroleum fuels in automotive vehicles because of its economic and environmental advantages. The technological obstacle to using methane as a vehicular fuel is its comparatively low volumetric energy density, necessitating densification strategies to yield reasonable driving ranges from a reasonably sized tank.

### N. Plonka, A. F. Kemper, S. Graser, A. P. Kampf, T. P. Devereaux,"Tunneling spectroscopy for probing orbital anisotropy in iron pnictides",Phys. Rev. B 88, 174518 (2013),November 27, 2013,

Using realistic multiorbital tight-binding Hamiltonians and the T-matrix formalism, we explore the effects of a nonmagnetic impurity on the local density of states in Fe-based compounds. We show that scanning tunneling spectroscopy (STS) has very specific anisotropic signatures that track the evolution of orbital splitting (OS) and antiferromagnetic gaps. Both anisotropies exhibit two patterns that split in energy with decreasing temperature, but for OS these two patterns map onto each other under 90° rotation. STS experiments that observe these signatures should expose the underlying magnetic and orbital order as a function of temperature across various phase transitions.

### Slim T. Chourou, Abhinav Sarje, Xiaoye Li, Elaine Chan and Alexander Hexemer,"HipGISAXS: a high-performance computing code for simulating grazing-incidence X-ray scattering data",Journal of Applied Crystallography,2013,46:1781-1795,doi: 10.1107/S0021889813025843

We have implemented a flexible Grazing Incidence Small-Angle Scattering (GISAXS) simulation code in the framework of the Distorted Wave Born Approximation (DWBA) that effectively utilizes the parallel processing power provided by graphics processors and multicore processors. This constitutes a handy tool for experimentalists facing a massive flux of data, allowing them to accurately simulate the GISAXS process and analyze the produced data. The software computes the diffraction image for any given superposition of custom shapes or morphologies in a user-defined region of the reciprocal space for all possible grazing incidence angles and sample orientations. This flexibility makes it easy to tackle a wide range of possible sample structures, such as nanoparticles on top of or embedded in a substrate or a multilayered structure. In cases where the sample displays regions of significant refractive index contrast, an algorithm has been implemented to perform a slicing of the sample and compute the averaged refractive index profile to be used as the reference geometry of the unperturbed system. Preliminary tests show good agreement with experimental data for a variety of commonly encountered nanostructures.

### Maciej Haranczyk, Li-Chiang Lin, Kyuho Lee, Richard L. Martin, Jeffrey B. Neaton, Berend Smit,"Methane storage capabilities of diamond analogues",Physical Chemistry Chemical Physics,October 31, 2013,

Methane can be an alternative fuel for vehicular usage provided that new porous materials are developed for its efficient adsorption-based storage. Herein, we search for materials for this application within the family of diamond analogues. We used density functional theory to investigate structures in which tetrahedral C atoms of diamond are separated by -CC- or -BN- groups, as well as ones involving substitution of tetrahedral C atoms with Si and Ge atoms.

### Wei Hu, Zhenyu Li and Jinlong Yang,"Structural, electronic, and optical properties of hybrid silicene and graphene nanocomposite",J. Chem. Phys. 139, 154704 (2013),October 16, 2013,doi: 10.1063/1.4824887

Structural, electronic, and optical properties of hybrid silicene and graphene (S/G) nanocomposite are examined with density functional theory calculations. It turns out that weak van der Waals interactions dominate between silicene and graphene with their intrinsic electronic properties preserved. Interestingly, interlayer interactions in hybrid S/G nanocomposite induce tunable p-type and n-type doping of silicene and graphene, respectively, showing their doping carrier concentrations can be modulated by their interfacial spacing.

### Wei Hu, Zhenyu Li and Jinlong Yang,"Surface and size effects on the charge state of NV center in nanodiamonds",Computational and Theoretical Chemistry, 2013, 1021, 49-53,October 1, 2013,doi: 10.1016/j.comptc.2013.06.015

Electronic structures and stability of nitrogen–vacancy (NV) centers doped in nanodiamonds (NDs) have been investigated with large-scale density functional theory (DFT) calculations. Spin polarized defect states are not affected by the particle sizes and surface decorations, while the band gap is sensitive to these effects. Induced by the spherical surface electric dipole layer, surface functionalization has a long-ranged impact on the stability of charged NV centers doped in NDs. NV− center doped in NDs is more favorable for n-type fluorinated diamond, while NV0 is preferred for p-type hydrogenated NDs. Therefore, surface decoration provides a useful way for defect state engineering.

### J. A. Sobota, S.-L. Yang, A. F. Kemper, J. J. Lee, F. T. Schmitt, W. Li, R. G. Moore, J. G. Analytis, I. R. Fisher, P. S. Kirchmann, T. P. Devereaux, and Z.-X. Shen,"Direct Optical Coupling to an Unoccupied Dirac Surface State in the Topological Insulator Bi2Se3",Phys. Rev. Lett. 111, 136802 (2013),September 24, 2013,

We characterize the occupied and unoccupied electronic structure of the topological insulator Bi2Se3 by one-photon and two-photon angle-resolved photoemission spectroscopy and slab band structure calculations. We reveal a second, unoccupied Dirac surface state with similar electronic structure and physical origin to the well-known topological surface state. This state is energetically located 1.5 eV above the conduction band, which permits it to be directly excited by the output of a Ti:sapphire laser. This discovery demonstrates the feasibility of direct ultrafast optical coupling to a topologically protected, spin-textured surface state.

### Y. F. Kung, W.-S. Lee, C.-C. Chen, A. F. Kemper, A. P. Sorini, B. Moritz, and T. P. Devereaux,"Time-dependent charge-order and spin-order recovery in striped systems",Phys. Rev. B 88, 125114 (2013),September 24, 2013,

Using time-dependent Ginzburg-Landau theory, we study the role of amplitude and phase fluctuations in the recovery of charge-stripe and spin-stripe phases in response to a pump pulse that melts the orders. For parameters relevant to the case where charge order precedes spin order thermodynamically, amplitude recovery governs the initial time scales, while phase recovery controls behavior at longer times. In addition to these intrinsic effects, there is a longer spin reorientation time scale related to the scattering geometry that dominates the recovery of the spin phase. Coupling between the charge and spin orders locks the amplitude and similarly the phase recovery, reducing the number of distinct time scales. Our results well reproduce the major experimental features of pump-probe x-ray diffraction measurements on the striped nickelate La1.75Sr0.25NiO4. They highlight the main idea of this work, which is the use of time-dependent Ginzburg-Landau theory to study systems with multiple coexisting order parameters.

### Richard L Martin, Mahdi Niknam Shahrak, Joseph A Swisher, Cory M Simon, Julian P Sculley, Hong-Cai Zhou, Berend Smit, Maciej Haranczyk,"Modeling Methane Adsorption in Interpenetrating Porous Polymer Networks",The Journal of Physical Chemistry C,September 19, 2013,

Porous polymer networks (PPNs) are a class of porous materials of particular interest in a variety of energy-related applications because of their stability, high surface areas, and gas uptake capacities. Computationally derived structures for five recently synthesized PPN frameworks, PPN-2, -3, -4, -5, and -6, were generated for various topologies, optimized using semiempirical electronic structure methods, and evaluated using classical grand-canonical Monte Carlo simulations.

### Richard L. Martin, Maciej Haranczyk,"Insights into Multi-Objective Design of Metal–Organic Frameworks",Crystal Growth & Design,September 18, 2013,

Metal-organic framework (MOF) crystal topologies which permit the highest internal surface areas are identified by means of multiobjective optimization and abstract structure models. We demonstrate that MOF design efforts can be focused within five underlying nets to engineer distinct, Pareto-optimal compromises between high gravimetric and high volumetric surface area materials.

### Marielle Pinheiro, Richard L. Martin, Chris H. Rycroft, Maciej Haranczyk,"High accuracy geometric analysis of crystalline porous materials",CrystEngComm,September 5, 2013,

A number of algorithms to analyze crystalline porous materials and their porosity employ the Voronoi tessellation, whereby the space in the material is divided into irregular polyhedral cells that can be analyzed to determine the pore topology and structure. However, the Voronoi tessellation is only appropriate when atoms all have equal radii, and the natural generalization to structures with unequal radii leads to cells with curved boundaries, which are computationally expensive to compute.

### B. Moritz, A. F. Kemper, M. Sentef, T. P. Devereaux, J. K. Freericks,"Electron-Mediated Relaxation Following Ultrafast Pumping of Strongly Correlated Materials: Model Evidence of a Correlation-Tuned Crossover between Thermal and Nonthermal States",Phys. Rev. Lett. 111, 077401 (2013),2013,

We examine electron-electron mediated relaxation following ultrafast electric field pump excitation of the fermionic degrees of freedom in the Falicov-Kimball model for correlated electrons. The results reveal a dichotomy in the temporal evolution of the system as one tunes through the Mott metal-to-insulator transition: in the metallic regime relaxation can be characterized by evolution toward a steady state well described by Fermi-Dirac statistics with an increased effective temperature; however, in the insulating regime this quasithermal paradigm breaks down with relaxation toward a nonthermal state with a complicated electronic distribution as a function of momentum. We characterize the behavior by studying changes in the energy, photoemission response, and electronic distribution as functions of time. This relaxation may be observable qualitatively on short enough time scales that the electrons behave like an isolated system not in contact with additional degrees of freedom which would act as a thermal bath, especially when using strong driving fields and studying materials whose physics may manifest the effects of correlations.

### Marielle Pinheiro, Richard L. Martin, Chris H. Rycroft, Andrew Jones, Enrique Iglesia, Maciej Haranczyk,"Characterization and comparison of pore landscapes in crystalline porous materials",Journal of Molecular Graphics and Modelling,July 31, 2013,

Crystalline porous materials have many applications, including catalysis and separations. Identifying suitable materials for a given application can be achieved by screening material databases. Such a screening requires automated high-throughput analysis tools that characterize and represent pore landscapes with descriptors, which can be compared using similarity measures in order to select, group and classify materials. Here, we discuss algorithms for the calculation of two types of pore landscape descriptors.

### Wei Hu, Xiaojun Wu, Zhenyu Li and Jinlong Yang,"Helium separation via porous silicene based ultimate membrane",Nanoscale, 2013, 5, 9062-9066,July 11, 2013,doi: 10.1039/C3NR02326E

Helium purification has become more important for increasing demands in scientific and industrial applications. In this work, we demonstrated that the porous silicene can be used as an effective ultimate membrane for helium purification on the basis of first-principles calculations. Pristine silicene monolayer is impermeable to helium gas with a high penetration energy barrier (1.66 eV). However, porous silicene with either Stone–Wales (SW) or divacancy (555-777 or 585) defect presents a surmountable barrier for helium (0.33 to 0.78 eV) but formidable for Ne, Ar, and other gas molecules. In particular, the porous silicene with divacancy defects shows high selectivity for He/Ne and He/Ar, superior to graphene, polyphenylene, and traditional membranes.

### A.F. Kemper, M. Sentef, B. Moritz, C.C. Kao, Z.X. Shen, J.K. Freericks, T.P. Devereaux,"Mapping of the unoccupied states and relevant bosonic modes via the time dependent momentum distribution",Phys. Rev. B 87, 235139 (2013),June 28, 2013,

The unoccupied states of complex materials are difficult to measure, yet they play a key role in determining their properties. We propose a technique that can measure the unoccupied states, called time-resolved Compton scattering, which measures the time-dependent momentum distribution (TDMD). Using a nonequilibrium Keldysh formalism, we study the TDMD for electrons coupled to a lattice in a pump-probe setup. We find a direct relation between temporal oscillations in the TDMD and the dispersion of the underlying unoccupied states, suggesting that both can be measured by time-resolved Compton scattering. We demonstrate the experimental feasibility by applying the method to a model of MgB2 with realistic material parameters.

### Y. S. Lee, S. J. Moon, Scott C. Riggs, M. C. Shapiro, I. R. Fisher, Bradford W. Fulfer, Julia Y. Chan, A. F. Kemper, and D. N. Basov,"Infrared study of the electronic structure of the metallic pyrochlore iridate Bi2Ir2O7",Phys. Rev. B 87, 195143 (2013),May 30, 2013,

We investigated the electronic properties of a single crystal of metallic pyrochlore iridate Bi2Ir2O7 by means of infrared spectroscopy. Our optical conductivity data show the splitting of t2g bands into Jeff ones due to strong spin-orbit coupling. We observed a sizable midinfrared absorption near 0.2 eV which can be attributed to the optical transition within the Jeff,1/2 bands. More interestingly, we found an abrupt suppression of optical conductivity in the very far-infrared region. Our results suggest that the electronic structure of Bi2Ir2O7 is governed by the strong spin-orbit coupling and correlation effects, which are a prerequisite for theoretically proposed nontrivial topological phases in pyrochlore iridates.

### Richard L. Martin, Maciej Haranczyk,"Optimization-Based Design of Metal-Organic Framework Materials",Journal of Chemical Theory and Computation,May 16, 2013,

Metal–organic frameworks (MOFs) are a class of porous materials constructed from metal or metal oxide building blocks connected by organic linkers. MOFs are highly tunable structures that can in theory be custom designed to meet the specific pore geometry and chemistry required for a given application such as methane storage or carbon capture. However, due to the sheer number of potential materials, identification of optimal MOF structures is a significant challenge.

### Richard L. Martin, Li-Chiang Lin, Kuldeep Jariwala, Berend Smit, Maciej Haranczyk,"Mail-Order Metal–Organic Frameworks (MOFs): Designing Isoreticular MOF-5 Analogues Comprising Commercially Available Organic Molecules",The Journal of Physical Chemistry C,April 17, 2013,

Metal–organic frameworks (MOFs), a class of porous materials, are of particular interest in gas storage and separation applications due largely to their high internal surface areas and tunable structures. MOF-5 is perhaps the archetypal MOF; in particular, many isoreticular analogues of MOF-5 have been synthesized, comprising alternative dicarboxylic acid ligands. In this contribution we introduce a new set of hypothesized MOF-5 analogues, constructed from commercially available organic molecules.

### Nils E. R. Zimmermann, Timm J. Zabel, Frerich J. Keil,"Transport into Nanosheets: Diffusion Equations Put to Test",J. Phys. Chem. C,2013,117:7384-7390,doi: 10.1021/jp400152q

Ultrathin porous materials, such as zeolite nanosheets, are prominent candidates for performing catalysis, drug supply, and separation processes in a highly efficient manner due to exceptionally short transport paths. Predictive design of such processes requires the application of diffusion equations that were derived for macroscopic, homogeneous surroundings to nanoscale, nanostructured host systems. Therefore, we tested different analytical solutions of Fick’s diffusion equations for their applicability to methane transport into two different zeolite nanosheets (AFI, LTA) under instationary conditions. Transient molecular dynamics simulations provided hereby concentration profiles and uptake curves to which the different solutions were fitted. Two central conclusions were deduced by comparing the fitted transport coefficients. First, the transport can be described correctly only if concentration profiles are used and the transport through the solid–gas interface is explicitly accounted for by the surface permeability. Second and most importantly, we have unraveled a size limitation to applying the diffusion equations to nanoscale objects. This is because transport-diffusion coefficients, DT, and surface permeabilities, α, of methane in AFI become dependent on nanosheet thickness. Deviations can amount to factors of 2.9 and 1.4 for DT and α, respectively, when, in the worst case, results from the thinnest AFI nanosheet are compared with data from the thickest sheet. We present a molecular explanation of the size limitation that is based on memory effects of entering molecules and therefore only observable for smooth pores such as AFI and carbon nanotubes. Hence, our work provides important tools to accurately predict and intuitively understand transport of guest molecules into porous host structures, a fact that will become the more valuable the more tiny nanotechnological objects get.

Watch a movie illustrating the transient molecular dynamics approach, which was critical for this study, here.

### Wei Hu, Zhenyu Li and Jinlong Yang,"Electronic and optical properties of graphene and graphitic ZnO nanocomposite structures",J. Chem. Phys. 138, 124706 (2013),March 28, 2013,doi: 10.1063/1.4796602

Electronic and optical properties of graphene and graphitic ZnO (G/g-ZnO) nanocomposites have been investigated with density functional theory. Graphene interacts overall weakly with g-ZnO monolayer via van der Waals interaction. There is no charge transfer between the graphene and g-ZnO monolayer, while a charge redistribution does happen within the graphene layer itself, forming well-defined electron-hole puddles. When Al or Li is doped in the g-ZnO monolayer, substantial electron (n-type) and hole (p-type) doping can be induced in graphene, leading to well-separated electron-hole pairs at their interfaces. Improved optical properties in graphene/g-ZnO nanocomposite systems are also observed, with potential photocatalytic and photovoltaic applications.

### Luciano Ost, Rafael Garibotti, Gilles Sassatelli, Gabriel Marchesan Almeida, Rémi Busseuil, Anastasiia Butko, Michel Robert, Jürgen Becker,"Novel Techniques for Smart Adaptive Multiprocessor SoCs",IEEE Transactions on Computers,March 20, 2013,doi: 10.1109/TC.2013.57

The growing concerns of power efficiency, silicon reliability and performance scalability motivate research in the area of adaptive embedded systems, i.e. systems endowed with decisional capacity, capable of online decision making so as to meet certain performance criteria. The scope of possible adaptation strategies is subject to the targeted architecture specifics, and may range from simple scenario-driven frequency/voltage scaling to rather complex heuristic-driven algorithm selection. This paper advocates the design of distributed memory homogeneous multiprocessor systems as a suitable template for best exploiting adaptation features, thereby tackling the aforementioned challenges. The proposed solution lies in the combined use of a typical application processor for global orchestration along with such an adaptive multiprocessor core for the handling of data-intensive computation. This paper describes an exploratory homogeneous multiprocessor template designed from the ground up for scalability and adaptation. The proposed contributions aim at increasing architecture efficiency through smart distributed control of architectural parameters such as frequency, and enhanced techniques for load balancing such as task migration and dynamic multithreading.

### E. Vecharynski and A. Knyazev,"Absolute value preconditioning for symmetric indefinite linear systems",SIAM Journal on Scientific Computing Vol. 35, Issue 2, pp. A696-A718,2013,

We introduce a novel strategy for constructing symmetric positive definite (SPD) preconditioners for linear systems with symmetric indefinite matrices. The strategy, called absolute value preconditioning, is motivated by the observation that the preconditioned minimal residual method with the inverse of the absolute value of the matrix as a preconditioner converges to the exact solution of the system in at most two steps. Neither the exact absolute value of the matrix nor its exact inverse are computationally feasible to construct in general. However, we provide a practical example of an SPD preconditioner that is based on the suggested approach. In this example we consider a model problem with a shifted discrete negative Laplacian and suggest a geometric multigrid (MG) preconditioner, where the inverse of the matrix absolute value appears only on the coarse grid, while operations on finer grids are based on the Laplacian. Our numerical tests demonstrate practical effectiveness of the new MG preconditioner, which leads to a robust iterative scheme with minimalist memory requirements.

### Wei Hu, Xiaojun Wu, Zhenyu Li and Jinlong Yang,"Porous silicene as a hydrogen purification membrane",Phys. Chem. Chem. Phys., 2013, 15, 5753-5757,February 22, 2013,doi: 10.1039/C3CP00066D

We investigated theoretically the hydrogen permeability and selectivity of a porous silicene membrane via first-principles calculations. The subnanometer pores of the silicene membrane are designed as divacancy defects with octagonal and pentagonal rings (585-divacancy). The porous silicene exhibits high selectivity comparable with graphene-based membranes for hydrogen over various gas molecules (N2, CO, CO2, CH4, and H2O). The divacancy defects in silicene are chemically inert to the considered gas molecules. Our results suggest that the porous silicene membrane is expected to find great potential in gas separation and filtering applications.

### Abhinav Sarje, Srinivas Aluru,"All-pairs computations on many-core graphics processors",Parallel Computing,2013,39-2:79-93,doi: 10.1016/j.parco.2013.01.002

Developing high-performance applications on emerging multi- and many-core architectures requires efficient mapping techniques and architecture-specific tuning methodologies to realize performance closer to their peak compute capability and memory bandwidth. In this paper, we develop architecture-aware methods to accelerate all-pairs computations on many-core graphics processors. Pairwise computations occur frequently in numerous application areas in scientific computing. While they appear easy to parallelize due to the independence of computing each pairwise interaction from all others, developing techniques to address multi-layered memory hierarchies, map within the restrictions imposed by the small and low-latency on-chip memories, and strike the right balance between concurrency, reuse, and memory traffic is crucial to obtaining high performance. We present a hierarchical decomposition scheme for GPUs based on decomposition of the output matrix and input data. We demonstrate that a careful tuning of the involved set of decomposition parameters is essential to achieve high efficiency on the GPUs. We also compare the performance of our strategies with an implementation on the STI Cell processor as well as multi-core CPU parallelizations using OpenMP and Intel Threading Building Blocks.

### Richard L. Martin, Maciej Haranczyk,"Exploring frontiers of high surface area metal-organic frameworks",Chemical Science,February 6, 2013,4:1781-1785,

Metal–organic frameworks (MOFs) have enjoyed considerable interest due to their high internal surface areas as well as tunable pore geometry and chemistry. However, design of optimal MOFs is a great challenge due to the significant number of possible structures. In this work, we present a strategy to rapidly explore the frontiers of these high surface area materials. Here, organic ligands are abstracted by geometrical (alchemical) building blocks, and an optimization of their defining geometrical parameters is performed to identify shapes of ligands which maximize gravimetric surface area of the resulting MOFs. A strength of our approach is that the space of ligands to be explored can be rigorously bounded, allowing discovery of the optimum ligand shape within any criteria, conforming to synthetic requirements or arbitrary exploratory limits. By modifying these bounds, we can project to what extent achievable surface area increases when moving beyond the present limits of organic synthesis. Projecting optimal ligand shapes onto real chemical species, we achieve blueprints for MOFs of various topologies that are predicted to achieve up to 70% higher surface area than the current benchmark materials.

### Kumari Gaurav Rana, Takeaki Yajima, Subir Parui, Alexander F. Kemper, Thomas P. Devereaux, Yasuyuki Hikita, Harold Y. Hwang, Tamalika Banerjee,"Hot electron transport in a strongly correlated transition-metal oxide",Nature Scientific Reports, Volume 3, id. 1274 (2013),February 2013,

Oxide heterointerfaces are ideal for investigating strong correlation effects to electron transport, relevant for oxide-electronics. Using hot-electrons, we probe electron transport perpendicular to the La0.7Sr0.3MnO3 (LSMO)- Nb-doped SrTiO3 (Nb:STO) interface and find the characteristic hot-electron attenuation length in LSMO to be 1.48 +/- 0.10 unit cells (u.c.) at -1.9 V, increasing to 2.02 +/- 0.16 u.c. at -1.3 V at room temperature. Theoretical analysis of this energy dispersion reveals the dominance of electron-electron and polaron scattering. Direct visualization of the local electron transport shows different transmission at the terraces and at the step-edges.

### Wei Hu, Zhenyu Li and Jinlong Yang,"Diamond as an inert substrate of graphene",J. Chem. Phys. 138, 054701 (2013),February 1, 2013,doi: 10.1063/1.4789420

Interaction between graphene and semiconducting diamond substrate has been examined with large-scale density functional theory calculations. Clean and hydrogenated diamond (100) and (111) surfaces have been studied. It turns out that weak van der Waals interactions dominate for graphene on all these surfaces. High carrier mobility of graphene is almost not affected, except for a negligible energy gap opening at the Dirac point. No charge transfer between graphene and diamond (100) surfaces is detected, while different charge-transfer complexes are formed between graphene and diamond (111) surfaces, inducing either p-type or n-type doping on graphene. Therefore, diamond can be used as an excellent substrate of graphene, which almost keeps its electronic structures at the same time providing the flexibility of charge doping.

### M. Dandouna, N. Emad and L.A. Drummond,"A Proposed Programming Model for Writing Sustainable Numerical Libraries for Extreme Scale Computing",Conc. and Compt.,January 16, 2013,

The promise of computer systems with very large orders of processing elements cannot be realized without an effective solution that targets the programming model with a suitable programming environment. Nowadays, it is necessary to identify and rapidly make available robust software technologies to enable high-end computer applications to run efficiently on these emerging systems, and to enable the development of more complex and capable simulation codes for scientific and engineering applications. We review some of the numerical libraries that have achieved modularity, scalability and extensibility thanks to their use of object-oriented programming approaches. However, only a few of these libraries have managed to effectively implement sequential and parallel code reusability.

Here, we discuss what is currently missing from existing library implementations and propose a programming model based on a modular and multi-level parallelism approach that has a strict separation between computational operations, data management and communication. We illustrate how this model makes it possible to design more scalable libraries by better exploiting their functionalities and even enables the formulation of hybrid numerical schemes to be run efficiently on multi-level parallel systems with a large number of heterogeneous processing units without confining the parallelism to the programming model of the communication library. We use the multiple explicitly restarted Arnoldi method as our test case and our implementations require full reuse of serial/parallel kernels in their implementation. Our experiments include comparisons with state-of-the-art numerical libraries on high-end computing systems.

### George Michelogiannakis, William J. Dally,"Elastic Buffer Flow Control for On-Chip Networks",Transactions on Computers,2013,

Networks-on-chip (NoCs) were developed to meet the communication requirements of large-scale systems. The majority of current NoCs spend considerable area and power for router buffers. In our past work, we have developed elastic buffer (EB) flow control which adds simple control logic in the channels to use pipeline flip-flops (FFs) as EBs with two storage locations. This way, channels act as distributed FIFOs and input buffers are no longer required. Removing buffers and virtual channels (VCs) significantly simplifies router design. Compared to VC networks, EB networks provide an up to 45% shorter cycle time, 16% more throughput per unit power or 22% more throughput per unit area. EB networks provide traffic classes using duplicate physical subnetworks. However, this approach negates the cost gains or becomes infeasible for a large number of traffic classes. Therefore, in this paper we propose a hybrid EB-VC router which provides an arbitrary number of traffic classes by using an input buffer to drain flits facing severe contention or deadlock. Thus, hybrid routers operate as EB routers in the common case, and as VC routers when necessary. For this reason, the hybrid EB-VC scheme offers 21% more throughput per unit power than VC networks and 12% than EB networks.

### Michael F. Wehner,"Very extreme seasonal precipitation in the NARCCAP ensemble: model performance and projections",Climate Dynamics,January 2013,40:59-80,doi: 10.1007/s00382-012-1393-1

Seasonal extreme daily precipitation is analyzed in the ensemble of NARCCAP regional climate models. Significant variation in these models’ abilities to reproduce observed precipitation extremes over the contiguous United States is found. Model performance metrics are introduced to characterize overall biases, seasonality, spatial extent and the shape of the precipitation distribution. Comparison of the models to gridded observations that include an elevation correction is found to be better than to gridded observations without this correction. A complicated model weighting scheme based on model performance in simulating observations is found to cause significant improvements in ensemble mean skill only if some of the models are poorly performing outliers. The effect of lateral boundary conditions is explored by comparing the integrations driven by reanalysis to those driven by global climate models. Projected mid-century future changes in seasonal precipitation means and extremes are presented, along with discussions of the sources of uncertainty and the mechanisms causing these changes.

### E. O. Ofek, D. Fox, S. B. Cenko, M. Sullivan, O., D. A. Frail, A. Horesh, A. Corsi, R. M., N. Gehrels, S. R. Kulkarni, A., P. E. Nugent, O. Yaron, A. V. Filippenko, M. M., L. Bildsten, J. S. Bloom, D., I. Arcavi, R. R. Laher, D. Levitan, B. Sesar, J. Surace,"X-Ray Emission from Supernovae in Dense Circumstellar Matter Environments: A Search for Collisionless Shocks",Astrophysical Journal,2013,763:42,doi: 10.1088/0004-637X/763/1/42

The optical light curve of some supernovae (SNe) may be powered by the
outward diffusion of the energy deposited by the explosion shock (the
so-called shock breakout) in optically thick (

### Kesheng Wu, Wes Bethel, Ming Gu, David, Oliver Rübel,"A Big Data Approach to Analyzing Market Volatility",Algorithmic Finance,2013,2:241-267,LBNL-6382E, doi: 10.3233/AF-13030

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that the HPC resource and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators.

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.