Berkeley Lab CS Reorganization Brings Data Science to the Forefront
August 16, 2021
Contact: [email protected]
To position Lawrence Berkeley National Laboratory (Berkeley Lab) among the premier institutions for data science research, the Computational Research Division (CRD) will be reorganized to form two new divisions, effective October 1, 2021.
One division will focus on data science research and will be composed primarily of staff currently in CRD’s Data Science and Technology Department; Deb Agarwal will serve as interim director. The other division will focus on math, computer, and computational science research and will be helmed by CRD Director David Brown. Both divisions, the Center for Advanced Mathematics for Energy Research Applications (CAMERA), and the Quantum Systems Accelerator (QSA) will report to Associate Laboratory Director for Computing Sciences Jonathan Carter.
“With the deluge of data being produced by experimental and observational science instruments, it is undeniable how important data science activities have become to science, especially at Berkeley Lab. With the new data science division, we will be able to focus on the whole set of challenges that this brings up and address those,” said Brown.
He adds that over the last decade, under Agarwal’s direction, CRD’s Data Science and Technology Department has been a leader in areas of data science such as scientific data management, machine learning, and science user-centric design. As part of this effort, CRD researchers have built partnerships in every scientific area at Berkeley Lab.
“Our researchers were pioneers in the concept of science user-centric design; the idea of working closely with domain scientists to understand how they think and work and then developing software solutions tailored to their needs,” said Brown. “The fact that programs like ESS DIVE, the Department of Energy’s (DOE’s) repository of earth science data, is located at Berkeley Lab is evidence of DOE’s confidence in our approach.”
According to Brown, all of the divisions in the CS Area host researchers who are at the forefront of developing and applying machine learning techniques to increasingly large and complex science datasets, from creating new algorithms for deep learning and scalable analytics to developing the mathematics for self-driving experiments. The new division will also tackle the broader issues of the data lifecycle, which is a prerequisite for many machine learning techniques.
With the reorganization, Agarwal sees an opportunity for the two new divisions to work together and holistically explore the new algorithms and capabilities needed to work with future data.
“Data science has been emerging as its own discipline and our research in this area has been evolving to meet the challenges in data science,” said Agarwal. “This moment is an opportunity for us to recognize and elevate our active and robust research program and put it in a position to anticipate where the field of data science is heading in five to ten years and make a coherent case for DOE to invest in those areas.”
With the new data science division, Agarwal also envisions a formalized process for data science collaboration at Berkeley Lab. She notes that partnerships with the domain sciences are currently formed through individual projects and relationships.
“The problem with this is if the data scientist who initially formed the relationship retires or leaves the organization, we lose that relationship. I would like to figure out a more consistent way to form partnerships, so we have a durable structure,” said Agarwal. “I would also like to see us develop effective, reusable methods and software that can be used across scientific areas.”
“This move toward creating entities dedicated to data science research is something that is happening at universities and national laboratories across the country, like UC Berkeley’s Computing, Data Science, and Society Division, or Argonne National Laboratory’s Data Science and Learning Division,” said Carter. “This reorganization is a recognition of the increasingly important role that data science plays in the scientific process and gives it the visibility and stature that it deserves.”