Devarshi Ghoshal

Research Scientist

Phone: (510) 486-4351

Devarshi Ghoshal is a Research Scientist in the Usable Software Systems Group (Scientific Data Division) at LBL. His research revolves around different aspects of data and data management in HPC and distributed environments. His current research focuses on real-time stream processing, scientific workflow data management, and data provenance.

He received his Ph.D. in Computer Science from Indiana University, and has a Bachelor's degree in Computer Science and Engineering from IEM, Kolkata. He joined LBL as a Postdoctoral Fellow in 2014.

Journal Articles

Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484

Drew Paine, Devarshi Ghoshal, Lavanya Ramakrishnan, "Experiences with a Flexible User Research Process to Build Data Change Tools", Journal of Open Research Software, September 1, 2020, doi: 10.5334/jors.284

Scientific software development processes are understood to be distinct from commercial software development practices due to uncertain and evolving states of scientific knowledge. Sustaining these software products is a recognized challenge, but under-examined is the usability and usefulness of such tools to their scientific end users. User research is a well-established set of techniques (e.g., interviews, mockups, usability tests) applied in commercial software projects to develop foundational, generative, and evaluative insights about products and the people who use them. Currently these approaches are not commonly applied and discussed in scientific software development work. The use of user research techniques in scientific environments can be challenging due to the nascent, fluid problem spaces of scientific work, varying scope of projects and their user communities, and funding/economic constraints on projects.

In this paper, we reflect on our experiences undertaking a multi-method user research process in the Deduce project. The Deduce project is investigating data change to develop metrics, methods, and tools that will help scientists make decisions around data change. There is a lack of common terminology since the concept of systematically measuring and managing data change is underexplored in scientific environments. To bridge this gap, we conducted user research that focuses on user practices, needs, and motivations to help us design and develop metrics and tools for data change. This paper contributes reflections and the lessons we have learned from our experiences. We offer key takeaways for scientific software project teams to effectively and flexibly incorporate similar processes into their projects.

D Ghoshal, V Hendrix, W Fox, S Balasubhramanian, L Ramakrishnan, "FRIEDA: Flexible Robust Intelligent Elastic Data Management Framework", The Journal of Open Source Software, 2017, 2:164--164, doi: 10.21105/joss.00164

CS Daley, D Ghoshal, GK Lockwood, S Dosanjh, L Ramakrishnan, NJ Wright, "Performance characterization of scientific workflows for the optimal use of Burst Buffers", CEUR Workshop Proceedings, 2016, 1800:69--73

Conference Papers

Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014

Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create 'workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.

Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478

Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019

S Swaid, M Maat, H Krishnan, D Ghoshal, L Ramakrishnan, "Usability heuristic evaluation of scientific data analysis and visualization tools", Advances in Intelligent Systems and Computing, 2018, 607:471--482, doi: 10.1007/978-3-319-60492-3_45

D Ghoshal, L Ramakrishnan, D Agarwal, "Dac-Man: Data Change Management for Scientific Datasets on HPC Systems", SC ’18, Piscataway, NJ, USA, IEEE Press, 2018, 72:1--72:1

Devarshi Ghoshal, Lavanya Ramakrishnan, "MaDaTS: Managing Data on Tiered Storage for Scientific Workflows", Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '17), ACM, 2017, 41--52, doi: 10.1145/3078597.3078611

W Fox, D Ghoshal, A Souza, GP Rodrigo, L Ramakrishnan, "E-HPC: A library for elastic resource management in HPC environments", Proceedings of WORKS 2017: 12th Workshop on Workflows in Support of Large-Scale Science - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2017, doi: 10.1145/3150994.3150996

V Hendrix, J Fox, D Ghoshal, L Ramakrishnan, "Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems", Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016, 2016, 146--155, doi: 10.1109/CCGrid.2016.54

Devarshi Ghoshal, Lavanya Ramakrishnan, "FRIEDA: Flexible Robust Intelligent Elastic Data Management in Cloud Environments", 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC), IEEE, 2013, 1096--1105, doi: 10.1109/SC.Companion.2012.132

Devarshi Ghoshal, Richard Shane Canon, Lavanya Ramakrishnan, "I/O Performance of Virtualized Cloud Environments", Proceedings of the Second International Workshop on Data Intensive Computing in the Clouds (DataCloud-SC '11), ACM, 2011, 71--80, doi: 10.1145/2087522.2087535

Book Chapters

L Ramakrishnan, D Ghoshal, V Hendrix, E Feller, P Mantha, C Morin, "Storage and Data Life Cycle Management in Cloud Environments with FRIEDA", Cloud Computing for Data-Intensive Applications, (Springer: 2014) Pages: 357--378

Lavanya Ramakrishnan, Adam Scovel, Iwona Sakrejda, Susan Coghlan, Shane Canon, Anping Liu, Devarshi Ghoshal, Krishna Muriki, Nicholas J. Wright, "Magellan - A Testbed to Explore Cloud Computing for Science", On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing, (Chapman & Hall/CRC Press: 2013)

Presentation/Talks

Cheah You-Wei, Drew Paine, Devarshi Ghoshal, Lavanya Ramakrishnan, "Bringing Data Science to Qualitative Analysis", 2018 IEEE 14th International Conference on e-Science, 2018, Pages: 325--326, doi: 10.1109/eScience.2018.00076

Reports

Drew Paine, Devarshi Ghoshal, Lavanya Ramakrishnan, "Investigating Scientific Data Change with User Research Methods", August 20, 2020, LBNL-2001347

Scientific datasets are continually expanding and changing due to fluctuations with instruments, quality assessment and quality control processes, and modifications to software pipelines. Datasets include minimal information about these changes or their effects, requiring scientists to manually assess modifications through a number of labor-intensive and ad hoc steps. The Deduce project is investigating data change to develop metrics, methods, and tools that will help scientists systematically identify and make decisions around data changes. Currently, there is a lack of understanding of, and common practices for, identifying and evaluating changes in datasets, since systematically measuring and managing data change is underexplored in scientific work. We are conducting user research to address this need by exploring scientists' conceptualizations, behaviors, needs, and motivations when dealing with changing datasets. Our user research utilizes multiple methods to produce foundational, generative insights and evaluate research products produced by our team. In this paper, we detail our user research process and outline our findings about data change that emerge from our studies. Our work illustrates how scientific software teams can push beyond just usability testing user interfaces or tools to better probe the underlying ideas they are developing solutions to address.

Posters

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019