Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientiﬁc Workﬂows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create `workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.