In the past decades, one of the most common forms of addressing reproducibility in scientific workflow-based computational science has consisted of tracking the provenance of the produced and published results. Such provenance allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution. Nevertheless, this approach does not provide any means for capturing and sharing the very valuable knowledge about the experimental equipment of a computational experiment, i.e., the execution environment in which the experiments are conducted. In this work, we propose a novel approach based on semantic vocabularies that describes the execution environment of scientific workflows, so as to conserve it. We define a process for documenting the workflow application and its related management system, as well as their dependencies. Then we apply this approach over three different real workflow applications running in three distinct scenarios, using public, private, and local Cloud platforms. In particular, we study one astronomy workflow and two life science workflows for genomic information analysis. Experimental results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on all evaluated computing platforms.
Reproducibility of execution environments in computational science using Semantics and Clouds - ScienceDirect