Provenance of simulation and data analysis workflows

Version 4.2 by adavison on 2020/08/05 08:40

Introduction

Computational provenance is a record of all the steps in a computational scientific workflow, including the code that was run, the input data, the computational environment (hardware, operating system, compiler versions, library versions, etc.), and the output data.

Capturing computational provenance facilitates:

  • reproducibility of results
  • management and tracking of workflows/projects by the scientists/engineers involved
  • evaluation/review by other scientists and engineers
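
As a rough illustration of the kind of record described in this introduction, the following Python sketch gathers the four ingredients (code, input data, environment, output data) into a single dictionary. The function names and the field names are assumptions made for this example, not an agreed schema, and only content hashes are stored rather than the data themselves.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def capture_environment(packages=()):
    """Describe the computational environment (hardware, OS, library versions)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return {
        "hardware": platform.machine(),
        "os": platform.platform(),
        "python": platform.python_version(),
        "libraries": versions,
    }


def make_record(code: bytes, input_data: bytes, output_data: bytes, packages=()):
    """Build a minimal provenance record (hypothetical field names)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_sha256": hashlib.sha256(code).hexdigest(),
        "input_sha256": hashlib.sha256(input_data).hexdigest(),
        "output_sha256": hashlib.sha256(output_data).hexdigest(),
        "environment": capture_environment(packages),
    }


if __name__ == "__main__":
    record = make_record(b"print(1)", b"input", b"output", packages=("numpy",))
    print(json.dumps(record, indent=2))
```

Storing hashes keeps the record small while still allowing a later check that a given file is the one that was actually used.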

 

Standards

Information about the W3C PROV ontology and related tools
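
To make the core PROV terms concrete, the sketch below describes one simulation run using entities, an activity, an agent, and the relations used, wasGeneratedBy and wasAssociatedWith. The dictionary layout loosely follows the PROV-JSON serialization and has not been validated against it. The "ex" prefix, its URI, and all identifiers are placeholders chosen for this example.

```python
import json


def prov_document(script, input_file, output_file, run_id, user):
    """Return a PROV-style description of a single run (illustrative only)."""

    def ex(name):
        # Qualify a local name with the placeholder "ex" prefix.
        return f"ex:{name}"

    return {
        "prefix": {"ex": "http://example.org/"},
        # Things that exist: the script, its input and its output.
        "entity": {ex(script): {}, ex(input_file): {}, ex(output_file): {}},
        # Something that happened: the run itself.
        "activity": {ex(run_id): {}},
        # Who was responsible for the run.
        "agent": {ex(user): {}},
        "used": {
            "_:u1": {"prov:activity": ex(run_id), "prov:entity": ex(script)},
            "_:u2": {"prov:activity": ex(run_id), "prov:entity": ex(input_file)},
        },
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": ex(output_file), "prov:activity": ex(run_id)},
        },
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": ex(run_id), "prov:agent": ex(user)},
        },
    }


if __name__ == "__main__":
    doc = prov_document("simulate_py", "params_json", "spikes_h5", "run_001", "alice")
    print(json.dumps(doc, indent=2))
```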

Storage of provenance in the Knowledge Graph

Tools for automated capture of provenance

  • on different systems:
    • HPC systems
    • neuromorphic systems
    • Jupyter notebooks
    • users' own computers
  • prospective/pre-emptive vs run-time provenance capture
  • capture of metadata vs capture of artefacts
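
The distinction between capturing metadata and capturing artefacts can be sketched as follows. This hypothetical wrapper performs run-time capture: it executes a processing function, always records metadata (file hashes and timing), and copies the output file into an archive directory only if one is given. The function names, the record fields, and the calling convention func(inputs, output) are assumptions for this example.

```python
import hashlib
import shutil
import time
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_with_provenance(func, inputs, output, archive_dir=None):
    """Run func(inputs, output) and return a provenance record for the run."""
    # Hash the inputs before the run, in case the function modifies them.
    input_hashes = {str(p): file_sha256(p) for p in inputs}
    started = time.time()
    func(inputs, output)
    duration = time.time() - started

    output_hash = file_sha256(output)
    record = {
        "function": func.__name__,
        "started": started,
        "duration_s": duration,
        "inputs": input_hashes,
        "output": {"path": str(output), "sha256": output_hash},
        "archived_copy": None,  # metadata-only capture by default
    }
    if archive_dir is not None:
        # Artefact capture: keep a copy of the output, named by its hash.
        archive_dir = Path(archive_dir)
        archive_dir.mkdir(parents=True, exist_ok=True)
        copy_path = archive_dir / f"{output_hash[:12]}_{Path(output).name}"
        shutil.copy2(output, copy_path)
        record["archived_copy"] = str(copy_path)
    return record
```

Metadata-only capture is cheap and is enough to detect that a file has changed. Artefact capture costs storage but allows the output to be recovered later.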

Communication between computer systems and the KG

  • local cache and synchronization?
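
One possible shape for a local cache with synchronization is sketched below: records are appended to a local file as they are produced, and a separate sync step tries to upload each one, keeping any that fail for the next attempt. The class name and the upload callable are hypothetical. No particular KG API is assumed, so the caller supplies the function that actually sends a record.

```python
import json
from pathlib import Path


class ProvenanceCache:
    """Append-only local cache of provenance records (one JSON object per line)."""

    def __init__(self, path):
        self.path = Path(path)

    def add(self, record):
        """Store a record locally; works without a network connection."""
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def pending(self):
        """Return all records that have not yet been uploaded."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def sync(self, upload):
        """Try to upload every pending record; keep the ones that fail.

        `upload` is a caller-supplied function taking one record and raising
        OSError (for example ConnectionError) if the record could not be sent.
        Returns the number of records uploaded.
        """
        remaining = []
        sent = 0
        for record in self.pending():
            try:
                upload(record)
                sent += 1
            except OSError:
                remaining.append(record)
        # Rewrite the cache so that it holds only the records still to be sent.
        self.path.write_text(
            "".join(json.dumps(r) + "\n" for r in remaining), encoding="utf-8"
        )
        return sent
```

This keeps provenance capture independent of network availability, which may matter on HPC compute nodes without outbound access.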

User interfaces for browsing, visualizing, and searching provenance information