Provenance of simulation and data analysis workflows

Version 10.1 by adavison on 2020/08/05 11:42

Introduction

Computational provenance is a record of all the steps in a computational scientific workflow, including the code that was run, the input data, the computational environment (hardware, operating system, compiler versions, library versions, etc.), the person who performed each step, and the output data.

Capturing computational provenance facilitates:

reproducibility of results
management and tracking of workflows/projects by the scientists/engineers involved
evaluation/review by other scientists and engineers
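
As an illustration of the record described above, the following is a minimal sketch of how a single workflow step might be represented and serialized. The class name `StepRecord`, its field names, and all example values are hypothetical and are not taken from any EBRAINS schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class StepRecord:
    """One step of a computational workflow (hypothetical structure)."""
    code: str                    # what was run, e.g. script name plus version
    inputs: List[str]            # identifiers of the input data
    outputs: List[str]           # identifiers of the output data
    environment: Dict[str, str]  # hardware, OS, compiler and library versions
    performed_by: str            # person who performed the step


def to_json(records: List[StepRecord]) -> str:
    """Serialize a list of step records so they can be stored or exchanged."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


# Illustrative values only
example = StepRecord(
    code="analysis.py@v1.0",
    inputs=["raw_data.h5"],
    outputs=["spike_counts.csv"],
    environment={"os": "Linux", "python": "3.8"},
    performed_by="example_user",
)
serialized = to_json([example])
```

A full workflow would then be a list of such records, in which the outputs of one step appear among the inputs of a later one.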

Standards

The W3C PROV standard provides a data model and related tools for provenance interchange on the web. The following diagram shows the three base classes of the PROV data model: Entity, Activity, and Agent. These three classes form the basis for the representation of provenance in the EBRAINS Knowledge Graph: every node in the KG has a type which is a subclass of one of these base classes.

The three Starting Point classes of the W3C PROV ontology and the properties that relate them.
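
The three base classes and the Starting Point properties that relate them can be sketched in plain Python as follows. This is an illustration of the PROV data model only, not the Knowledge Graph schemas themselves; the attribute names mirror the PROV-O property names (for example `was_generated_by` for `prov:wasGeneratedBy`), while the helper `inputs_of` and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    name: str
    acted_on_behalf_of: Optional["Agent"] = None      # prov:actedOnBehalfOf


@dataclass
class Entity:
    name: str
    was_generated_by: Optional["Activity"] = None     # prov:wasGeneratedBy
    was_derived_from: List["Entity"] = field(default_factory=list)  # prov:wasDerivedFrom
    was_attributed_to: Optional[Agent] = None         # prov:wasAttributedTo


@dataclass
class Activity:
    name: str
    used: List[Entity] = field(default_factory=list)  # prov:used
    was_associated_with: Optional[Agent] = None       # prov:wasAssociatedWith
    was_informed_by: Optional["Activity"] = None      # prov:wasInformedBy


def inputs_of(entity: Entity) -> List[str]:
    """Names of the entities used by the activity that generated `entity`."""
    if entity.was_generated_by is None:
        return []
    return [e.name for e in entity.was_generated_by.used]


# Illustrative example: a data analysis step performed by one person
researcher = Agent("researcher")
raw = Entity("raw_recording", was_attributed_to=researcher)
analysis = Activity("spike_sorting", used=[raw], was_associated_with=researcher)
result = Entity(
    "sorted_spikes",
    was_generated_by=analysis,
    was_derived_from=[raw],
    was_attributed_to=researcher,
)
```

Here `inputs_of(result)` follows the `wasGeneratedBy` and `used` links to recover the input of the analysis step.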

Storage of provenance in the Knowledge Graph

We present here the current schemas for representing (a) data analysis and (b) simulations in the Knowledge Graph. These schemas will need to be extended to cover neurorobotics simulations. A more explicit representation of pipelines/workflows (the chaining together of multiple analysis or simulation stages) will probably also be needed.

Tools for automated capture of provenance

capture on different systems:
- HPC systems
- neuromorphic systems
- Jupyter notebooks
- users' own computers
prospective/pre-emptive vs run-time provenance capture
capture of metadata vs capture of artefacts
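
To make the distinction between metadata and artefacts concrete, the following is a minimal sketch of run-time capture using only the Python standard library. It records metadata about the environment and computes a checksum that identifies an artefact without storing the artefact itself. The function names `capture_environment` and `file_digest` and the dictionary keys are assumptions made for this illustration; they do not describe an existing EBRAINS tool.

```python
import hashlib
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def capture_environment(packages=()):
    """Collect metadata about the current computational environment."""
    env = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "executable": sys.executable,
        "libraries": {},
    }
    for name in packages:
        try:
            env["libraries"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env["libraries"][name] = None  # package is not installed
    return env


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Capturing artefacts, by contrast, would mean copying the input and output files themselves to persistent storage, with the digest serving as a link between the stored copy and the metadata record.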

Communication between computer systems and the KG

local cache and synchronization?
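
One possible shape for a local cache with later synchronization is sketched below, purely as an illustration of the idea. Records are appended to a local JSON Lines file and sent later through a caller-supplied `upload` function. The class `LocalProvenanceCache` and the `upload` callable are hypothetical; no real Knowledge Graph client API is assumed.

```python
import json
from pathlib import Path


class LocalProvenanceCache:
    """Store provenance records locally until they can be uploaded."""

    def __init__(self, path):
        self.path = Path(path)

    def add(self, record):
        """Append one record (a JSON-serializable dict) to the cache file."""
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def pending(self):
        """Return all records that have not yet been uploaded."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def synchronize(self, upload):
        """Try to upload each pending record; keep those that fail.

        `upload` is a hypothetical callable that raises an exception on
        failure. A real client would catch specific network errors rather
        than the generic Exception used here.
        """
        remaining = []
        sent = 0
        for record in self.pending():
            try:
                upload(record)
                sent += 1
            except Exception:
                remaining.append(record)
        # Rewrite the cache so that it holds only the records still to be sent
        with self.path.open("w", encoding="utf-8") as f:
            for record in remaining:
                f.write(json.dumps(record) + "\n")
        return sent
```

With this design, a system without network access (for example a compute node on an HPC system) could keep recording provenance, and `synchronize` could be called whenever a connection becomes available.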

User interfaces for browsing, visualizing, and searching provenance information
