Introduction
Computational provenance is a record of all the steps in a computational scientific workflow, including the code that was run, input data, the computational environment (hardware, OS, compiler versions, library version...), the person who performed each step, and output data.
Capturing computational provenance facilitates:
- reproducibility of results
- management and tracking of workflows/projects by the scientists/engineers involved
- evaluation/review by other scientists and engineers
Standards
The W3C PROV standard provides a data model and related tools for provenance interchange on the web. The following diagram shows the three base classes of the PROV data model: Entity, Activity, and Agent. These three classes form the basis for the representation of provenance in the EBRAINS Knowledge Graph: every node in the KG has a type which is a subclass of one of these base classes.
Storage of provenance in the Knowledge Graph
We present here the current schemas for representing (a) data analysis and (b) simulations in the Knowledge Graph. These schemas will need to be extended to cover neurorobotics simulations, and probably a more explicit representation of pipelines/workflows (the chaining together of multiple analysis / simulation stages) will be needed.
Tools for automated capture of provenance
- on different systems:
- HPC systems
- neuromorphic systems
- Jupyter notebooks
- users' own computers
- prospective/pre-emptive vs run-time provenance capture
- capture of metadata vs capture of artefacts
Communication between computer systems and the KG
- local cache and synchronization?
User interfaces for browsing, visualizing, and searching provenance information