Version 7.1 by adavison on 2020/08/05 08:54

== Introduction ==

Computational provenance is a record of all the steps in a computational scientific workflow, including the code that was run, the input data, the computational environment (hardware, operating system, compiler versions, library versions, etc.), the person who performed each step, and the output data.
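To make the definition concrete, the sketch below shows one way a single workflow step could be recorded in Python using only the standard library. The names `ProvenanceRecord`, `capture_environment`, `library_versions` and `current_user`, and the example file names, are hypothetical; this is not an EBRAINS schema or API.

```python
import getpass
import platform
from dataclasses import dataclass, field
from importlib import metadata


def current_user() -> str:
    """Return the login name of the person running the step, if it can be determined."""
    try:
        return getpass.getuser()
    except Exception:  # getuser() can fail, e.g. in containers without a user entry
        return "unknown"


def library_versions(names=()) -> dict:
    """Look up installed versions of the named Python distributions (None if absent)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def capture_environment(libraries=()) -> dict:
    """Collect a few facts about the computational environment."""
    return {
        "hardware": platform.machine(),          # CPU architecture, e.g. 'x86_64'
        "os": platform.platform(),               # OS name and release
        "python_version": platform.python_version(),
        "compiler": platform.python_compiler(),  # compiler that built the interpreter
        "library_versions": library_versions(libraries),
    }


@dataclass
class ProvenanceRecord:
    """One step of a computational workflow (hypothetical schema)."""
    code: str     # the code that was run, e.g. a script name or version-control reference
    inputs: list  # identifiers of the input data
    outputs: list  # identifiers of the output data
    person: str = field(default_factory=current_user)
    environment: dict = field(default_factory=capture_environment)


# Hypothetical example step.
step = ProvenanceRecord(
    code="analysis.py",
    inputs=["spikes.h5"],
    outputs=["firing_rates.csv"],
    environment=capture_environment(libraries=["numpy"]),
)
print(step.person, step.environment["os"])
```

Filling in the person and environment automatically, rather than asking the scientist to type them in, is what keeps such records complete; which libraries to list is left to the caller in this sketch.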

Capturing computational provenance facilitates:

* reproducibility of results
* management and tracking of workflows/projects by the scientists/engineers involved
* evaluation/review by other scientists and engineers

== Standards ==

The [[W3C PROV standard>>https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/||rel=" noopener noreferrer" target="_blank"]] provides a data model and related tools for provenance interchange on the web. The following diagram shows the three base classes of the PROV data model: Entity, Activity, and Agent. These three classes form the basis for representing provenance in the EBRAINS Knowledge Graph (KG): every node in the KG has a type that is a subclass of one of these base classes.

[[image:starting-points.svg||alt="The three Starting Point classes of the W3C PROV ontology and the properties that relate them."]]
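The following Python sketch illustrates the rule that every node type is a subclass of one of the three base classes. The relation names and their domains and ranges follow the PROV-O starting-point properties (the timing properties `startedAtTime` and `endedAtTime` are left out); the `ProvGraph` class and the subclasses `Dataset`, `Simulation` and `Person` are hypothetical and are not the EBRAINS Knowledge Graph schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Any node in the provenance graph, identified by a string id."""
    id: str


# The three PROV base classes.
class Entity(Node):
    """A physical, digital, conceptual, or other kind of thing."""


class Activity(Node):
    """Something that occurs over a period of time and acts upon or with entities."""


class Agent(Node):
    """Something that bears some form of responsibility for an activity or entity."""


# PROV-O starting-point relations: name -> (domain, range).
RELATIONS = {
    "wasGeneratedBy": (Entity, Activity),
    "used": (Activity, Entity),
    "wasInformedBy": (Activity, Activity),
    "wasDerivedFrom": (Entity, Entity),
    "wasAttributedTo": (Entity, Agent),
    "wasAssociatedWith": (Activity, Agent),
    "actedOnBehalfOf": (Agent, Agent),
}


class ProvGraph:
    """A minimal triple store that enforces the domain and range of each relation."""

    def __init__(self):
        self.triples = []

    def add(self, subject: Node, relation: str, obj: Node) -> None:
        domain, range_ = RELATIONS[relation]  # KeyError for relations not listed above
        if not (isinstance(subject, domain) and isinstance(obj, range_)):
            raise TypeError(
                f"{relation} relates {domain.__name__} to {range_.__name__}, "
                f"not {type(subject).__name__} to {type(obj).__name__}"
            )
        self.triples.append((subject, relation, obj))

    def objects(self, subject: Node, relation: str) -> list:
        """Return every object linked to `subject` by `relation`."""
        return [o for s, r, o in self.triples if s == subject and r == relation]


# Hypothetical domain-specific types, each a subclass of one PROV base class.
class Dataset(Entity):
    pass


class Simulation(Activity):
    pass


class Person(Agent):
    pass


alice = Person("alice")
model = Dataset("network-model")
run = Simulation("simulation-run-1")
results = Dataset("spike-trains")

g = ProvGraph()
g.add(run, "used", model)
g.add(run, "wasAssociatedWith", alice)
g.add(results, "wasGeneratedBy", run)
g.add(results, "wasDerivedFrom", model)
g.add(results, "wasAttributedTo", alice)
```

Checking domain and range when a relation is added means that a statement such as "alice used network-model" is rejected; in PROV, the person's involvement is expressed through `wasAssociatedWith` on the activity that used the dataset.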

== Storage of provenance in the Knowledge Graph ==

== Tools for automated capture of provenance ==

* on different systems:
** HPC systems
** neuromorphic systems
** Jupyter notebooks
** users' own computers
* prospective/pre-emptive vs run-time provenance capture
* capture of metadata vs capture of artefacts
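As an illustration of run-time capture (as opposed to recording the planned workflow in advance) and of the difference between capturing metadata and capturing artefacts, here is a minimal Python decorator of the kind that could run on a user's own computer or in a Jupyter notebook. It is a sketch under simple assumptions, not an existing tool: `capture_provenance`, `PROVENANCE_LOG` and `simulate` are hypothetical names, and only byte-valued outputs are fingerprinted.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory store; a real tool would persist records or send them to the KG.
PROVENANCE_LOG = []


def capture_provenance(store_artefacts: bool = False):
    """Decorator factory: record provenance each time the decorated function runs.

    Metadata (function name, parameters, timing, and a checksum of byte outputs)
    is always recorded; the output itself (the artefact) is kept only if
    store_artefacts is True.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.now(timezone.utc)
            result = func(*args, **kwargs)
            ended = datetime.now(timezone.utc)
            record = {
                "activity": func.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
                "started_at": started.isoformat(),
                "ended_at": ended.isoformat(),
            }
            if isinstance(result, bytes):
                # A checksum identifies the artefact without storing it.
                record["output_sha256"] = hashlib.sha256(result).hexdigest()
                if store_artefacts:
                    record["output"] = result
            PROVENANCE_LOG.append(record)
            return result
        return wrapper
    return decorator


@capture_provenance()
def simulate(n_neurons: int, seed: int = 0) -> bytes:
    # Stand-in for a real simulation that returns its results as bytes.
    return f"simulated {n_neurons} neurons with seed {seed}".encode()


simulate(100, seed=1)
print(json.dumps(PROVENANCE_LOG[-1], indent=2))
```

Keeping only the checksum keeps the log small while still allowing a separately stored copy of the output to be verified later; storing the artefact itself trades space for being able to retrieve it directly from the log.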

== Communication between computer systems and the KG ==

* local cache and synchronization?

== User interfaces for browsing, visualizing, and searching provenance information ==