Version 10.1 by adavison on 2020/08/05 11:42

== Introduction ==

Computational provenance is a record of all the steps in a computational scientific workflow, including the code that was run, the input data, the computational environment (hardware, operating system, compiler versions, library versions, etc.), the person who performed each step, and the output data.

Capturing computational provenance facilitates:

* reproducibility of results
* management and tracking of workflows/projects by the scientists/engineers involved
* evaluation/review by other scientists and engineers

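
As an illustration of the elements listed in the definition above, the following minimal sketch builds a provenance record for a single workflow step. The function names, the dictionary keys and the example user name are assumptions made for this example, not an EBRAINS schema, and the environment description is limited to what the Python standard library reports.

```python
import hashlib
import platform
import sys
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest used here to identify code or data."""
    return hashlib.sha256(data).hexdigest()


def describe_environment() -> dict:
    """Collect a few environment details using only the standard library."""
    return {
        "machine": platform.machine(),
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "executable": sys.executable,
    }


def provenance_record(person: str, code: bytes, inputs: dict, outputs: dict) -> dict:
    """Build one record covering code, inputs, environment, person and outputs."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "person": person,
        "code_sha256": sha256_of(code),
        "inputs": {name: sha256_of(data) for name, data in inputs.items()},
        "environment": describe_environment(),
        "outputs": {name: sha256_of(data) for name, data in outputs.items()},
    }


# Hypothetical step: a one-line script that sums three values.
record = provenance_record(
    person="jane.doe",
    code=b"result = sum(values)",
    inputs={"values.csv": b"1,2,3"},
    outputs={"result.txt": b"6"},
)
```

Storing digests rather than the files themselves keeps the record small, at the cost of needing the original files to be archived elsewhere.
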
== Standards ==

The [[W3C PROV standard>>https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/||rel="noopener noreferrer" target="_blank"]] provides a data model and related tools for provenance interchange on the web. The following diagram shows the three base classes of the PROV data model: Entity, Activity, and Agent. These three classes form the basis for the representation of provenance in the EBRAINS Knowledge Graph (KG): every node in the KG has a type which is a subclass of one of these base classes.

[[image:starting-points.svg||alt="The three Starting Point classes of the W3C PROV ontology and the properties that relate them."]]

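
The three base classes and some of the properties that relate them can be sketched in plain Python. This is only an illustration: the attribute names mirror PROV properties (wasGeneratedBy, used, wasAssociatedWith, wasAttributedTo, wasDerivedFrom), while the subclasses `Person`, `DataFile` and `DataAnalysis`, the file names and the `lineage` helper are hypothetical and do not reproduce the actual KG types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Something that bears responsibility for an activity (prov:Agent)."""
    name: str


@dataclass
class Entity:
    """A thing such as a data file (prov:Entity)."""
    name: str
    was_generated_by: Optional["Activity"] = None            # prov:wasGeneratedBy
    was_attributed_to: Optional[Agent] = None                # prov:wasAttributedTo
    was_derived_from: List["Entity"] = field(default_factory=list)  # prov:wasDerivedFrom


@dataclass
class Activity:
    """Something that happens over time and acts on entities (prov:Activity)."""
    name: str
    used: List[Entity] = field(default_factory=list)         # prov:used
    was_associated_with: Optional[Agent] = None              # prov:wasAssociatedWith


# Hypothetical node types, each a subclass of one base class.
class Person(Agent):
    pass


class DataFile(Entity):
    pass


class DataAnalysis(Activity):
    pass


def lineage(entity: Entity) -> List[str]:
    """Follow wasGeneratedBy and used links back to the original inputs."""
    names = []
    activity = entity.was_generated_by
    if activity is not None:
        for source in activity.used:
            names.append(source.name)
            names.extend(lineage(source))
    return names


alice = Person("alice")
raw = DataFile("recording.dat", was_attributed_to=alice)
sorting = DataAnalysis("spike_sorting", used=[raw], was_associated_with=alice)
spikes = DataFile("spikes.h5", was_generated_by=sorting,
                  was_attributed_to=alice, was_derived_from=[raw])
```

Here `lineage(spikes)` walks from the output file through the analysis that generated it to the input file it used.
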
== Storage of provenance in the Knowledge Graph ==

We present here the current schemas for representing (a) data analysis and (b) simulations in the Knowledge Graph. These schemas will need to be extended to cover neurorobotics simulations, and a more explicit representation of pipelines/workflows (the chaining together of multiple analysis/simulation stages) will probably also be needed.

== Tools for automated capture of provenance ==

* on different systems:
** HPC systems
** neuromorphic systems
** Jupyter notebooks
** users' own computers
* prospective/pre-emptive vs run-time provenance capture
* capture of metadata vs capture of artefacts

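
The distinction between capturing metadata and capturing artefacts in the list above can be made concrete with a small sketch. It assumes that "metadata" means facts about a file (name, size, digest) and "artefact" means a copy of the file itself; the function names and the content-addressed store layout are choices made for this example only.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def capture_metadata(path: Path) -> dict:
    """Record facts about a file without keeping a copy of it."""
    data = path.read_bytes()
    return {
        "name": path.name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def capture_artefact(path: Path, store: Path) -> Path:
    """Copy the file itself into a store, named after its content digest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{digest}{path.suffix}"
    shutil.copy2(path, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output file produced by some workflow step.
    output = Path(tmp) / "result.txt"
    output.write_bytes(b"42\n")

    metadata = capture_metadata(output)
    archived = capture_artefact(output, Path(tmp) / "store")
    archived_bytes = archived.read_bytes()
```

Metadata capture is cheap and sufficient to detect whether a file has changed, whereas artefact capture allows the file to be recovered later but requires storage proportional to the data size.
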
== Communication between computer systems and the KG ==

* local cache and synchronization?

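
One possible shape for a local cache with later synchronization is sketched below, purely as a starting point for discussion. Records are appended to a local JSON-lines file and uploaded when the remote system is reachable. The class `LocalProvenanceCache` and the `upload` callable are hypothetical; `upload` stands in for whatever client is eventually used to talk to the KG and is assumed to raise `OSError` (for example `ConnectionError`) on failure.

```python
import json
import tempfile
from pathlib import Path


class LocalProvenanceCache:
    """Keep records in a local JSON-lines file until they can be uploaded."""

    def __init__(self, path: Path):
        self.path = path

    def add(self, record: dict) -> None:
        """Append one record to the local cache."""
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def pending(self) -> list:
        """Return all records that have not yet been uploaded."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def synchronize(self, upload) -> int:
        """Try to upload each pending record and keep the ones that fail."""
        failed = []
        sent = 0
        for record in self.pending():
            try:
                upload(record)
                sent += 1
            except OSError:
                failed.append(record)
        # Rewrite the cache so that it holds only the records still to be sent.
        with self.path.open("w", encoding="utf-8") as f:
            for record in failed:
                f.write(json.dumps(record) + "\n")
        return sent


uploaded = []


def fake_upload(record: dict) -> None:
    """Stand-in for a real client; fails for one record to show the retry path."""
    if record["step"] == "simulation":
        raise ConnectionError("remote system unreachable")
    uploaded.append(record)


with tempfile.TemporaryDirectory() as tmp:
    cache = LocalProvenanceCache(Path(tmp) / "pending.jsonl")
    cache.add({"step": "analysis"})
    cache.add({"step": "simulation"})
    sent = cache.synchronize(fake_upload)
    still_pending = cache.pending()
```

In this sketch the record that could not be uploaded stays in the cache and would be retried on the next call to `synchronize`.
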
== User interfaces for browsing, visualizing, and searching provenance information ==