Version 10.2 by adavison on 2020/08/05 11:44

== Introduction ==

Computational provenance is a record of all the steps in a computational scientific workflow. It includes the code that was run, the input data, the computational environment (hardware, OS, compiler versions, library versions, etc.), the person who performed each step, and the output data.

Capturing computational provenance facilitates:

* reproducibility of results
* management and tracking of workflows and projects by the scientists and engineers involved
* evaluation and review by other scientists and engineers

== Standards ==

The [[W3C PROV standard>>https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/||rel="noopener noreferrer" target="_blank"]] provides a data model and related tools for provenance interchange on the web. The following diagram shows the three base classes of the PROV data model: Entity, Activity, and Agent. These three classes form the basis for representing provenance in the EBRAINS Knowledge Graph (KG): every node in the KG has a type that is a subclass of one of these base classes.

[[image:starting-points.svg||alt="The three Starting Point classes of the W3C PROV ontology and the properties that relate them."]]
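As a minimal illustration of how the three base classes relate, the Python sketch below stores nodes typed as Entity, Activity or Agent. It checks every relationship against the domain and range of the corresponding PROV-O Starting Point property (##used##, ##wasGeneratedBy##, ##wasAssociatedWith##, etc.). The ##ProvGraph## class and the example identifiers are hypothetical. This is not the Knowledge Graph API, and a real tool would more likely build on an existing PROV library.

```python
from dataclasses import dataclass, field

# Domain and range of the PROV-O Starting Point object properties.
RELATIONS = {
    "wasGeneratedBy": ("Entity", "Activity"),
    "wasDerivedFrom": ("Entity", "Entity"),
    "wasAttributedTo": ("Entity", "Agent"),
    "used": ("Activity", "Entity"),
    "wasInformedBy": ("Activity", "Activity"),
    "wasAssociatedWith": ("Activity", "Agent"),
    "actedOnBehalfOf": ("Agent", "Agent"),
}


@dataclass
class ProvGraph:
    """Hypothetical in-memory provenance graph (illustration only)."""
    nodes: dict = field(default_factory=dict)  # node id -> base class name
    edges: list = field(default_factory=list)  # (subject, relation, object)

    def add(self, node_id, base_class):
        if base_class not in ("Entity", "Activity", "Agent"):
            raise ValueError(f"unknown PROV base class: {base_class}")
        self.nodes[node_id] = base_class

    def relate(self, subject, relation, obj):
        # Reject relationships whose endpoints have the wrong base class.
        domain, range_ = RELATIONS[relation]
        if self.nodes[subject] != domain or self.nodes[obj] != range_:
            raise TypeError(f"{relation} expects {domain} -> {range_}")
        self.edges.append((subject, relation, obj))


# Example: an analysis run that read one file, wrote another,
# and was launched by a person.
g = ProvGraph()
g.add("recording.nwb", "Entity")
g.add("spike-sorting-run", "Activity")
g.add("spikes.h5", "Entity")
g.add("alice", "Agent")
g.relate("spike-sorting-run", "used", "recording.nwb")
g.relate("spikes.h5", "wasGeneratedBy", "spike-sorting-run")
g.relate("spike-sorting-run", "wasAssociatedWith", "alice")
```

Here a call such as ##g.relate("alice", "used", "recording.nwb")## is rejected, because ##used## must start from an Activity.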
== Storage of provenance in the Knowledge Graph ==

We present here the current schemas for representing (a) data analyses and (b) simulations in the Knowledge Graph. These schemas will need to be extended to cover neurorobotics simulations. A more explicit representation of pipelines/workflows (the chaining together of multiple analysis or simulation stages) will probably also be needed.

[[image:Workflow provenance in the EBRAINS KG.svg||alt="KG schema for data analysis"]][[image:Workflow provenance in the EBRAINS KG-2.png||alt="KG schema for data analysis"]]

[[image:Workflow provenance in the EBRAINS KG.png||alt="KG schema for simulation"]]

Note that the diagrams do not show Agents: the person who launched each analysis or simulation activity is linked to that activity by a ##wasAssociatedWith## relationship.
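The outputs of one activity can be the inputs of a later one. Because of this, links between activities and their input and output entities already allow a chain of analysis or simulation stages to be reconstructed by following those links backwards from a result. The sketch below assumes a simplified, hypothetical activity record. The ##AnalysisActivity## class and its field names are illustrative and are not the KG schema shown in the diagrams. The ##lineage## function performs a breadth-first traversal from an output entity and lists every upstream activity, nearest first.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AnalysisActivity:
    """Hypothetical, simplified activity record (not the actual KG schema)."""
    id: str
    used: list                 # ids of input entities
    generated: list            # ids of output entities
    was_associated_with: str   # the person who launched the activity


def lineage(entity_id, activities):
    """Return ids of all activities upstream of entity_id, nearest first."""
    # Map each entity to the activity that generated it.
    producer = {out: act for act in activities for out in act.generated}
    seen, order = set(), []
    queue = deque([entity_id])
    while queue:
        act = producer.get(queue.popleft())
        if act is None or act.id in seen:
            continue  # raw input data, or an activity already visited
        seen.add(act.id)
        order.append(act.id)
        queue.extend(act.used)  # continue with this activity's inputs
    return order


acts = [
    AnalysisActivity("sim-1", ["model.py"], ["spikes.h5"], "alice"),
    AnalysisActivity("analysis-1", ["spikes.h5"], ["rates.csv"], "bob"),
    AnalysisActivity("plot-1", ["rates.csv"], ["figure1.png"], "bob"),
]
print(lineage("figure1.png", acts))  # ['plot-1', 'analysis-1', 'sim-1']
```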
== Tools for automated capture of provenance ==

* on different systems:
** HPC systems
** neuromorphic systems
** Jupyter notebooks
** users' own computers
* prospective (pre-emptive) vs. run-time provenance capture
* capture of metadata vs. capture of artefacts
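For run-time capture on a user's own computer or in a Jupyter notebook, one lightweight approach is to wrap each analysis step so that it records, as it runs, the code, its arguments, the environment, the timing and the person who launched it. The decorator below is a minimal sketch that uses only the Python standard library. ##capture_provenance##, the record's keys and the ##agent## argument are hypothetical. The sketch captures metadata only (not artefacts such as copies of input or output files), and it records neither hardware details beyond the machine architecture nor compiler versions.

```python
import datetime
import functools
import platform
from importlib import metadata


def environment_info(packages=()):
    """Describe the computational environment of the current process."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": versions,
    }


def capture_provenance(agent, packages=()):
    """Hypothetical decorator: calls return (result, provenance_record)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.datetime.now(datetime.timezone.utc)
            result = func(*args, **kwargs)
            ended = datetime.datetime.now(datetime.timezone.utc)
            record = {
                "code": f"{func.__module__}.{func.__qualname__}",
                "arguments": {
                    "args": [repr(a) for a in args],
                    "kwargs": {k: repr(v) for k, v in kwargs.items()},
                },
                "environment": environment_info(packages),
                "wasAssociatedWith": agent,
                "startedAtTime": started.isoformat(),
                "endedAtTime": ended.isoformat(),
            }
            return result, record
        return wrapper
    return decorator


@capture_provenance(agent="alice", packages=["numpy"])
def mean_rate(spike_counts, duration_s):
    """Mean firing rate (spikes/s) across recorded neurons."""
    return sum(spike_counts) / (len(spike_counts) * duration_s)


rate, prov = mean_rate([10, 12, 8], duration_s=2.0)
print(rate)  # 5.0
```

Returning the record alongside the result keeps the sketch free of any storage decisions. Where the record goes next (a local file, a notebook cell, or the KG) is a separate question.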
== Communication between computer systems and the KG ==

* local cache and synchronization?
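One possible answer to the synchronization question is to make recording work offline: provenance records are first appended to a local cache and pushed to the KG later. The sketch below assumes a JSON-lines file as the cache and takes the upload step as a parameter. ##ProvenanceCache## is hypothetical, and the ##upload## callable stands in for whatever KG client is eventually used. It is expected to raise an exception when an upload fails.

```python
import json
import tempfile
from pathlib import Path


class ProvenanceCache:
    """Hypothetical local store of provenance records awaiting upload."""

    def __init__(self, directory):
        self.pending = Path(directory) / "pending.jsonl"
        self.pending.parent.mkdir(parents=True, exist_ok=True)

    def add(self, record):
        # Recording never needs the network: append one JSON object per line.
        with self.pending.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def sync(self, upload):
        """Upload pending records in order; keep failures for the next attempt."""
        if not self.pending.exists():
            return 0
        lines = self.pending.read_text(encoding="utf-8").splitlines()
        records = [json.loads(line) for line in lines if line]
        sent = 0
        for record in records:
            try:
                upload(record)
            except Exception:
                break  # stop at the first failure to preserve ordering
            sent += 1
        remaining = records[sent:]
        self.pending.write_text(
            "".join(json.dumps(r) + "\n" for r in remaining), encoding="utf-8"
        )
        return sent


uploaded = []


def flaky_upload(record):
    # Simulates the KG becoming unreachable for the second record.
    if record["id"] == "analysis-2":
        raise ConnectionError("KG unreachable")
    uploaded.append(record)


with tempfile.TemporaryDirectory() as tmp:
    cache = ProvenanceCache(tmp)
    cache.add({"id": "analysis-1"})
    cache.add({"id": "analysis-2"})
    first_sync = cache.sync(flaky_upload)      # 1; analysis-2 stays pending
    second_sync = cache.sync(uploaded.append)  # 1; the cache is now empty
    remaining_after = cache.pending.read_text(encoding="utf-8")
```

Stopping at the first failure means nothing is lost if the KG is unreachable. A record could, however, be uploaded twice if the process stops between a successful upload and the rewrite of the cache file. In practice, uploads would therefore need to be idempotent, for example by giving each record a stable identifier.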
== User interfaces for browsing, visualizing, and searching provenance information ==