Last modified by adavison on 2022/05/23 22:24

== Introduction ==

Computational provenance is a record of all the steps in a computational scientific workflow, including the code that was run, the input data, the computational environment (hardware, operating system, compiler and library versions, etc.), the person who performed each step, and the output data.

Capturing computational provenance facilitates:

* reproducibility of results
* management and tracking of workflows and projects by the scientists and engineers involved
* evaluation and review by other scientists and engineers
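
As an illustration only, the elements listed in the definition above could be collected into one record per workflow step. The following Python sketch assumes a simple dataclass with hypothetical field names (##code##, ##inputs##, ##outputs##, ##performed_by##, ##environment##); it is not the schema used by the EBRAINS Knowledge Graph, and the file names are placeholders.

{{code language="python"}}
from dataclasses import asdict, dataclass, field
import json
import platform


@dataclass
class ProvenanceRecord:
    """Hypothetical record of one step in a computational workflow."""
    code: str                # script path, or repository URL plus commit
    inputs: list[str]        # identifiers of the input data
    outputs: list[str]       # identifiers of the output data
    performed_by: str        # the person who performed the step
    environment: dict[str, str] = field(default_factory=dict)


def current_environment() -> dict[str, str]:
    """Collect a few details of the computational environment.

    Library versions could be added with importlib.metadata.version(...)
    for each package the step depends on.
    """
    return {
        "machine": platform.machine(),           # hardware architecture
        "os": platform.platform(),               # operating system
        "compiler": platform.python_compiler(),  # compiler used to build Python
        "python": platform.python_version(),
    }


# Illustrative values only.
record = ProvenanceRecord(
    code="analysis.py",
    inputs=["recording_01.nwb"],
    outputs=["spike_rates.csv"],
    performed_by="A. Scientist",
    environment=current_environment(),
)
print(json.dumps(asdict(record), indent=2))
{{/code}}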

== Standards ==

The [[W3C PROV standard>>https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/||rel="noopener noreferrer" target="_blank"]] defines a data model, together with serializations and supporting definitions, for the interchange of provenance information on the web. The following diagram shows the three base classes of the PROV data model: Entity, Activity, and Agent. These three classes form the basis for the representation of provenance in the EBRAINS Knowledge Graph: every node in the KG has a type that is a subclass of one of these base classes.

[[image:starting-points.svg||alt="The three Starting Point classes of the W3C PROV ontology and the properties that relate them."]]
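
A minimal Python sketch of this model, using only the standard library, is given below. It encodes the three base classes, the properties that link them in the PROV-O Starting Point terms (each with the classes it expects at either end), and a few node types defined as subclasses. The subclass names (##Dataset##, ##DataAnalysis##, ##Person##) and the identifiers are hypothetical, not the actual EBRAINS KG types; real PROV documents would normally be produced with a dedicated library in one of the standard serializations.

{{code language="python"}}
from dataclasses import dataclass


# The three PROV base classes ("Starting Point" classes in PROV-O).
@dataclass(frozen=True)
class Entity:
    id: str


@dataclass(frozen=True)
class Activity:
    id: str


@dataclass(frozen=True)
class Agent:
    id: str


# Hypothetical KG node types: each one subclasses exactly one PROV base class.
class Dataset(Entity):
    pass


class DataAnalysis(Activity):
    pass


class Person(Agent):
    pass


# PROV-O Starting Point properties linking the classes: name -> (domain, range).
RELATIONS = {
    "used": (Activity, Entity),
    "wasGeneratedBy": (Entity, Activity),
    "wasAssociatedWith": (Activity, Agent),
    "wasAttributedTo": (Entity, Agent),
    "wasDerivedFrom": (Entity, Entity),
    "wasInformedBy": (Activity, Activity),
    "actedOnBehalfOf": (Agent, Agent),
}


def is_valid(subject, relation, obj):
    """Check that a relation links nodes of the PROV classes it expects."""
    domain, range_ = RELATIONS[relation]
    return isinstance(subject, domain) and isinstance(obj, range_)


def relate(subject, relation, obj):
    """Return a (subject, relation, object) triple, rejecting invalid links."""
    if not is_valid(subject, relation, obj):
        raise TypeError(
            f"{relation} cannot link {type(subject).__name__} to {type(obj).__name__}"
        )
    return (subject.id, relation, obj.id)


raw = Dataset("raw-recording")
result = Dataset("spike-rates")
analysis = DataAnalysis("spike-sorting-run-1")
alice = Person("alice")

graph = [
    relate(analysis, "used", raw),
    relate(result, "wasGeneratedBy", analysis),
    relate(analysis, "wasAssociatedWith", alice),
    relate(result, "wasDerivedFrom", raw),
]
{{/code}}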

== Storage of provenance in the Knowledge Graph ==

The current schemas for representing (a) data analyses and (b) simulations in the Knowledge Graph are shown below. These schemas will need to be extended to cover neurorobotics simulations, and a more explicit representation of pipelines/workflows (the chaining together of multiple analysis or simulation stages) will probably also be needed.

[[image:Workflow provenance in the EBRAINS KG-2.png||alt="KG schema for data analysis"]]

[[image:Workflow provenance in the EBRAINS KG.png||alt="KG schema for simulation"]]

Note that the diagrams do not show Agents: the person who launched each analysis or simulation activity is linked to the activity by a ##wasAssociatedWith## relationship.
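
The chaining of stages into a workflow can be sketched as follows, under the assumption that each activity node lists the entities it used and generated, plus the person it was associated with. Two activities are then linked whenever an output of one is an input of the other, and the upstream history of any result can be recovered by walking back through these links. The class, field, and function names are hypothetical and do not reproduce the schemas shown in the diagrams.

{{code language="python"}}
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ActivityNode:
    """Hypothetical KG node for one data-analysis or simulation run."""
    id: str
    kind: str                                          # e.g. "DataAnalysis" or "Simulation"
    inputs: list[str] = field(default_factory=list)    # ids of entities used
    outputs: list[str] = field(default_factory=list)   # ids of entities generated
    was_associated_with: str = ""                      # id of the person who launched it


def lineage(entity_id, activities):
    """Return ids of all activities upstream of an entity, nearest first.

    Stages are chained implicitly: an entity generated by one activity and
    used by another links the two into a workflow.
    """
    generated_by = {out: act for act in activities for out in act.outputs}
    chain, seen = [], set()
    frontier = deque([entity_id])
    while frontier:                       # breadth-first walk back through the graph
        act = generated_by.get(frontier.popleft())
        if act is None or act.id in seen:
            continue                      # raw input, or activity already visited
        seen.add(act.id)
        chain.append(act.id)
        frontier.extend(act.inputs)
    return chain


# A two-stage workflow: a simulation whose output is then analysed.
sim = ActivityNode("sim-1", "Simulation", inputs=["model-config"],
                   outputs=["spike-trains"], was_associated_with="alice")
ana = ActivityNode("ana-1", "DataAnalysis", inputs=["spike-trains"],
                   outputs=["firing-rate-plot"], was_associated_with="bob")

print(lineage("firing-rate-plot", [sim, ana]))  # ['ana-1', 'sim-1']
{{/code}}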

(% class="box warningmessage" %)
(((
TODO: insert or link to the detailed schemas for each type
)))

== Tools for automated capture of provenance ==

Issues to discuss:

* capture on different systems:
** HPC systems
** neuromorphic systems
** Jupyter notebooks
** users' own computers
* prospective (pre-emptive) vs. run-time provenance capture
* capture of metadata vs. capture of artefacts
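
Two of the distinctions listed above can be illustrated with a small Python sketch. The hypothetical ##capture_provenance## decorator performs run-time capture (the record is produced while the step executes, rather than specified before it runs), and it captures metadata about files (size and SHA-256 checksum) rather than copying the artefacts themselves. It assumes that each step takes one input file path and returns one output file path; it is not an existing EBRAINS tool.

{{code language="python"}}
import functools
import hashlib
import os
import platform
import tempfile
from datetime import datetime, timezone

PROVENANCE_LOG = []  # in-memory store, for this sketch only


def file_metadata(path):
    """Describe a file by size and SHA-256 checksum instead of copying it."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha.update(chunk)
    return {"path": path, "size": os.path.getsize(path), "sha256": sha.hexdigest()}


def capture_provenance(func):
    """Hypothetical decorator: record provenance while a step runs.

    Assumes the wrapped step takes an input file path as its first argument
    and returns the path of the file it wrote.
    """
    @functools.wraps(func)
    def wrapper(input_path, *args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        output_path = func(input_path, *args, **kwargs)
        PROVENANCE_LOG.append({
            "function": func.__qualname__,
            "started": started,
            "ended": datetime.now(timezone.utc).isoformat(),
            "environment": {"os": platform.platform(),
                            "python": platform.python_version()},
            "input": file_metadata(input_path),
            "output": file_metadata(output_path),
        })
        return output_path
    return wrapper


@capture_provenance
def uppercase_copy(input_path):
    """Toy analysis step: write an upper-cased copy of a file."""
    output_path = input_path + ".upper"
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        dst.write(src.read().upper())
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    with open(data_path, "wb") as f:
        f.write(b"spikes\n")
    uppercase_copy(data_path)
{{/code}}

Recording checksums keeps the provenance record small, while still allowing an artefact stored elsewhere to be checked against the record later.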

== Communication between computer systems and the KG ==

Two issues arise:

(i) fine-grained provenance information may need to be captured on compute nodes, which may not have network access;

(ii) a failure to upload provenance information should not cause the workflow itself to fail.

One possible solution to both issues would be to store provenance records in a local cache and synchronize this cache with the KG at a later time.
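
A minimal sketch of the local-cache approach, under stated assumptions: records are appended to a local JSON Lines file, so capture never needs network access, and a separate synchronization step uploads what it can and keeps the rest for a later attempt. ##ProvenanceCache## and its methods are hypothetical names, and the ##upload## argument stands in for whatever KG client would actually be used.

{{code language="python"}}
import json
import os
import tempfile


class ProvenanceCache:
    """Hypothetical local cache of provenance records, one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def record(self, entry):
        """Store a record locally; works on compute nodes with no network."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def pending(self):
        """Return the records that have not yet been uploaded."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def synchronize(self, upload):
        """Upload cached records; keep failures for the next attempt.

        `upload` is any callable that sends one record to the KG and raises
        on failure. Errors are caught here, so a failed upload never makes the
        workflow fail. Assumes no other process writes to the cache meanwhile.
        Returns the number of records uploaded.
        """
        records = self.pending()
        failed = []
        for entry in records:
            try:
                upload(entry)
            except Exception:
                failed.append(entry)
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for entry in failed:
                f.write(json.dumps(entry) + "\n")
        os.replace(tmp_path, self.path)  # swap in the reduced cache with one rename
        return len(records) - len(failed)


# Usage: a stand-in upload function that fails for some records.
uploaded = []


def fake_upload(entry):
    if entry.get("offline"):
        raise ConnectionError("KG not reachable")
    uploaded.append(entry["id"])


with tempfile.TemporaryDirectory() as tmp:
    cache = ProvenanceCache(os.path.join(tmp, "provenance.jsonl"))
    cache.record({"id": "run-1"})
    cache.record({"id": "run-2", "offline": True})
    n_uploaded = cache.synchronize(fake_upload)
    still_pending = cache.pending()
{{/code}}

Because ##synchronize## catches upload errors, it could be run at the end of a job, or later from a machine with network access, without affecting the outcome of the workflow itself.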

== User interfaces for browsing, visualizing, and searching provenance information ==

(% class="box infomessage" %)
(((
DISCUSSION NEEDED: should visualization of provenance information be integrated into the KG Search UI, should a separate app be developed, or both?
)))