Wiki source code of Provenance of simulation and data analysis workflows

Version 7.1 by adavison on 2020/08/05 08:54

		1	== Introduction ==
		2
		3	Computational provenance is a record of all the steps in a computational scientific workflow, including the code that was run, the input data, the computational environment (hardware, operating system, compiler versions, library versions, etc.), the person who performed each step, and the output data.
		4
		5	Capturing computational provenance facilitates:
		6
		7	* reproducibility of results
		8	* management and tracking of workflows/projects by the scientists/engineers involved
		9	* evaluation/review by other scientists and engineers
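
As an illustration of the kind of record described above, the following is a minimal sketch of how a single workflow step could be represented and serialized. The class name, field names, and all example values are hypothetical and are not taken from any existing schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class WorkflowStep:
    """One step of a computational workflow (hypothetical schema)."""
    code: str            # what was run, e.g. a script name plus its version
    inputs: list         # identifiers of the input data
    outputs: list        # identifiers of the output data
    environment: dict    # hardware, OS, compiler and library versions
    performed_by: str    # the person who performed the step


# All values below are made up for illustration.
step = WorkflowStep(
    code="simulate.py (revision r1)",
    inputs=["parameters.json"],
    outputs=["membrane_potential.h5"],
    environment={"os": "Linux", "python": "3.11", "libraries": {"numpy": "1.26"}},
    performed_by="example-user",
)

# Serialize to JSON so that the record can be stored or exchanged.
record = json.dumps(asdict(step), indent=2, sort_keys=True)
print(record)
```

Storing the record as plain JSON keeps it independent of the tool that produced it, which is one way to support later review by other scientists.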
		10
		11
		12
		13	== Standards ==
		14
		15	The [[W3C PROV standard>>https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/||rel="noopener noreferrer" target="_blank"]] provides a data model and related tools for provenance interchange on the web. The following diagram shows the three base classes of the PROV data model: Entity, Activity, and Agent. These three classes form the basis for the representation of provenance in the EBRAINS Knowledge Graph (KG): every node in the KG has a type that is a subclass of one of these base classes.
		16
		17	[[image:starting-points.svg||alt="The three Starting Point classes of the W3C PROV ontology and the properties that relate them."]]
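
To make the three base classes more concrete, here is a minimal sketch in Python. The attribute names mirror PROV starting-point properties (##used##, ##wasGeneratedBy##, ##wasAssociatedWith##, ##wasAttributedTo##), but the Python classes and the ##upstream## helper are assumptions made for illustration. They are not the KG schema and not an existing PROV library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Something that bears responsibility, e.g. a person or a piece of software."""
    name: str


@dataclass
class Entity:
    """A thing with provenance, e.g. a data file, a script, or a figure."""
    name: str
    was_attributed_to: list = field(default_factory=list)  # list of Agent
    was_generated_by: Optional["Activity"] = None


@dataclass
class Activity:
    """Something that happens over time and acts on entities, e.g. a simulation run."""
    name: str
    used: list = field(default_factory=list)                 # list of Entity
    was_associated_with: list = field(default_factory=list)  # list of Agent


def upstream(entity):
    """Return the names of all entities that `entity` was produced from (hypothetical helper)."""
    found = []
    activity = entity.was_generated_by
    if activity is not None:
        for source in activity.used:
            found.append(source.name)
            found.extend(upstream(source))
    return found


# Illustrative example: a figure produced by an analysis of some raw data.
alice = Agent("alice")
raw = Entity("raw_data", was_attributed_to=[alice])
analysis = Activity("analysis", used=[raw], was_associated_with=[alice])
figure = Entity("figure", was_generated_by=analysis)

print(upstream(figure))
```

Following ##was_generated_by## and ##used## links backwards in this way is what allows a result to be traced to the data and code it came from.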
		18
		19	== Storage of provenance in the Knowledge Graph ==
		20
		21
		22	== Tools for automated capture of provenance ==
		23
		24	* on different systems:
		25	** HPC systems
		26	** neuromorphic systems
		27	** Jupyter notebooks
		28	** users' own computers
		29	* prospective/pre-emptive vs run-time provenance capture
		30	* capture of metadata vs capture of artefacts
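
As a rough illustration of run-time capture on a user's own computer, the sketch below records basic information about the environment and describes a file by its metadata (size and checksum) rather than copying the artefact itself. The function names and the structure of the returned dictionaries are hypothetical. Only Python standard-library calls are used.

```python
import hashlib
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def capture_environment(packages=()):
    """Collect basic metadata about the current computational environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "packages": versions,
    }


def describe_artefact(path):
    """Capture metadata about a file (size and SHA-256 digest), not the file itself."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "path": str(path),
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Example: record the environment, including the version of one library if present.
env = capture_environment(["numpy"])
print(env["os"], env["python"], env["packages"])
```

Recording a checksum instead of the full file keeps the provenance record small while still making it possible to verify later that a given artefact is the one that was used.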
		31
		32	== Communication between computer systems and the KG ==
		33
		34	* local cache and synchronization?
		35
		36
		37	== User interfaces for browsing, visualizing, and searching provenance information ==
		38
		39