Technical details

Last modified by lzehl on 2021/07/05 18:57

openMINDS is designed to be as modular as possible in order to facilitate the extension and maintenance of existing metadata models and schemas, as well as the development and integration of new ones. The layout and technical requirements for this modularity are described below.

In parallel, openMINDS takes into account the varying levels of programming skill in the neuroscience research community. For this reason, openMINDS established an integration pipeline that gradually increases the level of technical detail: it starts from a user-friendly, lightweight schema template and ends with established, highly technical metadata schema formats (e.g., JSON-Schema).

This page documents the layout and requirements needed to maintain the openMINDS modularity, the syntax of the openMINDS schema template, and the openMINDS integration pipeline.

The openMINDS umbrella

openMINDS is the overall umbrella for a set of integrated metadata models for describing neuroscience research products in graph databases. An integration pipeline handles the correct integration of these metadata models and provides a central access point. Each metadata model is developed in its own GitHub repository, which keeps existing models easy to extend and maintain and makes it straightforward to develop and integrate new openMINDS metadata models and schemas. The integration pipeline ensures that the central openMINDS GitHub repository ingests all these GitHub repositories as git-submodules, integrates the respective metadata models, and builds the openMINDS GitHub pages as well as ZIP files containing the respective openMINDS schemas in the supported formats, such as the openMINDS syntax (cf. below), JSON-Schema, or HTML. The following summarizes the contents and requirements for the central openMINDS GitHub repository and for all metadata model git-submodules. The openMINDS integration pipeline itself is covered in a separate section (cf. below).

Let us start with the central openMINDS GitHub repository, which has a main branch, a documentation branch, and version branches (naming convention: vX; e.g., v1). Official releases (naming convention: vX.Y; e.g., v1.0) are tagged and provided as release packages.

The main branch hosts the general README, the LICENSE document, the CONTRIBUTING document, and the general openMINDS logo. In addition, it maintains the openMINDS vocabulary (vocab; cf. below) which provides general definitions and references for schema types and properties used across all openMINDS metadata models and their versions, as well as the bash script that builds the content of the documentation and version branches.

The documentation branch hosts the HTML files that build the openMINDS GitHub pages, as well as a ZIP file for each version branch and official release containing the respective openMINDS schemas in the currently supported formats, such as the openMINDS syntax (`.schema.tpl.json`; cf. below), JSON-Schema (`.schema.json`), or HTML (`.html`).

The version branches host the respective openMINDS schemas of a major version by ingesting the corresponding metadata models as git-submodules. Note that these version branches can have official release tags. If a version branch has an official release tag, only backwards compatible changes can be merged into this branch. These include correcting typos in instructions, introducing additional properties to schemas, loosening constraints on expected value numbers or formats, granting additional relations between schemas, and adding new schemas (if they do not require relational changes in existing schemas). Except for typo corrections, these changes are typically tagged as sub-releases of the respective major version (e.g., v1.1). If a version branch does not yet have an official release tag, non-backwards compatible changes can be merged into this branch as well. These include renaming existing properties, tightening constraints on expected value numbers or formats, removing relations between schemas, and adding new schemas that cause relational changes in existing schemas. If all version branches have official release tags, a new non-backwards compatible change leads to the creation of a new version branch (with a correspondingly increased major version number).

As mentioned above, the setup of the central openMINDS GitHub repository is maintained by the openMINDS integration pipeline (cf. below). Note that the pipeline is configured such that each commit to one of the openMINDS submodules triggers a new build of the central openMINDS repository, ensuring that its content is always up to date.
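The merge rules for version branches can be summarized as a small lookup. The following Python sketch is only an illustration: the change identifiers and the function name are invented for this example and are not part of any openMINDS tooling.

```python
# Change kinds named in the paragraph above. The identifiers are hypothetical
# labels chosen for this sketch.
BACKWARDS_COMPATIBLE = {
    "fix_typo",
    "add_property",
    "loosen_constraint",
    "add_relation",
    "add_schema_without_relational_changes",
}
BREAKING = {
    "rename_property",
    "tighten_constraint",
    "remove_relation",
    "add_schema_with_relational_changes",
}


def can_merge(change, branch_has_release_tag):
    """Return True if a change of this kind may be merged into the version branch."""
    if change in BACKWARDS_COMPATIBLE:
        return True
    if change in BREAKING:
        # Breaking changes are only acceptable while the branch is unreleased.
        return not branch_has_release_tag
    raise ValueError(f"unknown change kind: {change}")
```

A breaking change that is rejected for every existing version branch corresponds to the case in which a new version branch has to be created.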

For this to work smoothly for the existing, but also for all new openMINDS metadata models, the corresponding openMINDS submodules (GitHub repositories) have to meet the following requirements:

(1) The openMINDS metadata model has to be located on a public GitHub repository and published under an MIT license.

(2) The GitHub repository of such a metadata model should have at least one version branch (naming convention: vX, where X is a major version number).

(3) The version branch should have the following folders & files:

schemas/ (required) - contains the schemas of the respective metadata model implemented in the reduced openMINDS syntax (cf. below). The "schemas" folder can be flat or further structured into sub-directories.
tests/ (recommended) - contains test-instances (JSON-LDs) for checking the constraints defined in the schemas of the respective metadata model. The "tests" folder should follow the same structure as the "schemas" folder, with an additional sub-directory for each schema. The file names of the test-instances should be written in lowerCamelCase and state first the name of the tested schema and second, separated by an underscore, which schema constraint is tested (e.g., contactInformation_validEmail.jsonld). If a test-instance is expected to fail the schema validation, the file name should receive the postfix "_nok" (e.g., contactInformation_invalidEmail_nok.jsonld).
examples/ (recommended) - contains examples of valid instance collections for the respective metadata model. Each example should receive its own directory (folder) with a README.md describing the example and a metadataCollection subfolder containing the openMINDS instances (JSON-LDs). This subfolder can be flat or further structured.
img/ (optional) - typically contains the logo of the openMINDS submodule.
instances/ (optional) - contains the controlled metadata instances (JSON-LDs) for selected schemas of the respective metadata model. The "instances" folder should follow the same structure as the "schemas" folder, with an additional sub-directory for each schema. The file names of the controlled instances should be written in lowerCamelCase and state the simple, human-readable identifier of the instance (e.g., homoSapiens.jsonld for the controlledTerms schema Species).
version.txt (required) - states the identifier of the version branch (e.g., v1) of the respective metadata model.
README.md (required) - contains a short content description of the respective metadata model.
LICENSE.txt (required) - defines the MIT license for the respective metadata model.
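The folder and file requirements listed above lend themselves to an automated check. The following Python sketch is a minimal illustration under stated assumptions: the function names and the regular expression are invented for this example and do not describe how the openMINDS integration pipeline verifies submodules.

```python
import re
from pathlib import Path

# Entries marked as "required" in the list above.
REQUIRED = ["schemas", "version.txt", "README.md", "LICENSE.txt"]

# Assumed pattern: <schemaName>_<testedConstraint>[_nok].jsonld, both parts in lowerCamelCase.
TEST_NAME = re.compile(
    r"^(?P<schema>[a-z][A-Za-z0-9]*)_(?P<constraint>[a-z][A-Za-z0-9]*)(?P<nok>_nok)?\.jsonld$"
)


def missing_required(repo_root):
    """Return the required entries that are absent from a version-branch checkout."""
    root = Path(repo_root)
    return [name for name in REQUIRED if not (root / name).exists()]


def parse_test_name(file_name):
    """Split a test-instance file name into (schema, constraint, expected_to_pass)."""
    match = TEST_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a valid test-instance name: {file_name}")
    return match["schema"], match["constraint"], match["nok"] is None
```

For example, parse_test_name("contactInformation_invalidEmail_nok.jsonld") yields the schema name, the tested constraint, and False, because the "_nok" postfix marks the instance as expected to fail validation.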

For more information on the content of the existing openMINDS metadata models, please go to Metadata models & schemas.

The openMINDS vocabulary

Located in the folder vocab on the main branch of the central openMINDS GitHub repository, the openMINDS vocabulary is semi-automatically gathered and stored in dedicated JSON files (types.json and properties.json). The openMINDS integration pipeline makes sure that both files are updated with each commit to any of the GitHub repositories for the openMINDS metadata models. As a result, the openMINDS vocab always reflects the up-to-date general attributes of existing schema types and properties across all openMINDS metadata models, while providing the opportunity to centrally review and maintain their consistency. In addition, this design allows us to centrally define and maintain multiple references to related schemas and matching schema properties of other metadata initiatives. How this works in detail is explained in the following.

The types.json file is an associative array listing all existing openMINDS schema types. For each openMINDS schema type, a small list of general attributes is provided in a nested associative array. Currently, the following attributes are captured:

{
"https://openminds.ebrains.eu/«METADATA_MODEL_LABEL»/«SCHEMA_NAME»": {
   "description": "«GENERAL_DESCRIPTION»",
   "label": "«HUMAN-READABLE_LABEL»",
   "name": "«SCHEMA_NAME»",
   "schemas": [
     "«RELATIVE_PATH_TO_SCHEMA_FILE_OF_THAT_TYPE»"
    ],
   "translatableTo": [
     "«REFERENCE_TO_RELATED_SCHEMA_OF_OTHER_INITIATIVE»"
    ]
  }
}

With each new schema committed to one of the openMINDS metadata models, a new entry is appended to the types.json file, with the values for "name", "label", and "schemas" automatically derived. The remaining attributes are initialized with a null value and are regularly edited manually by an expert of the openMINDS development team. If necessary, the auto-derived "label" value can be edited as well. All manual edits are preserved and not overwritten when the file is updated again with a new commit. If a schema is deleted from the openMINDS metadata models, the corresponding entry in the types.json file is marked as deprecated (additional attribute-value pair: "deprecated": true). It can only be permanently removed from the types.json file by deleting the entry manually.

Similar to the types.json file, the properties.json file is an associative array listing all properties across all existing openMINDS schemas. For each openMINDS property, a small list of general attributes is provided in a nested associative array. Currently, the following attributes are captured:
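The update behavior described above (append new entries, preserve manual edits, mark deleted schemas as deprecated) can be sketched in Python as follows. This is an illustration only: the function name and the shape of the "derived" argument are assumptions, and refreshing the "schemas" paths on every run is a guess rather than documented pipeline behavior.

```python
def update_types(existing, derived):
    """Merge auto-derived entries into the current content of types.json.

    existing: dict loaded from types.json (type IRI -> attributes)
    derived:  hypothetical dict of auto-derived attributes per type IRI,
              each with the keys "name", "label" and "schemas"
    """
    # Shallow-copy every entry so that the input is left untouched.
    updated = {iri: dict(entry) for iri, entry in existing.items()}
    for iri, auto in derived.items():
        if iri not in updated:
            # New schema type: the manually maintained attributes start as null.
            updated[iri] = {
                "description": None,
                "label": auto["label"],
                "name": auto["name"],
                "schemas": auto["schemas"],
                "translatableTo": None,
            }
        else:
            # Known schema type: keep all manual edits, refresh only the paths (assumption).
            updated[iri]["schemas"] = auto["schemas"]
    for iri, entry in updated.items():
        if iri not in derived:
            # The schema no longer exists: flag the entry instead of deleting it.
            entry["deprecated"] = True
    return updated
```

Note that nothing in this sketch ever deletes an entry, which mirrors the rule that permanent removal is a manual step.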

{
"https://openminds.ebrains.eu/vocab/«PROPERTY_NAME»": {
   "description": "«GENERAL_DESCRIPTION»",
   "label": "«HUMAN-READABLE_LABEL»",
   "labelForReverseLink": "«HUMAN-READABLE_LABEL_OF_REVERSED_LINK»",
   "name": "«PROPERTY_NAME»",
   "sameAs": [
     "«REFERENCE_TO_MATCHING_SCHEMA-PROPERTY_OF_OTHER_INITIATIVE»"
    ],
   "schemas": [
     "«RELATIVE_PATH_TO_SCHEMA_FILE_CONTAINING_THIS_PROPERTY»"
    ]
  }
}

With each new property committed to a schema of one of the openMINDS metadata models, a new entry is appended to the properties.json file, with the values for "name", "label", and "labelForReverseLink" automatically derived. The remaining attributes are initialized with a null value and are regularly edited manually by an expert of the openMINDS development team. If necessary, the auto-derived values for "label" and "labelForReverseLink" can be edited as well. All manual edits are preserved and not overwritten when the file is updated again with a new commit. If a property is no longer used in any of the schemas of the openMINDS metadata models, the corresponding entry in the properties.json file is marked as deprecated (additional attribute-value pair: "deprecated": true). It can only be permanently removed from the properties.json file by deleting the entry manually.
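To decide whether a property is still in use, the schemas have to be scanned for the properties they define. The following Python sketch shows one way this could look; the function names and the input format (a mapping from relative schema path to parsed schema template) are assumptions made for this illustration, not a description of the actual pipeline.

```python
# Namespace prefix of property entries, as shown in the properties.json layout above.
VOCAB = "https://openminds.ebrains.eu/vocab/"


def collect_property_usage(schemas):
    """Map each property IRI to the sorted list of schema files that define it.

    schemas: hypothetical dict of relative schema path -> parsed schema template
    """
    usage = {}
    for path, schema in schemas.items():
        for name in schema.get("properties", {}):
            usage.setdefault(VOCAB + name, []).append(path)
    return {iri: sorted(paths) for iri, paths in usage.items()}


def unused_properties(properties_json, usage):
    """Return the IRIs listed in properties.json that no schema defines anymore."""
    return sorted(iri for iri in properties_json if iri not in usage)
```

The result of collect_property_usage corresponds to the "schemas" attribute of each entry, and the entries returned by unused_properties are the candidates for the "deprecated": true flag.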

The openMINDS syntax

All openMINDS metadata models are defined using a lightweight schema syntax. Although this schema syntax is inspired by JSON-Schema, it leaves most schema technicalities to the openMINDS integration pipeline, which makes the openMINDS schemas more human-readable, especially for untrained eyes.

The few remaining customized technical properties that need additional interpretation or translation to a formal schema language (e.g., JSON-Schema) have an underscore as prefix (e.g., "_type"). Within the openMINDS integration pipeline (cf. below), the schema template syntax is interpreted, extended, and flexibly translated to various formal schema languages. All further specifications of the openMINDS schema template syntax are described below.

Basic openMINDS schema structure

All openMINDS schemas need to have the extension .schema.tpl.json and each schema is defined as a nested associative array (dictionary) with the following conceptual structure:

{
"_type": "https://openminds.ebrains.eu/LABEL_OF_METADATA_MODEL/SCHEMA_NAME",
"properties": {
   "PROPERTY_NAME": {
     "type": "DATA_TYPE",
     "_instruction": "METADATA_ENTRY_INSTRUCTION"
    }
  },
"required": [
   "PROPERTY_NAME"
  ]
}

"_type" defines the schema type (or namespace) with the depicted naming convention, where the label of the respective openMINDS metadata model (e.g., "core") and the schema name (format: UpperCamelCase; e.g., "ContactInformation") have to be specified. The schema name should be meaningful and provide some insight into what metadata content the schema covers.

Under "properties" a nested associative array is defined, where each key defines the property name (format: lowerCamelCase; e.g., "givenName"). The corresponding value is again a nested associative array defining the expected data "type" (cf. below) and the "_instruction" for entering the correct metadata for the respective property.

Under "required" a list of property names can be provided that must be present in a correctly instantiated metadata instance of the respective schema. If none of the properties are required, this key-value pair does not have to be specified. Note that within openMINDS, it is assumed that only the stated properties are allowed (additional undefined properties are prohibited by default).
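The two structural rules above (required properties must be present, undefined properties are prohibited) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name is invented, instance keys are assumed to be plain property names, and keys starting with "@" are assumed to be JSON-LD keywords that are exempt from the check.

```python
def check_instance(schema, instance):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    allowed = set(schema.get("properties", {}))
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    for name in instance:
        # Assumption: JSON-LD keywords such as "@id" and "@type" are always allowed.
        if not name.startswith("@") and name not in allowed:
            problems.append(f"unexpected property: {name}")
    return problems
```

Data type constraints are deliberately left out here; this sketch only covers the "properties" and "required" keys of the basic schema structure.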

Schemas extending a context-schema

In the case that several schemas are highly related and contain a common set of properties, it is possible to define a non-type context-schema with these common properties that can be extended and modified by the group of related schemas.

All properties and constraints (e.g., required properties, expected data types) defined in the context-schema are passed on to the schemas extending this context-schema. Each of these schemas can define additional properties or, if necessary, overwrite the constraints of the context-schema (incl. "_instruction"). In order to state that a schema is extending a context-schema, the following additional key-value pair has to be added to the schema template above:

"_extends": "RELATIVE_PATH_TO_OPENMINDS-CONTEXT-SCHEMA"

This design not only makes it easier to identify highly related schemas, but also facilitates the maintenance of the commonly used properties. A good hands-on example is the context-schema ResearchProduct, which is extended by the following set of schemas: Dataset, MetaDataModel, Model, and Software.
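How an extending schema could be combined with its context-schema is sketched below in Python. The merge strategy is an assumption made for illustration: a property redefined in the extending schema replaces the inherited definition as a whole, and the "required" lists are combined. The actual resolution logic of the integration pipeline may differ.

```python
def resolve_extends(schema, context):
    """Return the schema with the properties and constraints of its context-schema merged in."""
    properties = {name: dict(spec) for name, spec in context.get("properties", {}).items()}
    for name, spec in schema.get("properties", {}).items():
        # Assumption: a redefined property fully replaces the inherited definition.
        properties[name] = dict(spec)
    # Assumption: required properties of both schemas are combined, without duplicates.
    required = list(context.get("required", []))
    required += [name for name in schema.get("required", []) if name not in required]
    # The "_extends" reference is dropped because it has been resolved.
    resolved = {key: value for key, value in schema.items() if key != "_extends"}
    resolved["properties"] = properties
    if required:
        resolved["required"] = required
    return resolved
```

With this approach a commonly used property only has to be maintained once, in the context-schema, while each extending schema remains free to adjust it.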

Data type depending constraints

Depending on the expected data "type", additional constraints can be defined for the metadata entry of a respective property. Currently, the openMINDS schema template syntax supports the following data types: "string", "number" (integer or float), "integer", "float", "boolean", "object", and "array". Except for "boolean", all these data types can have additional constraints. The essential constraints are summarized in the following (cf. JSON-Schema specification, draft 7, for more).

If the expected data "type" is a "string", the expected number of characters, the format, or a regular expression pattern of the string can be further defined. Here are abstract examples for all possible string constraints:

{
"properties": {
   "stringProperty_noConstraints": {
     "type": "string",
     "_instruction": "Enter a free text."
    },
   "stringProperty_lengthConstraints": {
     "maxLength": 6,
     "minLength": 2,
     "type": "string",
     "_instruction": "Enter a free text (allowed numbers of characters: 2 - 6)."
    },
   "stringProperty_formatConstraints": {
     "type": "string",
     "_formats": [
       "email",
       "date",
       "time",
       "date-time",
       "iri"
      ],
     "_instruction": "Enter a string matching one of the given formats."
    },
   "stringProperty_patternConstraints": {
     "pattern": "«regular_expression_ECMA_262_dialect»",
     "type": "string",
     "_instruction": "Enter a string matching the given regex pattern."
    }
  }
}
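As an illustration of how the length and pattern constraints behave, here is a minimal Python sketch. The function name is invented for this example. Two caveats apply: Python's re module is not identical to the ECMA 262 regex dialect, so complex patterns may behave differently, and the "_formats" constraint is not checked at all.

```python
import re


def check_string(spec, value):
    """Check a value against the "minLength", "maxLength" and "pattern" constraints of a property."""
    if not isinstance(value, str):
        return False
    if len(value) < spec.get("minLength", 0):
        return False
    if "maxLength" in spec and len(value) > spec["maxLength"]:
        return False
    # JSON-Schema patterns are not implicitly anchored, hence search instead of fullmatch.
    if "pattern" in spec and re.search(spec["pattern"], value) is None:
        return False
    return True
```

For the length-constrained example above (2 to 6 characters), "abc" passes while "a" and "abcdefg" fail.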

If the expected data "type" is an "integer" or a "number" (float or integer), the expected range or multiples can be further defined. Here are abstract examples for all possible "integer" and "number" constraints (note that both constraints can be defined for both data types):

{
"properties": {
   "integerProperty_noConstraints": {
     "type": "integer",
     "_instruction": "Enter an integer."
    },
   "integerProperty_rangeConstraints": {
     "maximum": 50,
     "minimum": 10,
     "type": "integer",
     "_instruction": "Enter an integer equal or between 10 and 50."
    },
   "numberProperty_noConstraints": {
     "type": "number",
     "_instruction": "Enter a number (float or integer)."
    },
   "numberProperty_multipleOfConstraints": {
     "multipleOf": 10.5,
     "type": "number",
     "_instruction": "Enter any number which is a multiple of 10.5."
    }
  }
}
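The numeric constraints can be illustrated in the same way. The Python sketch below uses an invented function name and makes two simplifying assumptions: an "integer" must be a Python int (a float such as 10.0 is rejected), and "multipleOf" is compared with a small tolerance to sidestep floating-point rounding.

```python
import math


def check_number(spec, value):
    """Check a value against the "type", "minimum", "maximum" and "multipleOf" constraints."""
    # bool is a subclass of int in Python, so it has to be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if spec.get("type") == "integer" and not isinstance(value, int):
        return False
    if "minimum" in spec and value < spec["minimum"]:
        return False
    if "maximum" in spec and value > spec["maximum"]:
        return False
    if "multipleOf" in spec:
        quotient = value / spec["multipleOf"]
        # A multiple yields a (nearly) whole quotient.
        if not math.isclose(quotient, round(quotient), abs_tol=1e-9):
            return False
    return True
```

Both range bounds are inclusive, which matches the instruction "equal or between 10 and 50" in the example above.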

If the expected data "type" is an "object", the expected schema type needs to be defined, as well as whether the object is linked or embedded. Note that linked objects can exist by themselves. In contrast, embedded objects depend on the existence of their parent instance (if the parent instance is deleted, the embedded objects are deleted as well). Here are abstract examples for all possible "object" constraints:

{
"properties": {
   "objectProperty_linked": {
     "_linkedTypes": [
       "«SCHEMA_TYPE»"
      ],
     "_instruction": "Add the link to an instance conform with the given schema types."
    },
   "objectProperty_embedded": {
     "_embeddedTypes": [
       "«SCHEMA_TYPE»"
      ],
     "_instruction": "Enter an instance conform with the given schema types."
    }
  }
}
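The difference between linked and embedded objects becomes visible at the instance level. The following Python sketch assumes the usual JSON-LD conventions, which are not spelled out on this page: a link is an object that contains only an "@id", while an embedded object is a nested object carrying its own "@type" as a single string. The function name and the lookup_type callable are hypothetical.

```python
def check_object(spec, value, lookup_type):
    """Check a property value against "_linkedTypes" or "_embeddedTypes".

    lookup_type: hypothetical callable that resolves the "@id" of a linked
                 instance to its schema type (or None if the instance is unknown)
    """
    if not isinstance(value, dict):
        return False
    if "_linkedTypes" in spec:
        # Assumption: a link consists of nothing but a reference to a standalone instance.
        if set(value) != {"@id"}:
            return False
        return lookup_type(value["@id"]) in spec["_linkedTypes"]
    if "_embeddedTypes" in spec:
        # Assumption: an embedded object lives inside its parent and states its own type.
        return value.get("@type") in spec["_embeddedTypes"]
    return False
```

Because an embedded object has no identity outside its parent, no lookup is needed for it, whereas a link can only be verified by resolving the referenced instance.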

If the expected data "type" is an "array", the expected data type of the items in the array, as well as the expected length of the array, can be further defined. Valid data types for items are "string", "number", "integer", "float", "boolean", and/or "object". In addition, items can also be defined as n-tuples with expected data types. Note that any of the data type dependent constraints above can also be applied to the respective items, and that all array constraints can be applied to all item types. Within openMINDS, it is assumed that only the stated item types are allowed (additional unconstrained items are prohibited by default). Here are abstract examples for all possible "array" constraints:

{
"properties": {
   "arrayProperty_noConstraints": {
     "type": "array",
     "_instruction": "Add at least one item of any data type."
    },
   "arrayProperty_itemsOfTypeInteger": {
     "type": "array",
     "items": {
       "type": "integer"
      },
     "_instruction": "Add at least one item of data type integer."
    },
   "arrayProperty_uniqueItemsOfTypeString": {
     "type": "array",
     "items": {
       "type": "string"
      },
     "uniqueItems": true,
     "_instruction": "Add unique items of data type string."
    },
   "arrayProperty_itemsOfTypeNumber_constrainedArrayLength": {
     "type": "array",
     "items": {
       "type": "number"
      },
     "maxItems": 3,
     "minItems": 2,
     "_instruction": "Add 2 or 3 items of data type number."
    },
   "arrayProperty_objectArray": {
     "type": "array",
     "_linkedTypes": [
       "«SCHEMA_TYPE»"
      ],
     "_instruction": "Add at least one link to an instance conform with the given schema types."
    },
   "arrayProperty_tuplesWithDefinedDataTypes": {
     "type": "array",
     "items": [
        {"type": "string"},
        {"type": "integer"}
      ],
     "_instruction": "Add at least one 2-tuple with data type string and integer."
    }
  }
}
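A compact Python sketch of the array constraints is given below. It follows the JSON-Schema draft 7 reading of "items" (a single definition applies to every item, a list of definitions applies position by position), covers only primitive item types, and uses invented function names. Arrays of linked objects and nested item constraints are left out.

```python
# Mapping from template data types to Python types (primitive types only).
PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "float": float, "boolean": bool}


def has_type(value, type_name):
    """Return True if the value matches the given template data type."""
    # bool is a subclass of int in Python, so it must not pass as a number.
    if type_name != "boolean" and isinstance(value, bool):
        return False
    return isinstance(value, PYTHON_TYPES[type_name])


def check_array(spec, values):
    """Check a list against "minItems", "maxItems", "uniqueItems" and "items"."""
    if not isinstance(values, list):
        return False
    # JSON-Schema default: no lower bound unless "minItems" is given.
    if len(values) < spec.get("minItems", 0):
        return False
    if "maxItems" in spec and len(values) > spec["maxItems"]:
        return False
    if spec.get("uniqueItems"):
        # Uniqueness based on Python equality; sufficient for a sketch.
        if any(value in values[:index] for index, value in enumerate(values)):
            return False
    items = spec.get("items")
    if isinstance(items, dict):
        return all(has_type(value, items["type"]) for value in values)
    if isinstance(items, list):
        # Tuple form: position i must match items[i]; additional items are prohibited.
        if len(values) > len(items):
            return False
        return all(has_type(value, item["type"]) for value, item in zip(values, items))
    return True
```

Applied to the length-constrained example above, [1, 2.5] is accepted, while [1] is too short and [1, "a"] contains an item of the wrong data type.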

The openMINDS integration pipeline

(coming soon) If you'd like to learn more about the openMINDS integration pipeline, especially if you'd like to contribute to it, please get in touch with us via the issues on the openMINDS_generator GitHub or the support email: openminds@ebrains.eu