Entity Identifiers and URI Minting
Introduction
Linked Data (LD) relies on having a shared way to reference the same concept or entity, such as a person, place, document, category, or event. To achieve this, LD resources are uniquely identified by their Uniform Resource Identifier (URI). Each URI is unique to a single concept or entity and should be accessible over the web.
Many URIs look like web addresses or URLs, and in the context of LINCS, this is the primary form they take. However, while URI formats may be the same as a URL, a URI does not necessarily need to resolve to a web page or have a network location; all URLs are URIs but not all URIs are URLs.
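The distinction can be illustrated with a minimal Python sketch. It uses the standard library's `urllib.parse.urlparse` and treats a URI as URL-like when it has both a scheme and a network location. This is a rough heuristic chosen for illustration, and the function name `looks_like_url` and the sample identifiers are assumptions, not anything defined by LINCS.

```python
from urllib.parse import urlparse


def looks_like_url(uri: str) -> bool:
    """Heuristic: a URI is URL-like if it names a scheme and a network location."""
    parts = urlparse(uri)
    return bool(parts.scheme) and bool(parts.netloc)


# Illustrative identifiers only (example.org is a reserved documentation domain).
samples = [
    "https://example.org/entity/123",  # URI that is also a URL
    "urn:isbn:0451450523",             # URI with no network location
]

for uri in samples:
    print(uri, "->", "URL-like" if looks_like_url(uri) else "URI only")
```

Both strings are valid URIs, but only the first has a network location that a browser could attempt to retrieve.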
In practice, not all URLs are suitable as URIs for the purposes of Linked Open Data (LOD), since some are more persistent and authoritative than others. For instance, using the URL for a person’s LinkedIn profile is not as desirable as using an ORCID identifier. Likewise, linking to a downloadable PDF of an article on a faculty member’s departmental web page is poorer practice than using the Digital Object Identifier (DOI) for the same article. If a URI does resolve, for example to a human-readable web page, it is called a “dereferenceable” URI.
Learn More about URIs
To learn more about URIs, see the resources below:
- LINCS Glossary
- Linked Open Data Basics
- Europeana URI Document
- CHIN GitHub Issue Ticket
- W3C Good URIs
- W3C Cool URIs
- BnF @ SWIB19
- Ruben Verborgh, “Web Fundamentals: The Semantic Web & Linked Data”
Types of URIs
When LINCS transforms a dataset into LOD, entities are identified in the source data and are prepared for transformation into LOD resources. These resources require a canonical URI for identification and access. Wherever possible, LINCS will reuse existing URIs from external vocabularies and authorities by matching entities in the source data with appropriate URIs. When an external identifier for a resource does not exist, a URI must be created, or “minted,” to represent the resource.
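The reuse-or-mint decision described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the authority file is modelled as a plain dictionary with exact-match lookup, and the names `choose_uri`, `authority_index`, and `demo_mint`, along with all example.org URIs, are hypothetical. Real entity matching is considerably more involved than an exact string lookup.

```python
from typing import Callable, Dict, Tuple


def choose_uri(
    label: str,
    authority_index: Dict[str, str],
    mint: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (uri, origin): reuse a matching external URI, otherwise mint one."""
    existing = authority_index.get(label)
    if existing is not None:
        return existing, "reused"
    return mint(label), "minted"


# Hypothetical data: a tiny stand-in for an external authority file.
authority_index = {"Toronto": "https://example.org/authority/places/toronto"}


def demo_mint(label: str) -> str:
    # Placeholder minting rule for illustration; not a LINCS identifier scheme.
    return "https://example.org/minted/" + label.lower().replace(" ", "-")


print(choose_uri("Toronto", authority_index, demo_mint))
print(choose_uri("Unlisted Place", authority_index, demo_mint))
```

The first call reuses the existing external URI, while the second falls through to minting because no match is found.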
URI Scenarios
Transforming data using LINCS involves one or more of the following URI scenarios.
Matching URI Exists
When a URI exists that matches the entity, LINCS does not need to produce a new URI because pre-existing URIs are easy to integrate into the LINCS knowledge graph. Management of that URI stays with the authority that created the URI.
To work well with LINCS tools, URIs should follow standard LOD practices, such as having a label, preferably in both French and English.
See the LINCS Entity Matching Guide for guidance on choosing external URIs.
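As a rough illustration of the label recommendation, the following Python sketch checks whether an entity has labels in both English and French. It assumes labels are available as simple (text, language tag) pairs. The function name `missing_label_languages` and the sample data are hypothetical and do not reflect a LINCS tool or API.

```python
from typing import Iterable, List, Tuple


def missing_label_languages(
    labels: Iterable[Tuple[str, str]],
    required: Tuple[str, ...] = ("en", "fr"),
) -> List[str]:
    """Return the required language tags that have no non-empty label."""
    present = {lang.lower() for text, lang in labels if text.strip()}
    return [lang for lang in required if lang not in present]


# Hypothetical labels for one entity, as (text, language tag) pairs.
labels = [("Montreal", "en"), ("Montréal", "fr")]
print(missing_label_languages(labels))                # []
print(missing_label_languages([("Montreal", "en")]))  # ['fr']
```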
No Matching URI Exists; Data Holder Mints a URI
When no existing URI matches the entity, a data holder (typically an organization or large ongoing research project) may mint their own URI according to standard LOD practices.
Data holders may negotiate with a third-party organization that maintains a vocabulary or authority file to add the required term and assign it a URI, or they may elect to use a third-party service, such as Wikidata, to mint the URI. In all such cases, the URI will use an external namespace. The URI creator is responsible for the management of the URI.
See the LINCS Entity Matching Guide for details.
No Matching URI Exists; LINCS Mints a URI
Data contributors may give LINCS the authority to create new URIs if there is no matching existing URI. These URIs are under a LINCS namespace, using a LINCS-generated identifier.
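Minting a URI under a namespace can be sketched as follows. The source does not specify the format of LINCS-generated identifiers, so this example makes its own assumptions: the namespace `https://example.org/id/` is a placeholder, and a random UUID from Python's standard `uuid` module stands in for the identifier. It should be read as one possible approach, not as the LINCS minting scheme.

```python
import uuid

# Hypothetical namespace for illustration; not the actual LINCS namespace.
EXAMPLE_NAMESPACE = "https://example.org/id/"


def mint_uri(namespace: str = EXAMPLE_NAMESPACE) -> str:
    """Mint a new URI by appending a random, opaque identifier to the namespace."""
    return namespace + uuid.uuid4().hex


first = mint_uri()
second = mint_uri()
print(first)
print(second)
```

An opaque identifier is used here so that the URI does not embed a label that might later change. Whether that choice suits a given project is a design decision for the URI's maintainer.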