Semantic Data Charter (SDC) Specification

Version 1.0 - October 20, 2025

This version: https://semanticdatacharter.com/spec/sdc4/
Editor: Timothy W. Cook, Axius-SDC, Inc.
Copyright © 2025 Axius-SDC, Inc.

Abstract

The Semantic Data Charter (SDC) is a framework for creating semantically rich, machine-readable, and interoperable data models. It provides a set of principles and a reference model for defining data with clear, unambiguous meaning, ensuring that data is not only structured but also self-describing. This specification details the SDC Reference Model (RM), the core data types, and provides guidance on creating domain-specific data models.

1. Introduction

In an increasingly data-driven world, the ability to share, understand, and reuse data across different systems and domains is paramount. Traditional data modeling approaches often focus on data structure, leaving meaning implicit and open to misinterpretation. The Semantic Data Charter addresses this challenge by providing a framework for creating data models that are both structurally sound and semantically explicit.

The SDC is founded on three core pillars:

  • Enforce Governance: Establish a formal, machine-readable contract for your data.
  • Embed Meaning: Link your data to a universal business vocabulary.
  • Mandate Quality: Formally define rules for handling imperfect data.

This specification provides the technical details of the SDC Reference Model, which is implemented in XML Schema (XSD) and OWL (Web Ontology Language).

2. Standards Compliance and Enabling

SDC4 is built upon and leverages established open standards from the World Wide Web Consortium (W3C) to ensure broad interoperability, machine readability, and semantic richness. It is designed not to replace existing domain-specific standards but to provide a foundational, unifying layer that enhances their capabilities, particularly in areas of data governance, semantic grounding, and data quality.

2.1. Foundational W3C Standards

SDC4 directly utilizes and aligns with the following core W3C standards:

  • XML Schema (XSD): The structural backbone of SDC4. All SDC Reference Models and derived Data Models are defined using XSD, providing a robust, widely understood mechanism for defining data structures, data types, and constraints.
  • Resource Description Framework (RDF): SDC4's approach to embedding semantic meaning is fully compatible with RDF. The explicit labeling of model components, linked to universal business vocabularies, allows for easy transformation of SDC4 data into RDF triples, making it queryable via SPARQL and integrable into knowledge graphs.
  • Web Ontology Language (OWL): The semantic models underlying SDC4, particularly the universal business vocabulary and relationships between concepts, are expressible in OWL.
  • SPARQL Protocol and RDF Query Language (SPARQL): Because SDC4 data can be readily mapped to RDF, it becomes queryable using SPARQL, allowing for complex federated queries across diverse datasets.
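The specification does not prescribe a particular SDC4-to-RDF mapping. The Python sketch below shows the general idea. It walks an instance document with the standard library's xml.etree.ElementTree and emits (subject, predicate, object) tuples. Several choices here are hypothetical and made only for illustration: the base IRI, the hasPart predicate, the path-based node naming, and the root element name dm-example. A production pipeline would more likely use an RDF library such as rdflib and take its predicates from the schema's semantic annotations.

```python
import xml.etree.ElementTree as ET

def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]

def instance_to_triples(xml_text: str, base_iri: str) -> list:
    """Walk an SDC4 instance and return (subject, predicate, object) tuples.

    Hypothetical mapping: nested elements become nodes linked by 'hasPart';
    leaf elements become literals whose predicate is the element's local name.
    """
    root = ET.fromstring(xml_text)
    triples = []

    def walk(elem, subject):
        for child in elem:
            name = local_name(child.tag)
            if len(child):  # has children: create a node and recurse into it
                node = f"{subject}/{name}"
                triples.append((subject, base_iri + "hasPart", node))
                walk(child, node)
            elif child.text and child.text.strip():  # leaf carrying a value
                triples.append((subject, base_iri + name, child.text.strip()))

    walk(root, base_iri + local_name(root.tag))
    return triples

# Hypothetical instance modeled on the Patient Vitals example.
SAMPLE = """<dm-example xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
    <sdc4:dm-label>Patient Vitals</sdc4:dm-label>
    <sdc4:ms-clj5x1g8f000008l09j7f6c3d>
        <sdc4:ms-clj5x2p4k000108l01a2b3c4d>
            <sdc4:xdquantity-value>120</sdc4:xdquantity-value>
        </sdc4:ms-clj5x2p4k000108l01a2b3c4d>
    </sdc4:ms-clj5x1g8f000008l09j7f6c3d>
</dm-example>"""

for triple in instance_to_triples(SAMPLE, "https://example.org/sdc/"):
    print(triple)
```

Path-based node names would collide if sibling elements repeat. A real mapping would mint a unique IRI for each element instance.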

2.2. Enabling Industry-Specific Standards

SDC4 acts as a "standard for standards," providing a meta-framework that complements and enhances existing industry data standards by addressing common challenges that often fall outside their primary scope of defining specific data elements.

Key areas where SDC4 enables and strengthens industry standards include:

  • Healthcare (e.g., HL7 FHIR): While FHIR defines clinical data resources, SDC4 can standardize the governance and provenance metadata around FHIR resources.
  • Financial Services (e.g., ISO 20022, ACORD): SDC4 can provide a common semantic layer that links elements from these standards to a unified enterprise business vocabulary.
  • Privacy Regulations (e.g., PIPEDA, GDPR, CCPA): Every SDC4 data element derives from XdAnyType, which carries an access control tag (act) and provenance timestamps. The sensitivity of each element can therefore be declared explicitly and in machine-readable form.
  • Regulatory Reporting (e.g., SEC EDGAR, XBRL): SDC4 can ensure the underlying factual data being tagged is semantically unambiguous and includes robust quality and governance metadata.

3. Conformance

Conformance to the Semantic Data Charter is defined at two levels:

  • SDC Reference Model Conformance: A data model conforms to the SDC Reference Model if every complexType defined in its XML Schema is derived by xsd:restriction from a type in the SDC Reference Model (sdc4.xsd). Data models MUST NOT use xsd:extension to add new elements or attributes to the SDC types.
  • SDC Principles Conformance: An organization's data practices conform to the SDC principles if they adhere to the governance, meaning, and quality pillars outlined in this document.
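The Reference Model conformance rule can be checked in part by machine. The sketch below assumes the data model schema is available as text. It flags any complexType that uses xsd:extension, or that is not derived by xsd:restriction from a base whose QName carries the sdc4: prefix. Using the prefix string as a namespace test is a simplifying assumption, because ElementTree does not resolve QNames inside attribute values. The type names mc-good and mc-bad are hypothetical. This check does not replace validating the schema with a full XSD processor.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def check_restriction_only(xsd_text: str, rm_prefix: str = "sdc4:") -> list:
    """Return a list of problems; an empty list means every complexType is
    derived by xsd:restriction from a base type carrying the RM prefix."""
    problems = []
    for ct in ET.fromstring(xsd_text).iter(f"{XSD}complexType"):
        name = ct.get("name", "<anonymous>")
        if ct.find(f".//{XSD}extension") is not None:
            problems.append(f"{name}: uses xsd:extension")
            continue
        restriction = ct.find(f"{XSD}complexContent/{XSD}restriction")
        if restriction is None:
            problems.append(f"{name}: not derived by xsd:restriction")
        elif not restriction.get("base", "").startswith(rm_prefix):
            problems.append(f"{name}: base is not a Reference Model type")
    return problems

# Two abbreviated types: one conforming restriction, one forbidden extension.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
  <xsd:complexType name="mc-good">
    <xsd:complexContent>
      <xsd:restriction base="sdc4:XdStringType">
        <xsd:sequence>
          <xsd:element name="label" type="xsd:string" fixed="Customer ID"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:complexType name="mc-bad">
    <xsd:complexContent>
      <xsd:extension base="sdc4:XdStringType">
        <xsd:sequence>
          <xsd:element name="extra" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>"""

print(check_restriction_only(SCHEMA))
```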

4. Architecture

The SDC architecture is composed of the following key components:

  • Reference Model (RM): A set of core data types and structures that serve as the building blocks for all SDC-compliant data models.
  • Data Models (DMs): Domain-specific models created by constraining the components of the Reference Model.
  • Model Components (MCs): The individual building blocks within a Data Model, derived from the RM components.
  • Semantic Annotations: Metadata embedded within the data models that provide context and meaning.

4.1. Core Data Types

The SDC Reference Model provides a rich set of extended data types (Xd* types) that go beyond the standard XML Schema data types:

  • XdAnyType: The base type for all SDC extended data types.
  • XdStringType, XdTokenType: For textual data.
  • XdBooleanType: For boolean values.
  • XdCountType, XdQuantityType, XdFloatType, XdDoubleType: For numeric data.
  • XdTemporalType: For date and time information.
  • XdLinkType: For creating relationships between data models.
  • XdFileType: For embedding or referencing binary data.
  • ClusterType: For grouping related data elements.
  • XdAdapterType: For adapting any Xd* type for use within a ClusterType.

4.2. Governance and Provenance Components

The SDC Reference Model includes a comprehensive set of components for capturing data governance, provenance, and audit information directly within the data model.

4.2.1. Governance

Governance components define the actors, roles, and responsibilities associated with the data:

  • PartyType: Represents an actor involved with the data (person, organization, device, or software application).
  • ParticipationType: Describes the role of a PartyType in a specific activity.
  • AttestationType: Provides a formal mechanism for a party to attest to the data's content.
  • AuditType: Captures a detailed audit trail of every system and user that has interacted with the data.
  • Access Control: The model supports fine-grained control via the act (Access Control Tag) element.
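This section names the act element but does not define its tag values or how they are enforced. The sketch below shows one possible read-time filter and rests on several hypothetical assumptions:

- act appears as a child element in the sdc4 namespace;
- elements without an act child are unrestricted;
- a reader holds a set of permitted tag strings.

The element names ms-visit-reason and ms-diagnosis and the tags PUBLIC and CLINICAL are invented for illustration.

```python
import xml.etree.ElementTree as ET

SDC4 = "{https://semanticdatacharter.com/ns/sdc4/}"

def redact(xml_text: str, allowed_tags: set) -> ET.Element:
    """Return the parsed instance with every element whose sdc4:act value is
    not in allowed_tags removed. Elements without an act child are kept."""
    root = ET.fromstring(xml_text)
    # Collect (parent, child) pairs first: removing while iterating is unsafe.
    to_remove = []
    for parent in root.iter():
        for child in parent:
            act = child.find(f"{SDC4}act")
            if act is not None and (act.text or "").strip() not in allowed_tags:
                to_remove.append((parent, child))
    for parent, child in to_remove:
        parent.remove(child)
    return root

# Hypothetical instance: element names and tag values are invented.
SAMPLE = """<dm-example xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
  <sdc4:ms-visit-reason>
    <sdc4:act>PUBLIC</sdc4:act>
    <sdc4:xdstring-value>Routine check-up</sdc4:xdstring-value>
  </sdc4:ms-visit-reason>
  <sdc4:ms-diagnosis>
    <sdc4:act>CLINICAL</sdc4:act>
    <sdc4:xdstring-value>Hypertension</sdc4:xdstring-value>
  </sdc4:ms-diagnosis>
</dm-example>"""

public_view = redact(SAMPLE, {"PUBLIC"})
print([child.tag for child in public_view])
```

Removing an element also removes its descendants. A restricted cluster therefore hides everything it contains.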

4.2.2. Provenance

Provenance components provide a complete history of the data's origin, creation, and modification:

  • Instance Metadata: The root DMType contains key provenance fields: instance_id, instance_version, creation_timestamp, subject, and provider.
  • Temporal Validity: Every data element derived from XdAnyType carries optional timestamp fields: vtb (Valid Time Begin), vte (Valid Time End), tr (Time Recorded), and modified.
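A consumer might use the optional vtb and vte fields to test whether a value was valid at a given moment. The sketch below makes three assumptions that the text above does not settle:

- the timestamps are xsd:dateTime strings with explicit time-zone offsets;
- the validity window is half-open, [vtb, vte);
- a missing bound is open-ended.

```python
from datetime import datetime
from typing import Optional

def parse_ts(text: Optional[str]) -> Optional[datetime]:
    """Parse an xsd:dateTime with an explicit offset; 'Z' is rewritten for Python < 3.11."""
    if not text:
        return None
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))

def is_valid_at(at: datetime, vtb: Optional[datetime], vte: Optional[datetime]) -> bool:
    """True if 'at' lies in the half-open window [vtb, vte); a missing bound is open."""
    if vtb is not None and at < vtb:
        return False
    if vte is not None and at >= vte:
        return False
    return True

vtb = parse_ts("2025-01-01T00:00:00Z")
vte = parse_ts("2025-07-01T00:00:00Z")
print(is_valid_at(parse_ts("2025-03-15T12:00:00Z"), vtb, vte))   # True
print(is_valid_at(parse_ts("2025-07-01T00:00:00Z"), vtb, vte))   # False: vte is exclusive here
print(is_valid_at(parse_ts("2030-01-01T00:00:00Z"), vtb, None))  # True: no end bound
```

Python raises TypeError if you compare a naive datetime with an offset-aware one. Timestamps should therefore always carry an offset.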

4.3. The Data Model (DM) Wrapper and Semantic Grounding

A critical distinction exists between the metadata for a data instance and the metadata for the data model definition itself.

4.3.1. The DMType as an Instance Wrapper

The DMType serves as the root element for every SDC data instance. Its primary role is to be a container for the data payload and the provenance and governance metadata of that specific instance.

4.3.2. Semantic Grounding of the Data Model Definition

The semantic meaning and descriptive metadata of the data model definition (i.e., the schema) are established separately. This definition-level metadata is typically embedded directly within the schema files using xsd:appinfo and RDF annotations.

4.4. Model Component Reusability and Immutability

A cornerstone of the SDC architecture is the ability to create reusable and immutable Model Components (MCs). This is achieved through a specific pattern of schema definition.

When a domain-specific structure is needed, it is defined by creating a new xsd:complexType that restricts a base type from the SDC Reference Model (e.g., sdc4:ClusterType). This new complexType is given a unique and permanent name using a Collision-Resistant Unique Identifier (CUID2), prefixed with mc-, such as mc-clj5x1g8f000008l09j7f6c3d.

This mechanism provides three key advantages:

  1. Immutability and Reuse: The mc-<CUID2> component can be imported and reused across multiple Data Models.
  2. Consistent Querying: An application can reliably query for all instances of a specific element, knowing it will always find the same structure.
  3. Separation of Structure and Semantics: This pattern decouples the immutable physical structure from its semantic meaning.
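The component name is the component's identity, so tooling can check names mechanically. It can also derive an element name (ms-<CUID>) from the type name (mc-<CUID>) that defines it. The regular expression below is an assumption rather than a rule from this specification. It accepts a lowercase letter followed by lowercase alphanumerics, 20 to 32 characters in total. That range covers the 25-character example above and the default 24-character CUID2 length.

```python
import re

# Assumed CUID shape: a lowercase letter, then lowercase alphanumerics,
# 20 to 32 characters in total.
CUID_RE = re.compile(r"[a-z][a-z0-9]{19,31}")

def split_component_name(name: str) -> tuple:
    """Split 'mc-<cuid>' or 'ms-<cuid>' into (prefix, cuid), validating both parts."""
    prefix, sep, cuid = name.partition("-")
    if not sep or prefix not in ("mc", "ms") or not CUID_RE.fullmatch(cuid):
        raise ValueError(f"not an SDC component name: {name!r}")
    return prefix, cuid

def element_name_for(type_name: str) -> str:
    """Map a complexType name (mc-<cuid>) to the element name (ms-<cuid>) declared with it."""
    prefix, cuid = split_component_name(type_name)
    if prefix != "mc":
        raise ValueError(f"expected an mc- type name, got {type_name!r}")
    return f"ms-{cuid}"

print(element_name_for("mc-clj5x1g8f000008l09j7f6c3d"))  # ms-clj5x1g8f000008l09j7f6c3d
```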

4.5. Data Longevity and Migration Avoidance

Key Architectural Feature: Permanence

This section describes how SDC4 minimizes, and in many cases avoids, data migration through CUID-based immutability and namespace versioning.

A significant challenge in traditional data management is the need for costly, complex, and error-prone data migrations when underlying schemas evolve or applications are replaced. The SDC4 architecture is designed to address this issue at its core and promote data longevity.

The combination of:

  1. A Stable Reference Model: Providing a consistent set of base types and structural patterns.
  2. Immutable Structural Components: Using CUIDs (mc-<cuid> and ms-<cuid>) to define reusable data structures whose physical shape is guaranteed not to change.
  3. Explicit, Decoupled Semantics: Embedding the meaning via fixed label elements and optional RDF annotations, rather than relying on structural element names.

means that data created according to an SDC4 Data Model remains inherently understandable and processable over time.

New applications or future versions of existing applications can interpret historical SDC4 data because:

  • The structure associated with a given ms-<cuid> is permanently defined by its corresponding mc-<cuid> restriction in the schema.
  • The meaning of that structure is explicitly stated by the fixed label (and potentially richer RDF annotations) within that schema definition.
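The two points above suggest a simple lookup. First read the fixed label from each mc-<cuid> restriction once. Then interpret any historical ms-<cuid> element through that table. The sketch assumes the schema declares the label as an xsd:element named label with a fixed attribute, as in the Reference Model Component Details section. The function names label_registry and meaning_of are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def label_registry(xsd_text: str) -> dict:
    """Map each named mc-<cuid> complexType to the fixed value of its label element."""
    registry = {}
    for ct in ET.fromstring(xsd_text).iter(f"{XSD}complexType"):
        name = ct.get("name", "")
        label = ct.find(f".//{XSD}element[@name='label']")
        if name.startswith("mc-") and label is not None and label.get("fixed"):
            registry[name] = label.get("fixed")
    return registry

def meaning_of(element_tag: str, registry: dict) -> str:
    """Resolve an instance element tag (ms-<cuid>, possibly '{ns}'-qualified) to its label."""
    local = element_tag.rsplit("}", 1)[-1]
    if not local.startswith("ms-"):
        return "<not a model component>"
    return registry.get("mc-" + local[len("ms-"):], "<unknown component>")

# Abbreviated schema holding the Blood Pressure and Systolic components.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
  <xsd:complexType name="mc-clj5x1g8f000008l09j7f6c3d">
    <xsd:complexContent>
      <xsd:restriction base="sdc4:ClusterType">
        <xsd:sequence>
          <xsd:element name="label" type="xsd:string" fixed="Blood Pressure"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:complexType name="mc-clj5x2p4k000108l01a2b3c4d">
    <xsd:complexContent>
      <xsd:restriction base="sdc4:XdQuantityType">
        <xsd:sequence>
          <xsd:element name="label" type="xsd:string" fixed="Systolic"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>"""

registry = label_registry(SCHEMA)
print(meaning_of("{https://semanticdatacharter.com/ns/sdc4/}ms-clj5x2p4k000108l01a2b3c4d", registry))
```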

Unlike traditional approaches, which often break backward compatibility and require data transformation when element names or nesting structures change, SDC4 isolates structural definition from semantic interpretation. As long as the SDC Reference Model remains stable and the CUIDs persist, the data remains usable. This significantly reduces or potentially eliminates the need for large-scale data migrations, lowering long-term data management costs and preserving the value of historical data assets.

5. Modeling Examples

Note: The following XML snippets are instance documents. They are based on data model schemas in which CUID-named elements (ms-... and dm-...) define the data structure. The semantic meaning, such as "Blood Pressure" or "Systolic", is not encoded in the instance element names. It is declared in the schema definition of the corresponding component, as the fixed value of that component's label element.

5.1. Healthcare: Patient Vital Signs

<dm-h7k2x... xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
    <sdc4:dm-label>Patient Vitals</sdc4:dm-label>
    <sdc4:dm-language>en-US</sdc4:dm-language>
    <sdc4:dm-encoding>UTF-8</sdc4:dm-encoding>

    <sdc4:ms-clj5x1g8f000008l09j7f6c3d>
        <sdc4:ms-clj5x2p4k000108l01a2b3c4d>
            <sdc4:xdquantity-value>120</sdc4:xdquantity-value>
            <sdc4:xdquantity-units>
                <sdc4:xdstring-value>mmHg</sdc4:xdstring-value>
            </sdc4:xdquantity-units>
        </sdc4:ms-clj5x2p4k000108l01a2b3c4d>
    </sdc4:ms-clj5x1g8f000008l09j7f6c3d>
</dm-h7k2x...>
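The instance itself carries only CUID-named structure. The sketch below illustrates this by reading the quantity items out of the Patient Vitals instance above, using ElementTree with namespace-aware paths. For each item it returns the component element name, the value as a Decimal, and the units string. To learn what ms-clj5x2p4k000108l01a2b3c4d means, you still need the schema. The function name read_quantities is hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {"sdc4": "https://semanticdatacharter.com/ns/sdc4/"}

def read_quantities(xml_text: str) -> list:
    """Return (component element, value, units) for every element that has a
    direct sdc4:xdquantity-value child."""
    results = []
    for item in ET.fromstring(xml_text).iter():
        value = item.find("sdc4:xdquantity-value", NS)
        if value is None:
            continue
        units = item.findtext("sdc4:xdquantity-units/sdc4:xdstring-value",
                              default="", namespaces=NS)
        name = item.tag.rsplit("}", 1)[-1]
        results.append((name, Decimal(value.text.strip()), units.strip()))
    return results

# The Patient Vitals instance above, unchanged.
INSTANCE = """<dm-h7k2x... xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
    <sdc4:dm-label>Patient Vitals</sdc4:dm-label>
    <sdc4:dm-language>en-US</sdc4:dm-language>
    <sdc4:dm-encoding>UTF-8</sdc4:dm-encoding>
    <sdc4:ms-clj5x1g8f000008l09j7f6c3d>
        <sdc4:ms-clj5x2p4k000108l01a2b3c4d>
            <sdc4:xdquantity-value>120</sdc4:xdquantity-value>
            <sdc4:xdquantity-units>
                <sdc4:xdstring-value>mmHg</sdc4:xdstring-value>
            </sdc4:xdquantity-units>
        </sdc4:ms-clj5x2p4k000108l01a2b3c4d>
    </sdc4:ms-clj5x1g8f000008l09j7f6c3d>
</dm-h7k2x...>"""

print(read_quantities(INSTANCE))
```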

5.2. Agriculture: Soil Moisture Levels

<dm-a4g9y... xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
    <sdc4:dm-label>Soil Moisture Reading</sdc4:dm-label>
    <sdc4:dm-language>en-US</sdc4:dm-language>
    <sdc4:dm-encoding>UTF-8</sdc4:dm-encoding>

    <sdc4:ms-clj5x1g8f000008l09j7f6c3f>
        <sdc4:ms-clj5x2p4k000108l01a2b3c4f>
            <sdc4:xdquantity-value>35.5</sdc4:xdquantity-value>
            <sdc4:xdquantity-units>
                <sdc4:xdstring-value>%</sdc4:xdstring-value>
            </sdc4:xdquantity-units>
        </sdc4:ms-clj5x2p4k000108l01a2b3c4f>
    </sdc4:ms-clj5x1g8f000008l09j7f6c3f>
</dm-a4g9y...>

5.3. Business: Customer Order

<dm-b8f3z... xmlns:sdc4="https://semanticdatacharter.com/ns/sdc4/">
    <sdc4:dm-label>Customer Order</sdc4:dm-label>
    <sdc4:dm-language>en-US</sdc4:dm-language>
    <sdc4:dm-encoding>UTF-8</sdc4:dm-encoding>

    <sdc4:ms-clj5x1g8f000008l09j7f6c3e>
        <sdc4:ms-clj5x4b1c000308l0i9j8k7l6>
            <sdc4:xdstring-value>CUST-12345</sdc4:xdstring-value>
        </sdc4:ms-clj5x4b1c000308l0i9j8k7l6>
        <sdc4:ms-clj5x2p4k000108l01a2b3c4e>
            <sdc4:xdquantity-value>199.99</sdc4:xdquantity-value>
            <sdc4:xdquantity-units>
                <sdc4:xdstring-value>USD</sdc4:xdstring-value>
            </sdc4:xdquantity-units>
        </sdc4:ms-clj5x2p4k000108l01a2b3c4e>
    </sdc4:ms-clj5x1g8f000008l09j7f6c3e>
</dm-b8f3z...>

Additional examples for Finance, Aerospace, Engineering, and Security use cases follow similar patterns.

6. Security Considerations

Implementers of the Semantic Data Charter should be aware of the following security considerations:

  • Data Privacy: When modeling data that contains personally identifiable information (PII), care must be taken to ensure compliance with relevant privacy regulations (e.g., GDPR, HIPAA).
  • Access Control: The act (Access Control Tag) element in XdAnyType should be populated so that consuming applications can enforce access control policies on each data element.
  • Data Integrity: The hash-result and hash-function elements in XdFileType can be used to verify the integrity of binary data.
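The sketch below is a minimal integrity check. It assumes hash-result holds a hexadecimal digest and hash-function holds an algorithm name such as SHA-256. This section does not list the permitted hash-function values, so the mapping from those names to Python's hashlib algorithm identifiers is an assumption.

```python
import hashlib

# Assumed mapping from hash-function values to hashlib algorithm names.
HASHLIB_NAMES = {"SHA-256": "sha256", "SHA-512": "sha512", "SHA3-256": "sha3_256"}

def verify_file(data: bytes, hash_function: str, hash_result: str) -> bool:
    """Recompute the digest of 'data' and compare it with the stored hex digest."""
    algo = HASHLIB_NAMES.get(hash_function.strip().upper())
    if algo is None:
        raise ValueError(f"unsupported hash-function: {hash_function!r}")
    return hashlib.new(algo, data).hexdigest() == hash_result.strip().lower()

payload = b"example binary content"
stored = hashlib.sha256(payload).hexdigest()  # what a producer would record in hash-result
print(verify_file(payload, "SHA-256", stored))         # True
print(verify_file(payload + b"!", "SHA-256", stored))  # False
```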

7. Reference Model Component Details

This section provides a detailed explanation of the primary complexType components available in the SDC Reference Model (sdc4.xsd).

7.1. Root and Structural Components

These components form the foundational structure of any SDC Data Model:

DMType

The mandatory root element type for any SDC instance document. It acts as a wrapper for the data payload and contains instance-specific metadata.

<xsd:complexType name="mc-h7k2x...">
    <xsd:complexContent>
        <xsd:restriction base="sdc4:DMType">
            <xsd:sequence>
                <xsd:element name="dm-label" type="xsd:string" fixed="Patient Vitals"/>
                <xsd:element name="dm-language" type="xsd:language" fixed="en-US"/>
                <xsd:element name="dm-encoding" type="xsd:string" fixed="UTF-8"/>
                <xsd:element ref="sdc4:ms-clj5x1g8f000008l09j7f6c3d"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>

ClusterType

A container component used to group other ItemType elements. This allows for the creation of arbitrarily complex hierarchical data structures.

<xsd:complexType name="mc-clj5x1g8f000008l09j7f6c3d">
    <xsd:complexContent>
        <xsd:restriction base="sdc4:ClusterType">
            <xsd:sequence>
                <xsd:element name="label" type="xsd:string" fixed="Blood Pressure" />
                <xsd:element ref="sdc4:ms-..." minOccurs="1" maxOccurs="1" />
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>

7.2. Core Data Types (Xd* Types)

These types provide semantically rich representations for common data values:

XdStringType

A general-purpose type for character strings.

<xsd:complexType name="mc-clj5x4b1c000308l0i9j8k7l6">
    <xsd:complexContent>
        <xsd:restriction base="sdc4:XdStringType">
            <xsd:sequence>
                <xsd:element name="label" type="xsd:string" fixed="Customer ID"/>
                <xsd:element name="xdstring-value">
                    <xsd:simpleType>
                        <xsd:restriction base="xsd:string">
                            <xsd:pattern value="CUST-[0-9]{5}"/>
                        </xsd:restriction>
                    </xsd:simpleType>
                </xsd:element>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
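When the same constraint is checked outside an XSD validator, keep in mind that an xsd:pattern facet must match the entire value, because XSD patterns are implicitly anchored. The sketch below mirrors the CUST-[0-9]{5} facet with Python's re.fullmatch. XSD regular expressions and Python regular expressions differ in general, so a direct translation like this is safe only for simple patterns such as this one.

```python
import re

# The xsd:pattern facet from mc-clj5x4b1c000308l0i9j8k7l6.
CUSTOMER_ID = re.compile(r"CUST-[0-9]{5}")

def is_valid_customer_id(value: str) -> bool:
    """Mirror XSD semantics: the pattern must match the whole value."""
    return CUSTOMER_ID.fullmatch(value) is not None

print(is_valid_customer_id("CUST-12345"))              # True
print(is_valid_customer_id("ORD-2025-12345"))          # False
print(is_valid_customer_id("CUST-123456"))             # False: six digits
print(CUSTOMER_ID.search("CUST-123456") is not None)   # True: an unanchored search would wrongly accept it
```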

XdQuantityType

Represents a physical quantity with a decimal value and units.

<xsd:complexType name="mc-clj5x2p4k000108l01a2b3c4d">
    <xsd:complexContent>
        <xsd:restriction base="sdc4:XdQuantityType">
            <xsd:sequence>
                <xsd:element name="label" type="xsd:string" fixed="Systolic"/>
                <xsd:element name="xdquantity-value" type="xsd:decimal"/>
                <xsd:element name="xdquantity-units">
                    <xsd:complexType>
                        <xsd:complexContent>
                            <xsd:restriction base="sdc4:XdStringType">
                                <xsd:sequence>
                                    <xsd:element name="xdstring-value" type="xsd:string" fixed="mmHg"/>
                                </xsd:sequence>
                            </xsd:restriction>
                        </xsd:complexContent>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
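Application code that parses xdquantity-value should follow the xsd:decimal lexical space, not a language's general numeric parser. Python's Decimal, for example, also accepts forms such as 1e3, NaN, and Infinity, and xsd:decimal rejects all of them. The sketch below checks the lexical form and the fixed mmHg units before converting the value. The function name check_systolic is hypothetical.

```python
import re
from decimal import Decimal

# Lexical space of xsd:decimal: optional sign, digits with an optional
# fractional part. No exponent, NaN, or Infinity.
XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def check_systolic(value_text: str, units_text: str) -> Decimal:
    """Check an item against the mc-clj5x2p4k000108l01a2b3c4d restriction:
    xsd:decimal value and units fixed to 'mmHg'. Returns the parsed value."""
    value = value_text.strip()  # xsd:decimal collapses surrounding whitespace
    if not XSD_DECIMAL.fullmatch(value):
        raise ValueError(f"not an xsd:decimal: {value_text!r}")
    if units_text != "mmHg":  # xsd:string preserves whitespace, so compare exactly
        raise ValueError(f"units must be the fixed value 'mmHg', got {units_text!r}")
    return Decimal(value)

print(check_systolic("120", "mmHg"))  # 120
```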
