Data Physics
The foundational philosophy of the Semantic Data Charter™. How SDC achieves agility without brittleness by separating structure, semantics, and governance.
Where the Complexity Lives
SDC does not eliminate complexity. Complex information systems are complex — that reality doesn't change because you adopt a new reference model. What SDC does is redefine where the complexity lives.
From the archives — September 2008
This section builds on ideas first articulated in Where The Context Lies, a 2008 point paper by Timothy W. Cook that identified the root cause of semantic interoperability failure: “The context currently lies in the software where we cannot exchange it. We need to put it into the data where it belongs.”
Written during the openEHR era of multi-level modeling, the paper argued that domain context — the who, what, when, where, and why — was trapped in opaque application code instead of traveling with the data. SDC's Data Physics is the matured realization of that thesis: complexity relocated from hidden integration plumbing to transparent definitions and queries, now extended beyond healthcare to every domain.
Read the original paper (PDF, 2008)
In traditional enterprise architecture, the complexity is buried in opaque integration code: middleware, ETL pipelines, API adapters, message buses, data mapping layers, transformation scripts. This code is written by developers, maintained by developers, and understood only by developers. When it breaks, developers debug it. When requirements change, developers rewrite it. The domain experts who understand the data and the analysts who need to query it are locked out of the system that connects their work.
SDC moves the complexity into two transparent places:
1. Domain Expert Definitions
The people who understand the data — clinicians, registrars, port authorities, tax officials — define the shape and meaning of their data. They declare the types, constraints, enumerations, access policies, and semantic annotations. This is where the modeling complexity lives. It is visible, auditable, and governed by the people who know the domain.
2. Query Complexity
The information analysts who need to ask cross-domain questions write SPARQL queries (or XQuery/XPath against XML databases) that traverse the graph. The queries can be sophisticated — multi-hop traversals across ten domains, role-filtered views, temporal bridging across schema versions. This is where the analytical complexity lives. It is visible, testable, and owned by the people who understand the questions.
What disappears is the integration layer in between. No middleware translating Civil Registry records into Healthcare records. No ETL pipeline mapping Maritime crew lists to Employment records. No API adapter converting Tax Authority formats to Port Authority formats. The integration is structural — guaranteed by the Reference Model at compile time — so the code that used to bridge these systems simply doesn't need to exist.
The tradeoff is honest: SDC increases the rigor required to define data models and increases the sophistication required to query across them. But both of these are managed by the right people — domain experts and information analysts — instead of being buried in a codebase that neither group can read.
The Charter as Constitution
A Semantic Data Charter is, in the most literal sense, a constitution for data.
The data model definitions are laws: published, versioned, immutable once ratified. They declare the structural rules (types, constraints) and the governance rules (access policies, legal bases) under which data operates. New requirements don't amend existing laws — they mint new ones, just as constitutional amendments extend rather than rewrite the original document.
The Reference Model is the constitutional framework: the meta-rules that all laws must conform to. SDC4 defines the type system, the Cluster hierarchy, the act governance model, and the RDF emission patterns. Every data model "law" must be valid under this framework, just as every statute must be valid under the constitution.
And like a constitution, the Charter is readable by non-lawyers. A domain expert can look at an SDC data model and understand what it says — because the model is the domain language, expressed in a structured form. The complexity is in the definitions and the queries, not in hidden plumbing between them.
The Problem: Semantic Coupling
Traditional data modeling is "brittle" because it conflates the Concept (The Thing) with the Label (The Word).
Example: An XSD defines <CustomerType>.
The Failure: When the business redefines "Customer" to "Client" (or changes the scope of what a customer is), the schema must be updated. This breaks backwards compatibility, invalidates historical data, and requires expensive ETL to migrate old records to the new definition.
Every enterprise data architect has lived this nightmare. The label changes, so the schema changes, so the database changes, so the application changes, so the integration changes. A one-word business decision cascades into months of engineering work — and the historical data either gets migrated (expensive, lossy) or abandoned (wasteful, risky).
The root cause is that the schema used the word as the identifier. When the word changes, everything downstream breaks.
The SDC Solution: Concept Unique Identifiers (CUIDs)
SDC anchors data not to words, but to CUIDs (Concept Unique Identifiers).
What Is a CUID?
A CUID is a collision-resistant unique identifier minted once per component definition. It is the component's permanent address in the semantic space — bound to a specific structural and semantic definition, independent of the application that uses it, independent of the database that stores it.
In SDCStudio, CUIDs are generated using the CUID2 algorithm — a secure, collision-resistant ID format designed for distributed systems. A typical CUID looks like:
clxk8s0oo0001jn08g5r3h7z4
This string has no semantic content. It doesn't encode the component's name, type, project, or version. It is a pure coordinate — a point in semantic space.
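To make the idea of a meaning-free coordinate concrete, here is a simplified Python sketch in the spirit of the published CUID2 design: a random leading letter followed by a base-36 encoding of a SHA3-512 hash over a timestamp, fresh random bytes, a per-process counter, and a host fingerprint. The function names (`mint_cuid`, `to_base36`) are hypothetical, and this is neither SDCStudio's code nor the reference CUID2 library; production systems should use a maintained CUID2 implementation.

```python
import hashlib
import itertools
import os
import secrets
import socket
import string
import time

BASE36 = string.digits + string.ascii_lowercase  # 0-9 then a-z


def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(BASE36[r])
    return "".join(reversed(digits))


# Per-process state: a counter with a random start and a host fingerprint.
_counter = itertools.count(secrets.randbelow(2**32))
_fingerprint = hashlib.sha3_512(
    f"{socket.gethostname()}:{os.getpid()}:{secrets.token_hex(16)}".encode()
).hexdigest()


def mint_cuid(length: int = 24) -> str:
    """Mint a CUID2-style identifier (simplified sketch, not the reference library)."""
    first = secrets.choice(string.ascii_lowercase)  # always starts with a letter
    material = f"{time.time_ns()}{secrets.token_hex(16)}{next(_counter)}{_fingerprint}"
    digest = hashlib.sha3_512(material.encode()).digest()
    body = to_base36(int.from_bytes(digest, "big"))
    return first + body[1:length]  # skip the hash's first char, keep length - 1 more
```

Nothing in the output encodes the component's name, type, project, or version; each call simply lands on a new, unpredictable point.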
The Immutability Rule
Once a CUID is minted, its definition, constraints, scope, and access policies are frozen forever.
- The CUID's structural definition is permanent: its data type, constraints, enumerations, and act (access control) policies.
- Its semantic definition, expressed as RDF/XML predicate-object pairs using standardized predicates from OWL, RDFS, SKOS, etc., is equally frozen.
- If a concept evolves, a new CUID is minted with its own definition. Conceptual equivalence between the old and new CUIDs is expressed through semantic annotations, not by reusing the same identifier.
- Each definition can carry as many predicate-object pairs as needed to fully describe its meaning.
A record created in 2024 using CUID-A will always be valid against the 2024 Data Model. It never "rots." It is a perfect fossil of the reality at the moment of creation.
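One way to picture the Immutability Rule in code is a frozen record type plus a registry that refuses to mint the same CUID twice. The sketch assumes a hypothetical, heavily simplified `ComponentDefinition` and `DefinitionRegistry` (the real SDC4 definition lives in the XSD and is far richer); Python's frozen dataclasses and `MappingProxyType` stand in for "published and immutable".

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class ComponentDefinition:
    """Hypothetical, simplified stand-in for a minted SDC component definition."""
    cuid: str
    version: str                                  # e.g. "4.0.0"
    datatype: str                                 # structural definition
    enumerations: tuple[str, ...] = ()            # allowed values, if any
    act_purposes: tuple[str, ...] = ()            # purposes the act policy allows
    semantics: tuple[tuple[str, str], ...] = ()   # (predicate, object) pairs


class DefinitionRegistry:
    """Append-only registry: each CUID is minted once and never redefined."""

    def __init__(self) -> None:
        self._defs: dict[str, ComponentDefinition] = {}

    def mint(self, definition: ComponentDefinition) -> ComponentDefinition:
        if definition.cuid in self._defs:
            raise ValueError(f"{definition.cuid} is already minted; mint a new CUID instead")
        self._defs[definition.cuid] = definition
        return definition

    @property
    def definitions(self) -> MappingProxyType:
        return MappingProxyType(self._defs)  # read-only view, no overwrites
```

An evolved concept therefore has only one path forward: a new `ComponentDefinition` under a new CUID.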
The Schema as Coordinate System
The SDC schema is not a dictionary (where words have definitions that can change). It is a coordinate system (where points have fixed positions).
| Traditional Schema | SDC Schema |
|---|---|
| <CustomerType> is the identifier | clxk8s0oo0001jn08g5r3h7z4 is the identifier |
| Redefining "Customer" as "Client" breaks the schema and requires migration | "Customer" and "Client" are separate CUIDs; shared semantics expressed via RDF/XML predicate-object pairs (owl, rdfs, skos) |
| Historical records must be migrated | Historical records remain valid forever |
| The schema is a living document | The schema is a published, immutable artifact |
Non-Destructive Evolution: The Semantic Ledger
Concepts evolve. Businesses change. Regulations expand. SDC handles this not by updating definitions, but by minting new ones. SDC functions as a Semantic Ledger — append-only, never overwrite.
How It Works
Scenario: The business concept of "Customer" (CUID-A) evolves into a broader concept of "Client" (CUID-B).
The Execution: We do not patch CUID-A. We mint CUID-B.
CUID-A ("Customer"): minted 2024, definition frozen, still valid for all 2024 records.
CUID-B ("Client"): minted 2026, broader scope, used for all new records going forward.
CUID-A is marked as deprecated: not deleted, not modified, just flagged.
Old Data
Remains 100% valid against the 2024 Data Model. Zero migration. Zero ETL. The records are perfect fossils.
New Data
Is minted against the 2026 Data Model using CUID-B.
Both Coexist
In the same graph, in the same triple store, queryable together.
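The ledger pattern reduces to three operations: mint, add a bridging edge, and flag the old CUID as deprecated. A minimal sketch, assuming a hypothetical `SemanticLedger` class and an edge predicate named `sdc:supersedes` (chosen by analogy with the `sdc:succeeds` edge used for Reference Model evolution; the actual SDC vocabulary may differ):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    cuid: str
    label: str
    minted: int  # year the CUID was minted


@dataclass
class SemanticLedger:
    """Append-only ledger: concepts and edges are added, never edited or removed."""
    concepts: dict = field(default_factory=dict)
    deprecated: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (subject, predicate, object)

    def mint(self, cuid: str, label: str, year: int) -> Concept:
        if cuid in self.concepts:
            raise ValueError(f"{cuid} already exists; the ledger never overwrites")
        concept = Concept(cuid, label, year)
        self.concepts[cuid] = concept
        return concept

    def supersede(self, old: str, new: str) -> None:
        if old not in self.concepts or new not in self.concepts:
            raise KeyError("both CUIDs must be minted before they can be bridged")
        # Evolution = one new edge plus a deprecation flag; the old concept is untouched.
        self.edges.append((new, "sdc:supersedes", old))
        self.deprecated.add(old)

    def is_active(self, cuid: str) -> bool:
        return cuid in self.concepts and cuid not in self.deprecated


ledger = SemanticLedger()
ledger.mint("CUID-A", "Customer", 2024)
ledger.mint("CUID-B", "Client", 2026)
ledger.supersede(old="CUID-A", new="CUID-B")
```

After the supersession, CUID-A's entry is byte-for-byte what it was in 2024; only the edge list and the deprecation set grew.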
Versioning: The Mechanism
SDCStudio enforces this through a modified semantic versioning scheme: MAJOR.MINOR.PATCH
- MAJOR = SDC Reference Model version (currently 4 for SDC4)
- MINOR = Feature releases for the specific artifact
- PATCH = Bug fixes and minor updates
Every component minted under SDC4 carries a 4.x.x version. This version number permanently binds the CUID to the SDC4 Reference Model's structural rules. A 4.x.x CUID will always be structurally compatible with every other 4.x.x CUID.
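The compatibility rule depends only on the MAJOR component. A short sketch, with hypothetical names `SdcVersion` and `structurally_compatible`:

```python
from typing import NamedTuple


class SdcVersion(NamedTuple):
    major: int  # SDC Reference Model version (4 for SDC4)
    minor: int  # feature release of the specific artifact
    patch: int  # bug fixes and minor updates

    @classmethod
    def parse(cls, text: str) -> "SdcVersion":
        parts = text.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
        return cls(*(int(p) for p in parts))


def structurally_compatible(a: str, b: str) -> bool:
    """Same MAJOR means the same Reference Model, hence the same structural rules."""
    return SdcVersion.parse(a).major == SdcVersion.parse(b).major
```

`structurally_compatible("4.0.1", "4.3.0")` is `True`, while `structurally_compatible("4.2.0", "5.0.0")` is `False`. MINOR and PATCH are parsed but ignored for this check, since only MAJOR identifies the Reference Model.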
The XSD Is the Source of Truth
The XSD (XML Schema Definition) is the authoritative source of truth for every SDC data model.
The published XSD contains:
- Structural constraints: data types, cardinality, enumerations, min/max values, pattern restrictions
- RDF/XML semantics: semantic annotations embedded directly in the schema, not layered on top
- act governance policies: access control tags declaring who can see, use, and compose each cell
- CUID bindings: every component's permanent identifier is declared in the schema
- Reference Model conformance: the XSD enforces SDC4 type system rules at validation time
An XSD-valid record is, by definition, structurally governed. The schema is both the blueprint and the enforcer. Any record that validates against the XSD is guaranteed to carry correct types, constraints, semantics, and governance — because the XSD is all of those things.
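Because constraints live in the schema rather than in application code, a program can read them straight out of the XSD. The Python standard library has no XSD validator (full validation would use a schema-aware tool such as lxml's `XMLSchema` class), so this sketch only extracts an enumeration and its documentation from a hypothetical schema fragment; the type name, the helper names, and the layout are illustrative, not the published SDC4 structure.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical fragment for illustration; not the published SDC4 schema layout.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="status-clxk8s0oo0001jn08g5r3h7z4">
    <xs:annotation>
      <xs:documentation>Account status</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="closed"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def read_constraints(xsd_text: str) -> dict:
    """Map each named simpleType to its documentation and enumerated values."""
    root = ET.fromstring(xsd_text)
    result = {}
    for simple_type in root.iter(f"{XS}simpleType"):
        doc = simple_type.findtext(f"{XS}annotation/{XS}documentation")
        values = [e.get("value") for e in simple_type.iter(f"{XS}enumeration")]
        result[simple_type.get("name")] = {"doc": doc, "enumeration": values}
    return result


def value_allowed(xsd_text: str, type_name: str, value: str) -> bool:
    """Check a value against the schema's enumeration (not full XSD validation)."""
    return value in read_constraints(xsd_text)[type_name]["enumeration"]
```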
Two Query Interfaces, One Source of Truth
| Storage | Query Language | Strength |
|---|---|---|
| XML Database (MarkLogic, BaseX, eXist-db) | XQuery / XPath | Native XSD validation, schema-aware queries, direct access to structural constraints and RDF/XML annotations |
| RDF Triple Store (GraphDB, Fuseki) | SPARQL | Graph traversal across domains, federated queries, semantic bridging between CUIDs |
These are not competing approaches. They are two views of the same truth:
- XQuery/XPath operates on the XML instances and their schemas directly. It can validate, query, and traverse the data with full awareness of the XSD structure.
- SPARQL operates on the RDF triples extracted from those same XML instances. It excels at cross-domain graph traversal.
The underlying principle is storage-agnostic: the evolutionary metadata is in the schema, not in the query engine.
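The "two views" idea can be shown on a toy instance: the same XML answers an XPath-style lookup directly (ElementTree supports a subset of XPath) and, once flattened into triples, answers the equivalent graph lookup. The element names, the `ex:` prefix, the `sdc:conformsTo` predicate, and the `to_triples` helper are hypothetical, and the flattening is a naive stand-in for SDC's actual RDF emission patterns.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; element and attribute names are illustrative only.
INSTANCE = """<record cuid="clxk8s0oo0001jn08g5r3h7z4" id="rec-1">
  <name>Ada Lovelace</name>
  <status>active</status>
</record>"""

root = ET.fromstring(INSTANCE)

# View 1: XPath-style query directly over the XML instance.
status_via_xpath = root.findtext("./status")


# View 2: the same instance flattened into (subject, predicate, object) triples.
def to_triples(elem: ET.Element) -> list:
    subject = elem.get("id")
    triples = [(subject, "sdc:conformsTo", elem.get("cuid"))]
    for child in elem:
        triples.append((subject, f"ex:{child.tag}", child.text))
    return triples


triples = to_triples(root)
status_via_graph = next(o for s, p, o in triples if p == "ex:status")
```

Both lookups return `"active"` because both read the same underlying record; only the access path differs.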
Evolutionary Bridging via the Graph
The graph stores the evolutionary relationship between CUIDs. SPARQL is particularly convenient for cross-domain queries because it was designed for exactly this kind of traversal.
Semantic Bridging
Querying Across Time
A SPARQL query can traverse these edges to bridge across evolutionary boundaries.
Such a query returns both 2024 "Customer" records and 2026 "Client" records, because the graph knows they are semantically related. No migration happened. No ETL ran. The old records were never touched.
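The traversal itself is a transitive walk over bridging edges. In SPARQL 1.1 it would typically be written with a zero-or-more property path such as `sdc:supersedes*`; the sketch below performs the same walk over an in-memory list of triples so the logic is visible. The triples, record identifiers, the `sdc:conformsTo` predicate, and the helper names are hypothetical, and `sdc:supersedes` is assumed by analogy with the `sdc:succeeds` edge used for Reference Model evolution.

```python
BRIDGE_PREDICATES = ("sdc:supersedes", "sdc:succeeds")

# Toy triple store: (subject, predicate, object).
TRIPLES = [
    ("CUID-B", "sdc:supersedes", "CUID-A"),  # 2026 "Client" supersedes 2024 "Customer"
    ("rec-2024-001", "sdc:conformsTo", "CUID-A"),
    ("rec-2024-002", "sdc:conformsTo", "CUID-A"),
    ("rec-2026-001", "sdc:conformsTo", "CUID-B"),
]


def lineage(cuid, triples, predicates=BRIDGE_PREDICATES):
    """Return cuid plus every CUID reachable through bridging edges."""
    seen, stack = set(), [cuid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p in predicates)
    return seen


def records_for(cuid, triples):
    """All records minted against cuid or any CUID it bridges back to."""
    concepts = lineage(cuid, triples)
    return sorted(s for s, p, o in triples if p == "sdc:conformsTo" and o in concepts)
```

Asking for `CUID-B` returns all three records, the 2024 ones included, while asking for `CUID-A` returns only the 2024 records. Because `sdc:succeeds` is followed as well, the same walk would bridge SDC4 and SDC5 CUIDs once such an edge exists.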
Reference Model Evolution: SDC4 to SDC5
The hardest version of the brittleness critique: What happens when the Reference Model itself changes? The answer follows the same pattern, one level up.
1. SDC4 is not patched
Every 4.x.x CUID, every SDC4 Data Model, every SDC4-generated app remains frozen and valid. The records are fossils — permanently governed by the SDC4 Reference Model that created them.
2. SDC5 CUIDs are minted fresh
New components carry 5.x.x versions. They conform to SDC5's type system and governance model. They are structurally incompatible with SDC4 components (different major version = different Reference Model).
3. The graph bridges them
The triple store holds both SDC4 and SDC5 triples. Semantic bridging edges connect equivalent concepts across Reference Model versions.
4. Queries traverse the bridge
A SPARQL query for "all Person records" can follow the sdc:succeeds edge to include both SDC4 and SDC5 records, with the graph providing the structural mapping between the two Reference Model versions.
The key insight: The Reference Model version is encoded in every CUID's version number. A 4.x.x component will never be confused with a 5.x.x component. They coexist in the graph as distinct entities with an explicit evolutionary relationship — not as ambiguous versions of "the same thing." This is fundamentally different from traditional schema migration, where version N+1 replaces version N and the old data must be transformed. In SDC, version N and version N+1 are both permanent residents of the graph. The bridge is metadata, not migration.
Governance Travels with the CUID
A CUID carries not just its structural definition but also its access policies — the act (access control tag) elements that declare who can see, use, and compose this data.
When CUID-B supersedes CUID-A, the governance policies may also evolve:
CUID-A ("Customer", 2024)
act allows dpv:ServiceProvision, dpv:DirectMarketing
CUID-B ("Client", 2026)
act allows dpv:ServiceProvision only — direct marketing removed per new privacy regulation
Old records keep old policies. A 2024 record minted under CUID-A retains its 2024 act policies. It was governed by the rules that existed when it was created. The data is a fossil — including its governance.
New records get new policies. A 2026 record minted under CUID-B carries the updated, stricter policies.
The graph contains records with different governance rules for the same conceptual entity, and both are correct — because each record is governed by the rules that were in force at the time of its creation. The governance doesn't just protect the data — it is part of the data, frozen at the moment of creation, and auditable forever.
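The rule "each record is governed by the policy in force when it was created" amounts to resolving the policy through the record's own CUID rather than through the latest concept. A minimal sketch using the DPV purposes from the example above; `ACT_POLICIES`, `Record`, and `permitted` are hypothetical names, and a real act element carries more than a set of allowed purposes (legal bases, for example).

```python
from dataclasses import dataclass

# Hypothetical act policies keyed by CUID, using the DPV purposes from the example.
ACT_POLICIES = {
    "CUID-A": frozenset({"dpv:ServiceProvision", "dpv:DirectMarketing"}),  # 2024 "Customer"
    "CUID-B": frozenset({"dpv:ServiceProvision"}),                         # 2026 "Client"
}


@dataclass(frozen=True)
class Record:
    record_id: str
    cuid: str  # the CUID the record was minted against


def permitted(record: Record, purpose: str, policies=ACT_POLICIES) -> bool:
    """A record is governed by the act policy of the CUID it was minted under."""
    return purpose in policies[record.cuid]


old = Record("rec-2024-001", "CUID-A")
new = Record("rec-2026-001", "CUID-B")
```

The 2024 record still permits `dpv:DirectMarketing` and the 2026 record does not, with no change to either record.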
Summary: Agility without Brittleness
SDC achieves Agility without Brittleness by separating three concerns:
| Concern | Mechanism | Source of Truth | Mutability |
|---|---|---|---|
| Structure | CUID + Reference Model version | XSD | Immutable — frozen at mint time |
| Semantics | Evolutionary edges (supersedes, succeeds) | XSD (expressed as RDF/XML annotations) | Evolving — append-only ledger |
| Governance | act elements with DPV vocabulary | XSD | Immutable per CUID; evolves only through new CUIDs |
| Query | SPARQL (triple store) or XQuery/XPath (XML database) | Derived from XSD | Storage-agnostic |
The XSD is the source of truth. The graph (or XML database) is the query interface. The evolutionary metadata lives in the schema — not in any particular storage engine.
We don't ask the data to change. We don't ask the schema to change.
We ask the query layer to understand the history.
The data is a fossil. The schema is the geological record. The query engine is the paleontologist.
About Axius SDC
The Semantic Data Charter is developed by Axius SDC, Inc., an international team with 40+ years combined experience in semantic data and health informatics across the United States, Canada, and Brazil.