How SDC4 eliminates the NULL problem in sensor networks and restores the topology of information
Industrial sensor networks generate millions of data points per day. When a value goes missing, the system records NULL. But NULL is not an answer.
A temperature sensor returns NULL. Was it a network dropout? A sensor failure? A scheduled maintenance window? An out-of-range reading the firmware rejected? Four fundamentally different realities, collapsed into one indistinguishable symbol. ML models trained on this data learn an artifact that exists in no real system.
The most valuable signal in predictive maintenance is the trajectory toward failure. But conventional systems log only the failure state (NULL) while discarding the path that led to it. In control systems terms: we sample at the singularity, not along the gradient. The derivative is erased.
One missing vibration reading from a turbine bearing sensor. The predictive maintenance model never saw the anomaly developing because the absence was untyped. By the time a human noticed, the bearing had failed catastrophically. Unplanned shutdown: $2 million.
SDC4 does not fix missing data. It restores the topology of information.
Most pipelines attempt to guess missingness; you instead propose to encode its cause. That shift transforms the problem from probabilistic recovery into deterministic reasoning over a richer state space.
Formally, you are moving from:
x = value ∪ {NULL}
to something closer to:
x = value × reason_for_absence
This is not a minor extension. It is a dimensional lift. Once absence is typed, the schema itself becomes an information-bearing lattice, and the system regains closure: constraints can propagate, inconsistencies can be detected, and inference becomes bounded rather than speculative.
— Kris Welford, B.Eng Control Systems, M.Sc Computer Engineering
Head of Engineering | Principal Control Systems Specialist
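One way to read that lift in code is the minimal sketch below. It is an illustration, not SDC4's own data model: the names `Present`, `Absent`, `AbsenceReason`, and `to_untyped` are hypothetical, and the four reasons are simply the four causes listed earlier for the temperature sensor.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class AbsenceReason(Enum):
    # Hypothetical labels for the four causes named in the text.
    NETWORK_DROPOUT = "network_dropout"
    SENSOR_FAILURE = "sensor_failure"
    MAINTENANCE_WINDOW = "maintenance_window"
    OUT_OF_RANGE_REJECTED = "out_of_range_rejected"


@dataclass(frozen=True)
class Present:
    value: float


@dataclass(frozen=True)
class Absent:
    reason: AbsenceReason


# A reading is either a value or a typed absence, never a bare NULL.
Reading = Union[Present, Absent]


def to_untyped(reading: Reading) -> Optional[float]:
    """Collapse a typed reading into the conventional value-or-NULL form."""
    return reading.value if isinstance(reading, Present) else None


readings = [
    Present(71.4),
    Absent(AbsenceReason.NETWORK_DROPOUT),
    Absent(AbsenceReason.SENSOR_FAILURE),
]

# The conventional form aliases both absences to the same None.
untyped = [to_untyped(r) for r in readings]
distinct_untyped_states = {u for u in untyped if u is None}

# The typed form keeps them apart.
distinct_typed_states = {r.reason for r in readings if isinstance(r, Absent)}
```

After the collapse, the dropout and the sensor failure are indistinguishable. Before it, they are two separate states a model or rule can act on.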
The conventional approach: record the value or NULL, then try to impute the missing data using statistical methods. The model operates on a space where absence is aliased. It is not learning the world. It is learning the artifact of a schema decision.
The SDC4 approach: the schema itself encodes why data is absent. The unfilled form carries uncertainty as a functional void, not a defect. Constraints propagate. Inference is bounded. The system knows what it does not know.
Data quality as a first-class architectural invariant, not a post-hoc cleaning task
SDC4's ExceptionalValue system encodes the exact cause of every missing value. ISO 21090 NULL Flavors (NI, UNK, ASKU, NAV, OTH, MSK, NA, NASK, QS, TRC, PINF, NINF) replace the untyped void with a closed set of typed absence states, so every missing value carries exactly one machine-readable cause.
One missing reading → 12 distinct absence types with propagating constraints
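As a rough illustration, the twelve codes above can be modeled as an enumeration. The code names and their standard meanings come from the NULL Flavor vocabulary. The event names and the event-to-flavor mapping in `classify_absence` are assumptions made up for this sketch, not SDC4's rules.

```python
from enum import Enum


class NullFlavor(Enum):
    NI = "No information"
    UNK = "Unknown"
    ASKU = "Asked but unknown"
    NAV = "Temporarily unavailable"
    OTH = "Other"
    MSK = "Masked"
    NA = "Not applicable"
    NASK = "Not asked"
    QS = "Sufficient quantity"
    TRC = "Trace"
    PINF = "Positive infinity"
    NINF = "Negative infinity"


# Hypothetical mapping from sensor-side events to a flavor. A real
# deployment would define this per sensor type; this is one plausible choice.
_EVENT_TO_FLAVOR = {
    "comm_timeout": NullFlavor.NAV,
    "sensor_failure": NullFlavor.UNK,
    "maintenance_window": NullFlavor.NA,
    "reading_rejected_out_of_range": NullFlavor.OTH,
}


def classify_absence(event: str) -> NullFlavor:
    """Return the flavor for a known event, or NI when nothing is known."""
    return _EVENT_TO_FLAVOR.get(event, NullFlavor.NI)
```

Unrecognized events fall back to `NI`, so even the "we know nothing" case is an explicit state rather than a bare NULL.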
When a sensor reading is absent, SDC4 preserves the trajectory that led to the absence. A vibration sensor reporting increasingly erratic readings before a comm timeout tells a fundamentally different story than a clean dropout. Both are NULL in traditional systems. Only one precedes catastrophic failure.
Erratic readings → timeout → ExceptionalValue preserves the failure signature
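A minimal sketch of what "preserving the trajectory" could look like in practice, under assumptions: when a reading goes absent, the record keeps the last few values and a simple spread measure. The `AbsenceRecord` type, the window size, and the use of standard deviation as the erratic-behavior signal are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AbsenceRecord:
    index: int                     # position of the missing reading
    cause: str                     # typed cause, e.g. a NULL Flavor code
    preceding: Tuple[float, ...]   # values observed just before the absence
    spread: float                  # population std dev of those values


def record_absence(
    series: Sequence[Optional[float]],
    index: int,
    cause: str,
    window: int = 5,
) -> AbsenceRecord:
    """Capture the readings leading up to an absence instead of discarding them."""
    start = max(0, index - window)
    preceding = tuple(v for v in series[start:index] if v is not None)
    spread = pstdev(preceding) if len(preceding) >= 2 else 0.0
    return AbsenceRecord(index, cause, preceding, spread)


# Two vibration series that both end in a missing reading.
clean = [1.0, 1.0, 1.1, 1.0, 1.0, None]
erratic = [1.0, 1.6, 0.7, 2.4, 0.3, None]

clean_rec = record_absence(clean, 5, "NAV")
erratic_rec = record_absence(erratic, 5, "NAV")
```

Both series end in the same NULL, yet `erratic_rec.spread` is far larger than `clean_rec.spread`. That difference is the signal a predictive model loses when only the NULL is logged.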
SDC4 uses xsd:restriction exclusively. Never xsd:extension. This creates a mathematical restriction lattice where every component is a constrained specialization of its parent type. Constraints propagate downward. Validation is deterministic. The schema is not just documentation. It is an executable specification.
XdQuantity → restriction → Bounded, validated, self-describing sensor data
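The restriction-only rule can be sketched outside XSD as well. In the Python analogy below, a derived type may only narrow its parent's range and never widen it. `QuantityType`, its fields, and the example bounds are hypothetical; SDC4 itself expresses this with xsd:restriction in the schema, not with Python classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityType:
    name: str
    units: str
    min_value: float
    max_value: float

    def restrict(self, name: str, min_value: float, max_value: float) -> "QuantityType":
        """Derive a child type whose range lies inside this type's range."""
        if min_value > max_value:
            raise ValueError("min_value must not exceed max_value")
        if min_value < self.min_value or max_value > self.max_value:
            raise ValueError(f"{name} would widen {self.name}; only restriction is allowed")
        return QuantityType(name, self.units, min_value, max_value)

    def accepts(self, value: float) -> bool:
        return self.min_value <= value <= self.max_value


# Illustrative parent and child types with made-up bounds.
temperature = QuantityType("Temperature", "Cel", -273.15, 1000.0)
bearing_temp = temperature.restrict("BearingTemperature", -40.0, 150.0)
```

Because a child can only be created by narrowing, any value `bearing_temp` accepts is also accepted by `temperature`. This is the downward propagation of constraints described above.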
The structural layer (XSD) is independent of the transport protocol. MQTT, OPC-UA, CoAP, REST, or proprietary buses can all carry SDC4 data. The semantic layer (RDF/SHACL) adds ontology links without changing the structure. Upgrade the semantics without touching the schema.
Same schema → MQTT + OPC-UA + REST + legacy protocols
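To make the separation concrete, the sketch below carries one payload through two stand-in transport envelopes and validates it with the same function on arrival. Everything here is assumed for illustration: the element names, the range limits, and the `via_mqtt` and `via_rest` wrappers are placeholders, not real client libraries or an actual SDC4 instance document.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload; a real SDC4 instance follows its generated schema.
PAYLOAD = b"<reading sensor='bearing-7'><value units='mm/s'>4.2</value></reading>"


def validate(payload: bytes) -> float:
    """Parse the payload and check the value against an assumed range."""
    root = ET.fromstring(payload)
    value = float(root.findtext("value"))
    if not 0.0 <= value <= 50.0:
        raise ValueError(f"value {value} outside permitted range")
    return value


def via_mqtt(payload: bytes) -> dict:
    # Stand-in for an MQTT publish: topic plus opaque payload bytes.
    return {"topic": "plant/turbine/bearing-7", "payload": payload}


def via_rest(payload: bytes) -> dict:
    # Stand-in for an HTTP POST: path plus opaque body bytes.
    return {"method": "POST", "path": "/readings", "body": payload}


mqtt_msg = via_mqtt(PAYLOAD)
rest_req = via_rest(PAYLOAD)

# The receiver unwraps the envelope and applies the same validation.
results = [validate(mqtt_msg["payload"]), validate(rest_req["body"])]
```

The envelopes differ, but the validation function never looks at them. Only the structured payload matters.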
XdQuantity with units, range constraints, and ExceptionalValue types for each sensor
XSD validation catches out-of-range values. SHACL constraints enforce business rules.
Predictive models consume typed absence. The gradient is preserved. The failure signature is intact.
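The three steps above can be sketched end to end as: define a spec, validate against it, then hand the model features in which absence is explicit rather than imputed. The spec fields, the three-code subset, and the feature layout are assumptions chosen for brevity. This is not SDC4's API.

```python
from typing import List, Optional

# Step 1 (define): a hypothetical sensor spec with units and range.
SPEC = {"units": "mm/s", "min": 0.0, "max": 50.0}

# Subset of absence codes used in this sketch.
ABSENCE_CODES = ("NAV", "UNK", "NA")


def validate(value: Optional[float], absence: Optional[str]) -> None:
    """Step 2 (validate): exactly one of value/absence, and value in range."""
    if (value is None) == (absence is None):
        raise ValueError("exactly one of value or absence code must be set")
    if absence is not None and absence not in ABSENCE_CODES:
        raise ValueError(f"unknown absence code: {absence}")
    if value is not None and not SPEC["min"] <= value <= SPEC["max"]:
        raise ValueError(f"value {value} outside [{SPEC['min']}, {SPEC['max']}]")


def encode(value: Optional[float], absence: Optional[str]) -> List[float]:
    """Step 3 (consume): return [value_or_0, is_NAV, is_UNK, is_NA]."""
    validate(value, absence)
    flags = [1.0 if absence == code else 0.0 for code in ABSENCE_CODES]
    return [value if value is not None else 0.0] + flags


rows = [encode(4.2, None), encode(None, "NAV"), encode(None, "NA")]
```

The second and third rows would be the same NULL in a conventional pipeline. Here they are different feature vectors, so the model can learn that a timeout and a not-applicable reading mean different things.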
What control systems engineers see in the SDC4 approach
The industry habit is to log the failure state (NULL) while discarding the trajectory toward it. In control terms, we are sampling only at the singularity, not along the gradient. The most valuable signal, the derivative, is erased.
What you are really proposing is an inversion: instead of treating data quality as a post-hoc cleaning task, you treat it as a first-class architectural invariant. That unlocks an entirely different class of computation (constraint satisfaction, schema differencing, lattice subsumption), not as academic curiosities, but as production primitives.
You are not fixing missing data. You are restoring the topology of information.
— Kris Welford, B.Eng Control Systems, M.Sc Computer Engineering
Head of Engineering | Principal Control Systems Specialist
Real-world applications of SDC4 in IoT and industrial systems
Vibration, temperature, and pressure sensors with typed absence. ML models distinguish between sensor dropout (retrain) and pre-failure signatures (alert immediately). The gradient toward failure is preserved.
Smart meter and transformer data across heterogeneous substations. SDC4 schemas normalize readings from legacy Modbus and modern MQTT sources. Absence types distinguish outages from meter faults.
Air quality, water quality, and weather stations across distributed networks. Long-term data permanence ensures 50-year climate studies remain valid and computable without migration.
GPS, fuel, engine diagnostics, and cargo condition sensors across thousands of vehicles. Protocol-independent schemas work across cellular, satellite, and local connections. Cold-chain monitoring with typed absence for compliance.
Bridge strain gauges, building HVAC, water pressure networks. Structural health monitoring where a missing reading from a strain gauge has fundamentally different implications than a missing HVAC setpoint.
Patient monitoring systems where a missing heart rate reading triggers different clinical responses based on whether the lead detached, the device rebooted, or the patient moved. Each absence type maps to a different clinical protocol.
The cost of untyped absence in industrial sensor networks
$2 million: one missing sensor value that masked a bearing failure trajectory
Industrial sensor networks routinely carry 8-15% missing values across all readings
Each with different operational implications, all collapsed into one NULL symbol
Stop guessing at missingness. Start encoding it. Contact us to discuss how SDC4 applies to your sensor network.