Quick reference
Data versus Model engineering
Facts
Recorded or deterministically derived real-world facts at the prediction grain, time-safe "as of" t_p, where t_p is the prediction time
If a domain expert can recognise it as a thing that happened / was measured / was true → it belongs here
Data Engineering (with Domain Oversight sign-off)
features_at_tc (event-grain table) + short data dictionary
Representation
Mathematical representations of those facts used by an algorithm (encodings, scaling, interactions, learned transforms)
If it’s a trick to help the model or depends on training data statistics → it belongs here
Model Engineering (with Domain Oversight sign-off)
preprocessor + model artefact (versioned) + input column list
Outcome
Labels/outcomes determined by what happens after the as-of time t_p, plus censoring/availability flags
If it answers “what happened after?” → it’s an outcome, not a feature
Data Engineering (with Domain Oversight sign-off)
labels_at_tc (keyed by event_id)
Contract
The small set of agreed fields that must be stable across training and inference
If changing it would break someone else’s code → it’s part of the contract
Joint: Data + Model Engineering (Domain approves meaning)
contract_version + schema file (very small)
Enforcement
Tests and checks that keep the contract true over time (uniqueness, not-null, ranges, leakage sentinels)
If it can silently corrupt results → it needs an automated check
Platform/Infrastructure (implemented by Data or Model Engineering as relevant)
CI checks + scheduled audits + “fail fast” schema validation
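The Facts row's time-safety rule can be sketched as an as-of lookup. This is a minimal sketch in plain Python, assuming each raw measurement carries an `observed_at` timestamp (when the value became known). `Measurement`, `as_of_features` and the output columns are hypothetical names for illustration, not the actual `features_at_tc` schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    entity_id: str
    observed_at: datetime  # when the value became known, not when it "happened"
    value: float


def as_of_features(event_id: str, entity_id: str, t_p: datetime,
                   measurements: list) -> dict:
    """Build one hypothetical features_at_tc row for a single prediction event.

    Only measurements observed at or before t_p are visible, so the row is
    time-safe: nothing recorded after the prediction time can leak in.
    """
    visible = [m for m in measurements
               if m.entity_id == entity_id and m.observed_at <= t_p]
    latest: Optional[Measurement] = max(visible, key=lambda m: m.observed_at, default=None)
    return {
        "event_id": event_id,
        "entity_id": entity_id,
        "t_p": t_p,
        "n_measurements": len(visible),  # count of facts known as of t_p
        "last_value": latest.value if latest is not None else None,
    }


ms = [
    Measurement("a", datetime(2024, 1, 1), 5.0),
    Measurement("a", datetime(2024, 1, 10), 7.0),
    Measurement("a", datetime(2024, 2, 1), 9.0),  # after t_p below, must be ignored
]
row = as_of_features("e1", "a", datetime(2024, 1, 15), ms)
print(row["n_measurements"], row["last_value"])  # 2 7.0
```

Filtering on when a value was *known* rather than when it nominally occurred is what keeps late-arriving records from leaking into training rows.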
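The Representation row's test ("depends on training data statistics") is easiest to see with a scaler. The sketch below is a minimal, standard-library z-score preprocessor, assuming rows are plain dicts. `Standardizer`, `to_json` and the version string are hypothetical. It shows the three pieces the row lists as deliverables: a fitted preprocessor, a version, and the input column list.

```python
import json
import statistics


class Standardizer:
    """Minimal z-score preprocessor (hypothetical, for illustration).

    Its parameters are estimated from training data, which is exactly what puts
    it in the Representation layer rather than in the facts table.
    """

    def __init__(self, columns):
        self.columns = list(columns)  # input column list shipped with the artefact
        self.params = {}

    def fit(self, rows):
        # Statistics come from the training rows only; inference reuses them.
        for col in self.columns:
            values = [r[col] for r in rows]
            mean = statistics.fmean(values)
            std = statistics.pstdev(values) or 1.0  # avoid dividing by zero on constant columns
            self.params[col] = [mean, std]
        return self

    def transform(self, rows):
        return [[(r[col] - self.params[col][0]) / self.params[col][1]
                 for col in self.columns] for r in rows]

    def to_json(self, version):
        # Versioned, serialisable artefact: version + columns + fitted parameters.
        return json.dumps({"version": version, "columns": self.columns,
                           "params": self.params})


train = [{"age": 30.0}, {"age": 50.0}]
pre = Standardizer(["age"]).fit(train)
print(pre.transform([{"age": 60.0}]))  # [[2.0]]
```

The raw `age` stays in the facts table unchanged; only the fitted mean and standard deviation travel with the model.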
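For the Outcome row, a sketch of one label with a censoring flag follows. It assumes a fixed outcome horizon after t_p and a known `data_available_until` cut-off. The function name, its parameters and the output columns are hypothetical, not the real `labels_at_tc` definition.

```python
from datetime import datetime, timedelta


def label_for_event(event_id, t_p, outcome_times, horizon, data_available_until):
    """Hypothetical labels_at_tc row: did the outcome occur in (t_p, t_p + horizon]?

    Only outcomes strictly after t_p count. If the window has not fully elapsed
    in the available data and no outcome has been seen yet, the label is unknown:
    it is flagged as censored instead of being silently set to 0.
    """
    window_end = t_p + horizon
    seen_until = min(window_end, data_available_until)
    occurred = any(t_p < t <= seen_until for t in outcome_times)
    censored = (not occurred) and data_available_until < window_end
    return {
        "event_id": event_id,
        "label": None if censored else int(occurred),
        "censored": censored,
    }


t_p = datetime(2024, 1, 1)
print(label_for_event("e1", t_p, [datetime(2024, 1, 10)], timedelta(days=30), datetime(2024, 3, 1)))
# {'event_id': 'e1', 'label': 1, 'censored': False}
print(label_for_event("e2", t_p, [], timedelta(days=30), datetime(2024, 1, 15)))
# {'event_id': 'e2', 'label': None, 'censored': True}
```

Keying the result by `event_id` lets it join back to the features row without either table needing to know how the other was built.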
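The Contract row asks for a version plus a very small schema. A minimal sketch, assuming the schema can be expressed as a Python dict of column names to types, is shown below. `CONTRACT`, its fields and `validate_schema` are hypothetical. The idea is that the same check runs on training and inference inputs.

```python
CONTRACT = {
    "contract_version": "1.0.0",  # hypothetical; bump on any breaking change
    "key": "event_id",
    "columns": {"event_id": str, "entity_id": str, "last_value": float},
}


def validate_schema(rows, contract=CONTRACT):
    """Fail fast if a row lacks a contracted column or holds the wrong type.

    Extra columns are tolerated: the contract covers only the agreed fields.
    Nulls pass here; whether a field may be null is an enforcement check.
    """
    version = contract["contract_version"]
    expected = contract["columns"]
    for i, row in enumerate(rows):
        missing = sorted(set(expected) - set(row))
        if missing:
            raise ValueError(f"row {i}: missing {missing} (contract {version})")
        for col, typ in expected.items():
            value = row[col]
            if value is not None and not isinstance(value, typ):
                raise TypeError(f"row {i}: {col} should be {typ.__name__} "
                                f"(contract {version})")
    return True


ok = {"event_id": "e1", "entity_id": "a", "last_value": 7.0, "extra": 1}
print(validate_schema([ok]))  # True
```

Keeping the contract this small is deliberate: anything outside it can change freely without breaking the other team's code.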
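The Enforcement row's checks (uniqueness, not-null, ranges, leakage sentinels) can be sketched as one function that collects failures and one that fails fast. This assumes each features row carries a hypothetical `max_source_time` column (the latest timestamp of any input used) next to `t_p`. All names here are illustrative.

```python
from datetime import datetime


def run_checks(rows, key="event_id", not_null=(), ranges=None,
               sentinel=("max_source_time", "t_p")):
    """Collect contract violations; an empty list means every check passed.

    `sentinel` names two hypothetical columns: the latest source timestamp
    that fed any feature in the row, and the row's prediction time. A source
    timestamp after t_p means future information leaked into the features.
    """
    ranges = ranges or {}
    failures = []
    keys = [r.get(key) for r in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"duplicate {key} values")  # uniqueness
    src_col, tp_col = sentinel
    for i, r in enumerate(rows):
        for col in not_null:  # not-null
            if r.get(col) is None:
                failures.append(f"row {i}: {col} is null")
        for col, (lo, hi) in ranges.items():  # ranges
            v = r.get(col)
            if v is not None and not (lo <= v <= hi):
                failures.append(f"row {i}: {col}={v} outside [{lo}, {hi}]")
        if r.get(src_col) is not None and r[src_col] > r[tp_col]:  # leakage sentinel
            failures.append(f"row {i}: {src_col} is after {tp_col}")
    return failures


def fail_fast(rows, **checks):
    """Raise on any violation, e.g. as a CI step or scheduled audit."""
    failures = run_checks(rows, **checks)
    if failures:
        raise ValueError("; ".join(failures))


rows = [
    {"event_id": "e1", "last_value": 7.0, "t_p": datetime(2024, 1, 15),
     "max_source_time": datetime(2024, 1, 10)},
    {"event_id": "e1", "last_value": 120.0, "t_p": datetime(2024, 1, 15),
     "max_source_time": datetime(2024, 2, 1)},
]
for msg in run_checks(rows, not_null=("last_value",), ranges={"last_value": (0.0, 100.0)}):
    print(msg)
# duplicate event_id values
# row 1: last_value=120.0 outside [0.0, 100.0]
# row 1: max_source_time is after t_p
```

Returning a list rather than stopping at the first problem suits scheduled audits; `fail_fast` wraps the same checks for CI, where any violation should block the run.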
Last updated
