Who does what?

Mapping the modelling task onto the team

Problem statement

The key coordination problem is how to divide responsibility between the data engineer and the model engineer when defining the prediction task. The data engineer will extract and shape hospital EHR data into reliable, versioned tables of antibiotic prescribing and microbiology outcomes, while the model engineer will build a simple first model (for example, logistic regression). The hardest work sits between them: choosing the index event that anchors each prediction, defining the label window and prediction horizon, and specifying which features are allowed before the cut-off time so that leakage is avoided. We will use the language of Label–Segment–Featurize and implement it in tested, modular transformations, in line with the MLOps philosophy. The dataset, features and labels are reproducible across training and serving environments, and the resulting model can be moved back into NHS systems after offline development in a Trusted Research Environment.

We have four roles in the RADIX team.

| Role | Notes |
| --- | --- |
| Data engineering | |
| Infrastructure engineering | |
| Model engineering | |
| Clinical lead | |

Design principles

  • MLOps is the discipline of turning a machine learning model into a reliable, auditable, repeatable service. In practice it means: versioning data, labels, features, code, and models; automating training/evaluation; enforcing quality gates (tests, audits, monitoring); and making deployment and rollback safe. (Corbin 2023) For this project, MLOps implies an end-to-end pipeline that can be run deterministically in both NHS and research environments, with clear interfaces between data preparation, feature generation, model training, and serving, so that a “minimal working model” can be promoted, monitored for drift/performance, and iterated without breaking clinical workflow or information governance.

  • Label–Segment–Featurize (L-S-F): A useful design pattern for “prediction engineering” in time-stamped relational data. (Kanter 2016) The key principle is to explicitly define (1) the label and its time window, (2) the cutoff time at which the prediction is made and the lead/lag that define what historical data is allowed, and (3) the feature extraction restricted to the allowable segment to prevent label leakage. Applied here, L-S-F forces agreement on what the “prescribing event” is (the anchor), how resistance/mismatch is determined relative to that anchor (label window), and what prior history is admissible as predictors (feature window).
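The cutoff logic at the heart of L-S-F can be sketched in a few lines. Everything below is illustrative: the event list, the drug names, and the 90-day window are hypothetical, not agreed project definitions.

```python
from datetime import datetime, timedelta

# Hypothetical prescribing events for one patient: (timestamp, drug).
events = [
    (datetime(2024, 1, 1), "amoxicillin"),
    (datetime(2024, 3, 10), "co-amoxiclav"),
    (datetime(2024, 6, 1), "amoxicillin"),
]

def segment(events, cutoff, feature_window):
    """Keep only events inside the admissible feature window,
    strictly before the cutoff (prediction) time, so nothing leaks."""
    start = cutoff - feature_window
    return [(t, d) for t, d in events if start <= t < cutoff]

cutoff = datetime(2024, 6, 1)   # the anchor: the prescription time
window = timedelta(days=90)     # admissible history before the cutoff

allowed = segment(events, cutoff, window)  # only the 2024-03-10 event survives
```

The strict `t < cutoff` comparison is the important design choice: the event being predicted for is itself excluded from its own feature window.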

  • High level requirements

    • Clear ownership: each deliverable has a single accountable role.

    • Reproducibility: datasets and models are versioned and re-runnable.

    • No leakage: features must only use information available at prediction time.

    • IG by design: identifiable work stays on NHS infrastructure; research work stays in the DSH.

    • Deployment path: the trained model can be applied in NHS or HSL infrastructure with the same feature contract.
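The “no leakage” requirement lends itself to an automated audit at the quality gate. A minimal sketch, assuming each feature value carries the timestamp at which it became known (a hypothetical structure, not the project’s actual schema):

```python
from datetime import datetime

def audit_no_leakage(feature_rows):
    """Raise if any feature was observed at or after its row's cutoff time."""
    for row in feature_rows:
        cutoff = row["cutoff"]
        for name, (value, observed_at) in row["features"].items():
            if observed_at >= cutoff:
                raise ValueError(
                    f"Leakage: {name} observed at {observed_at} >= cutoff {cutoff}"
                )
    return True

# One hypothetical prediction event: creatinine known the day before the cutoff.
rows = [{
    "cutoff": datetime(2024, 6, 1),
    "features": {"latest_creatinine": (80, datetime(2024, 5, 31))},
}]
audit_no_leakage(rows)  # passes silently
```

Run as a test in CI, this turns the leakage requirement from a convention into an enforced gate.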

Implementation

Overview

  • The data engineer owns the data product (event definitions, labels, features, dataset versioning).

  • The model engineer owns the modelling product (training/evaluation code, leakage control, packaging for deployment).

  • The clinical lead owns clinical validity & safety (definitions, exclusions, outcome validity, evaluation framing).

  • The infrastructure engineer owns platform & delivery mechanics (environments, CI/CD, orchestration, security boundaries).

Data versus Model engineering


The rule of thumb is that data features are facts, and model features are representations of those facts.

  • Data features are time-stamped, clinically interpretable, reproducible facts at the right time granularity ("as-of" \(t_{\text{prescription}}\)), with stable definitions and data-quality tests. If you can say “a clinician could recognise this field as a real-world thing” → it’s probably a data feature.

  • Model features decide how such facts are encoded and transformed for learning and inference (normalisation, interactions, binning, embeddings, missingness strategy, calibration-oriented transforms), and own the code that makes training/inference identical. If you can say “this is a transformation to help the model” → it’s probably a model feature.

This means that data engineering will build a data model with one row per prediction event (e.g. antibiotic start time). That model will:

  • present columns of atomic facts with clear "as-of" \(t_{\text{prescription}}\) semantics, without leakage, with explicit units, and with stable semantic meaning and interpretation

  • be a clinical ML feature store with one row per prediction event, likely built from either Camino (our modular pipeline that maintains tables based on underlying concepts derived in turn from the UCLH enterprise data warehouse) or the HSL Radix FHIR store.

Model engineering will build additional columns within the model pipeline but will not build fresh tables or re-organise the "as-of" \(t_{\text{prescription}}\) data structure.
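The split can be made concrete. In this sketch (with hypothetical values), the deterministic count is a data feature, while the log1p-plus-standardisation transform, whose mean/SD are fitted on training data, is a model feature:

```python
import math
import statistics

# Data feature: a deterministic fact, identical in training and serving.
# Hypothetical values: prior antibiotic courses in the last 90 days,
# one per prediction event.
prior_courses_90d = [0, 2, 5, 1, 9]

# Model feature: a representation fitted on training data only —
# log1p followed by standardisation with the training mean/SD.
logged = [math.log1p(x) for x in prior_courses_90d]
mu, sd = statistics.mean(logged), statistics.stdev(logged)

def encode(x):
    """Apply the fitted transform; the same mu/sd must be reused at inference."""
    return (math.log1p(x) - mu) / sd
```

Note that `mu` and `sd` embed training-set statistics, which is exactly why, under the promotion rule below, this transform should not be pushed upstream into the feature store.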

Heuristics for data versus model engineering decisions

  1. Would you want this displayed in a clinical dashboard? For example, who builds the "Number of antibiotic courses in last 90 days" feature (anchored "as-of" \(t_{\text{prescription}}\))? This would be of interest to a clinical consumer, so it is a data feature. Conversely, normalising or log-transforming the same number would be a model feature.

  2. Does the feature require fitting parameters?

    • Model features: StandardScaler mean/SD, target encoding priors, PCA loadings, spline knots, calibration maps

    • Data features: Simple deterministic aggregations (“count”, “max”, “days since”)

  3. Is the feature "semantic" or "algorithmic"? This is similar to (1) above.

    • Semantic feature (has meaning independent of algorithm): comorbidity count, prior resistant isolate, creatinine latest, ward, age → data features

    • Algorithmic feature (meaning depends on algorithm): one-hot encoding choices, missingness indicators, interaction terms, monotonic binning, learned embeddings → model features

  4. A promotion rule (model → data feature): move a transformation upstream from model to data only if:

    • it improves multiple models/use-cases,

    • it’s deterministic and stable,

    • it has clinical meaning or operational value,

    • it can be tested with audits,

    • it doesn’t embed training-set statistics.

  5. Default to a data feature if you're not sure
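Heuristic 2 is easy to check mechanically: a deterministic aggregation carries no fitted state. A sketch with hypothetical timestamps; by contrast, anything that needs training-set statistics (a target-encoding prior, a scaler’s mean/SD) belongs in the model pipeline:

```python
from datetime import datetime

def days_since_last_antibiotic(prior_starts, cutoff):
    """Data feature: a deterministic aggregation with no fitted parameters,
    fully reproducible from the event history as of the cutoff."""
    before = [t for t in prior_starts if t < cutoff]
    return (cutoff - max(before)).days if before else None

# Hypothetical history for one prediction event.
starts = [datetime(2024, 1, 1), datetime(2024, 5, 20)]
days_since_last_antibiotic(starts, datetime(2024, 6, 1))  # 12
```

Because the function is pure and time-anchored, it can be covered by data-quality tests and audited, which is what the promotion rule requires.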

Summary

Effectively we are building two products: a clinical ML feature store, and many models with their design matrices.

  1. Clinical ML Feature Store (facts)

    1. Relationships

      1. Owned by the data engineer

      2. Reviewed by the clinical lead

      3. Consumed by the model engineer

    2. Layers

      1. Bronze: raw extracts from Epic Clarity/Caboodle mapped to standard fields

      2. Silver: Camino - cleaned, deduplicated, time-aligned tables (meds, cultures, encounters)

      3. Gold: the event-anchored fact table: features and labels "as-of" \(t_{\text{prescription}}\)

    3. Never "saved" because it represents current truth

  2. Model Design Matrix (representations)

    1. Relationships

      1. Owned by the model engineer

      2. Reviewed by the clinical lead

    2. Turns features "as-of" \(t_{\text{prescription}}\) into

      1. encoded matrix \(X\) (categorical encoding, scaling)

      2. derived terms (interactions, non-linear transforms)

      3. missingness handling as implemented logic

    3. Saved as versioned artefacts within MLflow or similar
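The defining property of the design-matrix step is that fitted encoding choices are state: they must be versioned and replayed identically at inference. A minimal sketch of one-hot encoding with a fitted category mapping (the drug names are illustrative):

```python
# Fitted encoding state must be captured and versioned (e.g. as an MLflow
# artefact) so that training and inference apply the identical mapping.
def fit_one_hot(values):
    """Learn the category -> column index mapping from training data."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}

def encode_one_hot(value, mapping):
    """Apply the same fitted mapping at inference; unseen categories
    encode as an all-zero row rather than a new column."""
    row = [0] * len(mapping)
    if value in mapping:
        row[mapping[value]] = 1
    return row

mapping = fit_one_hot(["amoxicillin", "co-amoxiclav", "amoxicillin"])
```

Handling unseen categories explicitly (here, as all zeros) is part of the feature contract the model carries into NHS or HSL infrastructure.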

Examples

  • Data features

    • age_at_event, sex

    • drug_name, route, dose, indication_code (as recorded)

    • organism_last_12m: last cultured organism category (if known before \(t_c\))

    • prior_resistance_to_drug_2y: count of prior non-susceptible results

    • days_since_last_antibiotic

    • num_hospital_admissions_1y

    • latest_creatinine_value_before_tc + timestamp_of_creatinine

  • Model features

    • One-hot / target encoding of drug_name, organism_last_12m

    • missing_creatinine_indicator

    • log1p(num_hospital_admissions_1y)

    • Interaction terms: drug_class × prior_resistance

    • Binning: age_group chosen for performance/robustness

    • Calibration mapping and threshold selection

  • Borderline cases

    • age_group

      • If it’s for reporting/clinical interpretability and fixed upfront → a data feature

      • If it’s tuned/changed during modelling → a model feature
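The age_group case can be made concrete. In this sketch the band edges are illustrative, not an agreed clinical standard: fixed upfront like this, the function is a data feature; if the edges were tuned against model performance, the same logic would belong to the model pipeline.

```python
# Illustrative fixed age bands (NOT an agreed clinical standard).
AGE_BANDS = [(0, 18, "child"), (18, 65, "adult"), (65, 200, "older adult")]

def age_group(age):
    """Fixed upfront -> data feature; tuned during modelling -> model feature."""
    for lo, hi, label in AGE_BANDS:
        if lo <= age < hi:
            return label
    return None
```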
