Infrastructure

MLOps

The practice of streamlining the ML lifecycle — from experimentation to production deployment and ongoing monitoring.

Definition

MLOps (Machine Learning Operations) applies DevOps principles to ML systems, addressing the unique challenges of deploying and maintaining models in production. While software code is deterministic, ML models are probabilistic and degrade over time: predictions worsen when the input distribution shifts ("data drift") or when the relationship between inputs and targets changes ("concept drift").
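The drift monitoring described above can be reduced to a statistical check comparing live data against a reference sample. A minimal sketch, assuming a mean-shift z-test on a single numeric feature (the window sizes and threshold here are illustrative, not prescriptive; production systems typically use richer tests and per-feature monitoring):

```python
import random
import statistics

def mean_shift_drift(reference, live, threshold=3.0):
    """Flag drift when the live window's mean deviates from the
    reference mean by more than `threshold` standard errors."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    standard_error = ref_std / len(live) ** 0.5
    z = abs(statistics.fmean(live) - ref_mean) / standard_error
    return z > threshold

random.seed(0)
# Reference window from training time; live windows from serving.
reference = [random.gauss(0.0, 1.0) for _ in range(1000)]
stable = [random.gauss(0.0, 1.0) for _ in range(200)]
shifted = [random.gauss(0.8, 1.0) for _ in range(200)]  # drifted feature

print(mean_shift_drift(reference, stable))
print(mean_shift_drift(reference, shifted))
```

A check like this, run on a schedule, is what typically backs the "automated retraining trigger": when drift is flagged, the pipeline re-trains and re-validates the model.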

MLOps encompasses: experiment tracking (logging hyperparameters, metrics, artefacts), model versioning and registry, CI/CD pipelines for model training and deployment, feature stores, automated retraining triggers, performance monitoring, and A/B testing.
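The experiment-tracking component above can be sketched as a small in-memory run log. This is a hypothetical, simplified stand-in for tools like MLflow or Weights & Biases, which persist the same information (parameters, metrics, artefacts) to a tracking server rather than process memory:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Run:
    """One training run: hyperparameters, metrics, and artefact paths."""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)

class Tracker:
    """Minimal in-memory tracker; real platforms back this with a server."""
    def __init__(self):
        self.runs = []

    def start_run(self):
        run = Run()
        self.runs.append(run)
        return run

    def best_run(self, metric, maximize=True):
        scored = [r for r in self.runs if metric in r.metrics]
        key = lambda r: r.metrics[metric]
        return max(scored, key=key) if maximize else min(scored, key=key)

tracker = Tracker()
for lr in (0.1, 0.01):
    run = tracker.start_run()
    run.params["learning_rate"] = lr
    # Stand-in for a real training loop producing an evaluation metric.
    run.metrics["accuracy"] = 0.9 if lr == 0.01 else 0.8

print(tracker.best_run("accuracy").params)
```

Querying the best run by metric is the same operation a model registry performs when promoting a candidate model to production.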

Mature MLOps platforms (Weights & Biases, MLflow, Kubeflow, Vertex AI) become essential for organisations running many models in production. The often-quoted claim that "85% of ML projects never reach production" reflects the gap between experimentation and reliable deployment — the gap MLOps exists to close.

Examples

  • Weights & Biases
  • MLflow
  • Kubeflow
  • Amazon SageMaker
  • Databricks MLflow