
What MLOps interview questions should you prepare and how do you answer them?

Updated June 18, 2026 · 8 min read · Crack ML Interview

TL;DR

MLOps interviews test how you take a model from notebook to reliable production service and keep it healthy. Expect questions on CI/CD and reproducibility, model and feature versioning, deployment strategies like shadow and canary, monitoring for data and concept drift, and automated retraining. The strongest answers treat the model, data, and code as three versioned artifacts that must stay in sync, and emphasize observability and safe rollback over clever modeling. Concrete tooling names and a clear story about detecting and recovering from model degradation separate strong candidates from those who only know training.

Reproducibility, CI/CD, and Versioning

Treat code, data, and model as three versioned artifacts

The foundational MLOps idea is that reproducing a model requires versioning three things together: the code, the data and feature definitions, and the resulting model artifact plus its hyperparameters. Code goes in Git; data and large artifacts use tools like DVC or a data lake with immutable snapshots; experiments and models are tracked in a registry like MLflow. When asked how you would reproduce a model from six months ago, the answer is to pin the code commit, the dataset snapshot, and the environment, and to have logged the exact training configuration. Candidates who only version code reveal a gap interviewers probe.
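As a rough illustration of pinning the three artifacts together, the sketch below builds a single training manifest from a code commit, a dataset snapshot hash, the training configuration, and the environment. The function names, field names, and sample values are hypothetical and are not the API of DVC or MLflow; in practice those tools would store the equivalent information.

```python
import hashlib
import json


def sha256_bytes(payload: bytes) -> str:
    """Content hash used as an immutable identifier for a data snapshot."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(code_commit: str, data: bytes, config: dict, environment: dict) -> dict:
    """Bundle everything needed to rerun training into one record."""
    manifest = {
        "code_commit": code_commit,         # Git commit of the training code
        "data_sha256": sha256_bytes(data),  # identifies the dataset snapshot
        "config": config,                   # hyperparameters and seeds
        "environment": environment,         # pinned package versions
    }
    # Hash the canonical JSON so runs with identical inputs share an ID.
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["manifest_id"] = sha256_bytes(canonical)[:12]
    return manifest


# Illustrative values only; the commit and versions are placeholders.
example = build_manifest(
    code_commit="0123abc",
    data=b"user_id,label\n1,0\n2,1\n",
    config={"learning_rate": 0.1, "seed": 42},
    environment={"python": "3.11"},
)
print(json.dumps(example, indent=2))
```

Logging a record like this alongside the registered model means that reproducing it later becomes a lookup: check out the commit, fetch the snapshot whose hash matches, and rebuild the environment.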

CI/CD for models adds data and model validation gates

ML CI/CD extends software CI/CD with extra gates. Beyond unit and integration tests, you add data validation that checks schema and distribution of incoming data, model validation that checks the new model beats a baseline on a held-out set and on critical slices, and behavioral tests for known edge cases. The pipeline trains, evaluates, registers, and conditionally deploys. Emphasize that a model passing offline metrics is necessary but not sufficient; it must clear validation gates and a staged rollout before serving production traffic.
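The gates described above can be sketched as plain functions that a pipeline step would call before registering a model. This is a minimal sketch under stated assumptions: the schema format, the metric dictionaries keyed by slice name, and the tolerance values are all hypothetical, and distribution checks are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    reasons: list


def check_schema(rows: list, expected: dict) -> list:
    """Return schema violations; an empty list means the batch is valid."""
    problems = []
    for i, row in enumerate(rows):
        missing = set(expected) - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for col, typ in expected.items():
            if not isinstance(row[col], typ):
                problems.append(f"row {i}: {col} is not {typ.__name__}")
    return problems


def check_model(candidate: dict, baseline: dict,
                min_gain: float = 0.0, max_slice_drop: float = 0.01) -> list:
    """Compare per-slice metrics (higher is better); 'overall' is the full held-out set."""
    problems = []
    if candidate["overall"] < baseline["overall"] + min_gain:
        problems.append("candidate does not beat baseline overall")
    for slice_name, base_score in baseline.items():
        if slice_name == "overall":
            continue
        # A slice missing from the candidate report counts as a regression.
        if candidate.get(slice_name, 0.0) < base_score - max_slice_drop:
            problems.append(f"regression on slice {slice_name!r}")
    return problems


def run_gates(rows, expected_schema, candidate, baseline) -> GateResult:
    """Combine the data gate and the model gate into one pass/fail decision."""
    reasons = check_schema(rows, expected_schema) + check_model(candidate, baseline)
    return GateResult(passed=not reasons, reasons=reasons)
```

A candidate that improves the overall metric but regresses on a critical slice fails the gate, which is the behavior interviewers usually want to hear described.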

Deployment Strategies and Serving

Shadow, canary, and blue-green for safe model rollout

Shadow deployment runs the new model on live traffic without serving its predictions, so you can compare it against the current model with no user-facing risk. It is ideal for validating latency and the prediction distribution before exposure, at the cost of running both models on the same traffic. Canary rollout sends a small percentage of traffic to the new model, monitors the online metric, and ramps up gradually with automatic rollback on regression; an A/B test uses the same kind of traffic split but holds it fixed long enough to measure impact with statistical confidence. Blue-green keeps two full environments and switches traffic atomically, which makes rollback nearly instant. A frequent question is when to use each: shadow for risk-free validation, canary for measuring real impact on a limited audience, and blue-green for a fast, reversible cutover.
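To make the canary mechanics concrete, here is a small sketch of deterministic traffic splitting plus a ramp-or-rollback rule. The function names, the 2-point tolerance, and the 20-point ramp step are assumptions chosen for illustration; serving platforms such as Seldon or KServe implement traffic splitting in their own configuration rather than through code like this.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Deterministically assign a request to 'canary' or 'stable'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


def next_canary_percent(current: int, canary_metric: float, stable_metric: float,
                        tolerance: float = 0.02, step: int = 20) -> int:
    """Ramp up while the canary holds; drop to 0 (rollback) on regression.

    Metrics are assumed to be 'higher is better'.
    """
    if canary_metric < stable_metric - tolerance:
        return 0
    return min(100, current + step)
```

Hashing the request or user ID keeps each caller on the same variant across requests, and raising the percentage only adds callers to the canary group without reshuffling those already in it.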

Online versus batch serving and the feature store

Decide between batch prediction, where you precompute and cache predictions on a schedule, and online prediction, where you compute per request, based on the latency requirement and whether inputs are known in advance. A feature store is central: it serves features computed from the same definitions offline for training and online for inference, which greatly reduces training-serving skew. It also supports point-in-time correctness, meaning each training row sees only the feature values that were available at the time of the event, so no future information leaks into training. Explain the offline store for training and the low-latency online store for serving, and why consistency between them is the whole point of having a feature store.
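Point-in-time correctness is easier to explain with a small example. The sketch below joins labels to feature values using only values known at or before each event time. It is a plain-Python illustration with hypothetical names and integer timestamps, not the API of Feast or Tecton.

```python
from bisect import bisect_right


def feature_as_of(history: list, event_time: int):
    """Return the latest feature value with timestamp <= event_time.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time)
    if idx == 0:
        return None  # no value was known yet at event_time
    return history[idx - 1][1]


def build_training_rows(label_events: list, feature_history: dict) -> list:
    """Join each (entity_id, event_time, label) to the feature value known at that time."""
    rows = []
    for entity_id, event_time, label in label_events:
        value = feature_as_of(feature_history.get(entity_id, []), event_time)
        rows.append({
            "entity_id": entity_id,
            "event_time": event_time,
            "feature": value,
            "label": label,
        })
    return rows
```

Joining on the latest value regardless of time would leak information from after the event into training, which is one way a model ends up looking better offline than it performs online.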

Monitoring, Drift, and Retraining

Monitor inputs, predictions, and outcomes separately

A complete monitoring story tracks three layers. Operational metrics cover latency, throughput, and error rate like any service. Data and prediction monitoring watches feature distributions and the prediction distribution for drift, since these signal trouble before labels arrive. Model performance monitoring tracks the actual outcome metric once ground-truth labels become available, which is often delayed. Explain how you handle delayed labels, for example by monitoring proxy signals in the meantime, and define concrete alerts rather than vaguely saying you would monitor the model.
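One way to show what "concrete alerts" means is to write the rules down as data, one per monitoring layer. The metric names and thresholds below are illustrative assumptions, not recommended values, and a real setup would express them in the alerting system in use (for example Prometheus alert rules) rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    layer: str       # "operational", "data", or "performance"
    metric: str
    threshold: float
    direction: str   # "above" fires when value > threshold, "below" when value < threshold


# Hypothetical rules covering the three layers.
RULES = [
    AlertRule("operational", "p99_latency_ms", 250.0, "above"),
    AlertRule("operational", "error_rate", 0.01, "above"),
    AlertRule("data", "feature_psi_max", 0.2, "above"),
    AlertRule("data", "positive_prediction_rate", 0.02, "below"),
    AlertRule("performance", "auc_7d", 0.80, "below"),
]


def evaluate(rules: list, observed: dict) -> list:
    """Return the rules that fired.

    Metrics missing from `observed` are skipped, which models the case
    where ground-truth labels have not arrived yet.
    """
    fired = []
    for rule in rules:
        value = observed.get(rule.metric)
        if value is None:
            continue
        if rule.direction == "above":
            breached = value > rule.threshold
        else:
            breached = value < rule.threshold
        if breached:
            fired.append(rule)
    return fired
```

Because the performance rule is skipped while labels are delayed, the data-layer rules act as the early-warning proxy during that window.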

Distinguish data drift from concept drift and design retraining

Data drift is a change in the input distribution; concept drift is a change in the relationship between inputs and the target. Detect data drift with statistical tests on feature distributions, such as a Kolmogorov-Smirnov test or the Population Stability Index. Detect concept drift through a drop in the outcome metric, since it can occur even when the input distributions look stable. Design retraining as either scheduled, on a fixed cadence, or triggered, when drift detection or a metric drop fires. Address the full loop: detect, retrain on fresh data, validate against the current model, and deploy via canary with rollback. Treating the model as a continuously maintained system rather than a one-off is the senior signal.
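As one possible instance of a data-drift check feeding a retraining trigger, the sketch below computes the Population Stability Index (PSI) for a single numeric feature. PSI is swapped in here as an example test, and the equal-width binning, the floor for empty bins, and the 0.2 threshold (a common rule of thumb) are assumptions rather than fixed standards.

```python
import math


def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def should_retrain(psi_value: float, outcome_metric, metric_floor: float,
                   psi_threshold: float = 0.2) -> bool:
    """Trigger on input drift, or on a metric drop once labels are available."""
    drifted = psi_value > psi_threshold
    degraded = outcome_metric is not None and outcome_metric < metric_floor
    return drifted or degraded
```

The PSI term catches data drift before labels arrive, while the outcome-metric term is what would surface concept drift; passing `None` for the metric represents the period when labels are still delayed.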

MLOps Interview Topics: Question, Strong-Answer Anchor, and Tooling

| Topic | Representative Question | Strong-Answer Anchor | Example Tooling |
| --- | --- | --- | --- |
| Reproducibility | How do you reproduce a 6-month-old model? | Version code, data, and config together | Git, DVC, MLflow |
| CI/CD | What does CI/CD look like for ML? | Add data and model validation gates | GitHub Actions, Kubeflow |
| Deployment | How do you roll out a new model safely? | Shadow, then canary with auto-rollback | Seldon, KServe |
| Feature store | Why use a feature store? | Consistent offline/online features | Feast, Tecton |
| Monitoring | What do you monitor in production? | Ops, data/prediction drift, outcome metric | Evidently, Prometheus |
| Retraining | When and how do you retrain? | Scheduled or drift-triggered, validated rollout | Airflow, Kubeflow Pipelines |

Who this is for

Data scientist strong in modeling, light on production operations

Profile: Builds and evaluates models in notebooks, fluent in metrics and feature engineering, but has never owned deployment, monitoring, or a retraining pipeline.

Pain points: Answers modeling questions deeply but gives vague responses on rollout strategy, drift detection, and how to keep a model healthy after launch, which is the core of an MLOps role.

Strategy: Study the production lifecycle explicitly: deployment strategies, the three layers of monitoring, and the detect-retrain-validate-deploy loop. Learn to name concrete tooling and to distinguish data drift from concept drift. Reframe modeling strength around how you would operationalize and maintain a model in production.

DevOps or platform engineer moving into MLOps

Profile: Expert in CI/CD, Kubernetes, and observability for traditional services, but new to ML-specific concerns like training-serving skew, feature stores, and model drift.

Pain points: Designs solid infrastructure but underweights the ML-specific gates: data validation, model validation against baselines, point-in-time feature correctness, and drift monitoring.

Strategy: Map existing CI/CD and observability skills onto the ML lifecycle, then add the ML-specific layers: data and model validation gates, feature store consistency, and drift detection. Emphasize the insight that models degrade silently over time, which is the conceptual leap from traditional ops to MLOps.

FAQ

Q: What is the difference between data drift and concept drift?

A: Data drift is a change in the input distribution, for example a new user demographic, while the input-to-label relationship stays the same. Concept drift is a change in that relationship itself, so the same inputs now map to different correct outputs. Data drift is detected with distribution tests on features; concept drift is detected through a drop in the outcome metric.

Q: Why is a feature store important in MLOps?

A: A feature store serves features computed from the same definitions for offline training and online inference. This supports point-in-time correct training data and greatly reduces training-serving skew, which is a leading cause of models that look good offline but fail in production. It also enables feature reuse across teams and consistent feature versioning.

Q: How do you decide between batch and online model serving?

A: Use batch serving when predictions can be precomputed on a schedule and inputs are known in advance, which is cheaper and simpler. Use online serving when predictions depend on real-time inputs or must be fresh within a tight latency budget. The decision is driven by the latency requirement and whether the input is available ahead of time.
