Building a Self-Hosted Production ML Platform
Running ML workloads in the cloud is convenient until you see the bill. A single GPU instance on AWS (g5.xlarge) costs ~$1/hour, which comes to about $720/month if you leave it running. For experimentation, prototyping, and learning, a home MLOps lab is a far more economical alternative.
The hardware foundation doesn't need to be exotic. My setup runs on a refurbished Dell OptiPlex 3080 (i5-10500, 32GB RAM, ~$250) paired with an NVIDIA RTX 3060 12GB ($300 used). Total cost: about $550, less than a single month of that cloud GPU running nonstop. The 12GB of VRAM is enough for fine-tuning 7B-parameter models with QLoRA.
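To make the comparison concrete, here is a small back-of-the-envelope sketch using only the figures quoted above. The 720-hour month is the same always-on assumption behind the $720 estimate, and the 0.5 bytes per parameter figure is the usual approximation for 4-bit quantized weights; it ignores activations, adapters, and optimizer state, so treat it as a lower bound rather than a measurement.

```python
# Figures taken from the text; electricity is deliberately left out here.
CLOUD_RATE_USD_PER_HOUR = 1.00   # approximate g5.xlarge on-demand price
HOURS_PER_MONTH = 24 * 30        # always-on month, as in the $720 estimate
HARDWARE_USD = 250 + 300         # refurbished OptiPlex + used RTX 3060


def cloud_cost(hours: float, rate: float = CLOUD_RATE_USD_PER_HOUR) -> float:
    """Cloud spend for a given number of GPU-hours."""
    return hours * rate


def break_even_hours(hardware_usd: float, rate: float = CLOUD_RATE_USD_PER_HOUR) -> float:
    """GPU-hours after which owning the hardware is cheaper than renting."""
    return hardware_usd / rate


def four_bit_weights_gb(params_billions: float) -> float:
    """Approximate size of 4-bit quantized weights (0.5 bytes per parameter)."""
    return params_billions * 0.5


if __name__ == "__main__":
    print(f"cloud, always on: ${cloud_cost(HOURS_PER_MONTH):.0f}/month")
    hours = break_even_hours(HARDWARE_USD)
    print(f"break-even: {hours:.0f} GPU-hours (~{hours / 24:.0f} days nonstop)")
    print(f"7B weights at 4-bit: ~{four_bit_weights_gb(7):.1f} GB of 12 GB VRAM")
```

With these inputs the hardware pays for itself after 550 GPU-hours, or roughly 23 days of continuous use, and the quantized 7B weights take about 3.5 GB, which leaves headroom on a 12GB card.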
Docker Compose is the orchestration backbone. Each service gets its own container: MLflow for experiment tracking (port 5000), MinIO for artifact storage (S3-compatible, port 9000), PostgreSQL for MLflow's backend store, JupyterLab for development (port 8888), and a Triton Inference Server for model serving (ports 8000-8002).
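Once the stack is up, a quick probe of each service is a useful smoke test. The sketch below assumes the containers publish the ports listed above on localhost and uses each project's documented health or version endpoint; the exact paths can vary between versions, so verify them against the images you run. PostgreSQL is left out because it does not speak HTTP.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Ports come from the text; "localhost" and the paths are assumptions.
HEALTH_URLS: Dict[str, str] = {
    "mlflow": "http://localhost:5000/health",
    "minio": "http://localhost:9000/minio/health/live",
    "jupyterlab": "http://localhost:8888/api",
    "triton": "http://localhost:8000/v2/health/ready",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0         # connection refused, DNS failure, timeout, ...


def check_services(
    urls: Dict[str, str], fetch: Callable[[str], int] = http_status
) -> Dict[str, bool]:
    """Map each service name to True when its endpoint answers 200."""
    return {name: fetch(url) == 200 for name, url in urls.items()}


if __name__ == "__main__":
    for name, healthy in check_services(HEALTH_URLS).items():
        print(f"{name:<12} {'up' if healthy else 'DOWN'}")
```

The `fetch` parameter exists so the check can be exercised with a stub instead of a live stack.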
For teams or multi-model scenarios, K3s (lightweight Kubernetes) layers on top. It handles scaling inference replicas, rolling updates when you push a new model version, and resource quotas so a runaway training job doesn't starve the inference server. The single-node K3s install takes under 5 minutes.
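As an illustration of the resource-quota idea, the following sketch generates a Kubernetes ResourceQuota manifest as JSON, which `kubectl apply -f -` accepts just like YAML. The `training` namespace and the specific CPU and memory caps are hypothetical values chosen to leave room on a 32GB machine, not settings from my cluster.

```python
import json
from typing import Any, Dict


def resource_quota(namespace: str, cpu: str, memory: str) -> Dict[str, Any]:
    """Build a ResourceQuota that caps total CPU and memory limits in a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "limits.cpu": cpu,        # sum of CPU limits across all pods
                "limits.memory": memory,  # sum of memory limits across all pods
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical split: training may use at most 8 CPUs and 20Gi of RAM.
    print(json.dumps(resource_quota("training", "8", "20Gi"), indent=2))
```

One consequence worth knowing: once a namespace has a quota on `limits.cpu` or `limits.memory`, pods that do not declare those limits are rejected, so training jobs need explicit limits in their specs.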
Monitoring is non-negotiable even in a homelab. Prometheus scrapes metrics from every container — GPU utilization (via nvidia-smi exporter), inference latency percentiles, request throughput, memory pressure. Grafana dashboards give you a single pane of glass. I've open-sourced my dashboard JSON configs.
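To show roughly what a GPU exporter does under the hood, here is a minimal sketch that turns `nvidia-smi` CSV output into Prometheus text exposition format. It is not the exporter I run; the `homelab_gpu_*` metric names are made up for illustration, and the parser assumes every queried field is numeric.

```python
import subprocess
from typing import Dict

QUERY_FIELDS = "utilization.gpu,memory.used,memory.total"


def read_nvidia_smi() -> str:
    """Run nvidia-smi and return its CSV output, one line per GPU."""
    return subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_gpu_line(line: str) -> Dict[str, float]:
    """Parse one CSV row such as '37, 4096, 12288'."""
    util, used, total = (float(field) for field in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def to_prometheus(csv_text: str) -> str:
    """Render the CSV rows as Prometheus gauge samples labelled by GPU index."""
    rows = [row for row in csv_text.splitlines() if row.strip()]
    lines = []
    for index, row in enumerate(rows):
        for name, value in parse_gpu_line(row).items():
            lines.append(f'homelab_gpu_{name}{{gpu="{index}"}} {value:.0f}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample row instead of a live call, so this runs without a GPU.
    print(to_prometheus("37, 4096, 12288"), end="")
```

On a machine with the NVIDIA driver installed, `to_prometheus(read_nvidia_smi())` produces the same format from live readings, which could then be served over HTTP for Prometheus to scrape.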
The training pipeline: code in JupyterLab → track experiments in MLflow → best model registers in MLflow Model Registry → CI/CD triggers Triton reload → new model serves traffic within 30 seconds. No cloud, no vendor lock-in, full reproducibility.
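The reload step can be sketched with Triton's model repository HTTP API. This assumes Triton was started with `--model-control-mode=explicit` (in poll mode it picks up repository changes without a request), that the new model version is already in the model repository, and that the model is called `sentiment`, a placeholder name. How the CI/CD job itself is triggered is outside this sketch.

```python
import json
import urllib.request

TRITON_URL = "http://localhost:8000"  # Triton's HTTP port from the Compose setup


def load_request(model_name: str, base_url: str = TRITON_URL) -> urllib.request.Request:
    """Build the POST request that asks Triton to load or reload a model."""
    return urllib.request.Request(
        url=f"{base_url}/v2/repository/models/{model_name}/load",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reload_model(model_name: str, base_url: str = TRITON_URL) -> int:
    """Send the load request and return the HTTP status (200 on success)."""
    with urllib.request.urlopen(load_request(model_name, base_url), timeout=30) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the target URL; call reload_model() against a live server.
    print(load_request("sentiment").full_url)
```

A CI job would call `reload_model` after the registry promotion and then poll `/v2/models/<name>/ready` before shifting traffic.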
The biggest surprise? Reliability. This stack has been running 24/7 for 8 months with zero unplanned downtime. Docker's restart policies and health checks handle recovery without manual intervention. The total power draw is about 80W idle, 250W under full training load, which works out to roughly $15/month in electricity.
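The electricity figure depends on your tariff and on how much of the month is spent training. The sketch below uses the two wattages above and an assumed rate of $0.15/kWh, which is a placeholder rather than my actual tariff.

```python
HOURS_PER_MONTH = 24 * 30
IDLE_WATTS = 80
TRAINING_WATTS = 250
USD_PER_KWH = 0.15  # assumed tariff; substitute your own


def monthly_cost(training_fraction: float, rate: float = USD_PER_KWH) -> float:
    """Monthly electricity cost when training for a given fraction of the time."""
    avg_watts = IDLE_WATTS + (TRAINING_WATTS - IDLE_WATTS) * training_fraction
    kwh = avg_watts * HOURS_PER_MONTH / 1000
    return kwh * rate


if __name__ == "__main__":
    for fraction in (0.0, 0.35, 1.0):
        print(f"{fraction:>4.0%} training: ${monthly_cost(fraction):.2f}/month")
```

At the assumed rate, a machine that only idles costs about $8.64 a month and one that trains nonstop about $27, so $15 corresponds to training roughly a third of the time.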
Chetan Khapedia
AI & Data Science Engineer · Robotics · Edge AI · MLOps