Generating Reliable Quiz Questions with a Local LLM
February 12, 2026 · 11 min read
Local LLMs · Ollama · Prompt Engineering · Assessment
What it actually takes to make a 7B local model produce exactly N questions, in the right mix, in valid JSON, every time: a custom Ollama Modelfile (`quiz-master`) built on qwen2.5-coder:7b, a length-aware token budget (roughly 400 tokens per question), a schema-locked prompt with type-mix balancing, and a multi-provider fallback chain (Ollama → Gemini → OpenAI) that keeps the platform online when the M4 host is offline. Plus the failure modes any production prompt pipeline has to handle: silent truncation, type collapse, and off-by-one counts.
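As a rough illustration of the two ideas named above, here is a minimal Python sketch of a length-aware token budget combined with a provider fallback chain. The function names, the overhead constant, and the stubbed provider interface are all hypothetical, not the article's actual implementation; only the ~400-tokens-per-question figure and the Ollama → Gemini → OpenAI ordering come from the summary.

```python
# Hypothetical sketch: size the generation budget to the question count,
# then walk a provider chain until one succeeds.

TOKENS_PER_QUESTION = 400  # empirical per-question budget from the article
BUDGET_OVERHEAD = 200      # assumed headroom for JSON scaffolding (illustrative)


def token_budget(n_questions: int) -> int:
    """Compute a max-output-token budget so the model never truncates mid-question."""
    return n_questions * TOKENS_PER_QUESTION + BUDGET_OVERHEAD


def generate_quiz(prompt: str, n_questions: int, providers) -> str:
    """Try each provider callable in order (e.g. Ollama, then Gemini, then
    OpenAI); return the first successful response, raise only if all fail."""
    budget = token_budget(n_questions)
    errors = []
    for provider in providers:
        try:
            return provider(prompt, budget)
        except Exception as exc:  # host offline, quota exhausted, etc.
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The key design point the summary hints at is that the budget scales with N: a fixed `num_predict` either wastes context on small quizzes or silently truncates large ones.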
Chetan Khapedia
AI & Data Science Engineer · Robotics · Edge AI · MLOps