How to Replace Claude or GPT with an Open-Source LLM
A migration playbook: which open model maps to which closed model, what breaks, and how to shadow-test before switching.
- STEP 1
Map model → model
Claude Opus / GPT-5 → DeepSeek V4 685B (frontier). Claude Sonnet / GPT-5 mini → DeepSeek V4 67B or Llama 4 70B. Claude Haiku / GPT-5 nano → Llama 4 8B or Qwen3.6 7B. Always test with your own prompts — relative quality on your workload often differs from what public benchmarks suggest.
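One way to make the mapping concrete is a small config table your routing layer can read. This is just a sketch of the tiers listed above; the key names are hypothetical, and the candidates are this playbook's suggestions, not an official equivalence table.

```python
# Closed-model -> candidate open-model mapping from Step 1.
# Key names are illustrative; always validate candidates on your own prompts.
MODEL_MAP = {
    "claude-opus":   {"tier": "frontier", "candidates": ["DeepSeek V4 685B"]},
    "gpt-5":         {"tier": "frontier", "candidates": ["DeepSeek V4 685B"]},
    "claude-sonnet": {"tier": "mid",      "candidates": ["DeepSeek V4 67B", "Llama 4 70B"]},
    "gpt-5-mini":    {"tier": "mid",      "candidates": ["DeepSeek V4 67B", "Llama 4 70B"]},
    "claude-haiku":  {"tier": "small",    "candidates": ["Llama 4 8B", "Qwen3.6 7B"]},
    "gpt-5-nano":    {"tier": "small",    "candidates": ["Llama 4 8B", "Qwen3.6 7B"]},
}

def candidates_for(closed_model: str) -> list[str]:
    """Return the open-model candidates to evaluate for a given closed model."""
    return MODEL_MAP[closed_model]["candidates"]
```

Keeping this in config rather than code makes it easy to adjust the mapping as eval results come in.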
- STEP 2
Build an eval set first
Pull 200-500 production prompts and capture the closed model's response for each one. That set is your ground truth. Then run the open model on the same prompts and grade each response against the ground truth, either with another LLM (LLM-as-judge) or with human reviewers. Without this eval set you won't know whether the migration degrades quality.
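The eval loop above can be sketched as a small harness. Everything here is an assumption for illustration: `judge` is a stub standing in for a real LLM-as-judge call, and `open_model` is any callable that maps a prompt to a response.

```python
def judge(prompt: str, reference: str, candidate: str) -> float:
    """Stand-in for an LLM-as-judge call: in practice, send a grading
    prompt (question + reference + candidate) to a strong model and
    parse a 0-1 score from its reply. Stubbed here for illustration."""
    return 1.0 if candidate.strip() == reference.strip() else 0.5

def run_eval(eval_set: list[dict], open_model) -> float:
    """eval_set: records like {"prompt": ..., "reference": ...}, where
    reference is the closed model's logged response (the ground truth).
    Returns the mean judge score for the open model."""
    scores = [
        judge(rec["prompt"], rec["reference"], open_model(rec["prompt"]))
        for rec in eval_set
    ]
    return sum(scores) / len(scores)
```

Run this once per candidate open model and compare mean scores before committing to a cutover.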
- STEP 3
Handle the prompt template
Claude and GPT accept a generic chat format via their APIs; open models expect model-specific chat templates (Llama-3 format, ChatML, Mistral-Instruct, etc.). Use the tokenizer's apply_chat_template() from Hugging Face Transformers instead of writing these templates yourself.
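To see why this matters, here is a hand-rolled ChatML formatter, shown only to illustrate the kind of string a template produces; in real code you would let the tokenizer build it, roughly `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` after loading it with `AutoTokenizer.from_pretrained(...)`.

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

def to_chatml(messages: list[dict]) -> str:
    """Manually render messages in ChatML, one of several open-model
    formats. Models trained on a different template (e.g. Llama-3's)
    will behave worse if fed this string -- which is exactly why you
    should rely on the tokenizer's own chat template instead."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Trailing assistant header cues the model to start its reply.
    return out + "<|im_start|>assistant\n"
```

A Llama-3-formatted prompt looks completely different, so a prompt string that works for one open model is generally wrong for another.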
- STEP 4
Shadow test in production
For 1-2 weeks, send each request to BOTH the closed model (serve to user) and the open model (log only). Compare logs. Find drift before users do.
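A minimal shadow-dispatch wrapper might look like the sketch below. The function and log-file names are hypothetical, and the model callables are placeholders for your real API clients; the point is that the user only ever sees the closed model's output while both responses are logged for comparison.

```python
import json
import time

def shadow_request(prompt: str, closed_model, open_model,
                   log_path: str = "shadow.jsonl") -> str:
    """Serve the closed model's answer; run the open model in parallel
    and log both responses for offline drift analysis."""
    served = closed_model(prompt)
    shadow = open_model(prompt)  # never shown to the user
    with open(log_path, "a") as f:
        f.write(json.dumps({
            "ts": time.time(),
            "prompt": prompt,
            "closed": served,
            "open": shadow,
        }) + "\n")
    return served
```

In production you would run the open-model call asynchronously so it doesn't add latency to the served response; the comparison job then diffs the JSONL log offline.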
- STEP 5
Keep the closed model as fallback
After cutover, keep a routing layer that can flip back to Claude/GPT for any request type that regresses. Most teams end up with a hybrid: open models for the roughly 80% of routine queries, closed models for the 20% that need more capability. The cost savings are still huge.
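The fallback routing layer described above can be sketched in a few lines. The request-type labels and model callables are assumptions for illustration; `regressed_types` would be populated from your shadow-test and eval results.

```python
def route(prompt: str, request_type: str,
          open_model, closed_model,
          regressed_types: set[str]) -> str:
    """Default to the open model; fall back to the closed model for
    request types known (from evals / shadow testing) to regress."""
    if request_type in regressed_types:
        return closed_model(prompt)
    return open_model(prompt)
```

Because the decision is driven by a plain set of request types, flipping a category back to the closed model is a config change, not a code change.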