← Use cases

How to Replace Claude or GPT with an Open-Source LLM

A migration playbook: which open model maps to which closed model, what breaks, and how to shadow-test before switching.

  1. STEP 1

    Map model → model

    Claude Opus / GPT-5 → DeepSeek V4 685B (frontier). Claude Sonnet / GPT-5 mini → DeepSeek V4 67B or Llama 4 70B. Claude Haiku / GPT-5 nano → Llama 4 8B or Qwen3.6 7B. Always test with your prompts — quality maps differently than benchmarks suggest.

  2. STEP 2

    Build an eval set first

    Pull 200-500 production prompts. Get the closed model's response for each. This is your ground truth. Then run the open model and grade with another LLM (LLM-as-judge) or humans. Without this you won't know if the migration degrades quality.

  3. STEP 3

    Handle the prompt template

    Claude and GPT use chat formats; open models use model-specific chat templates (Llama-3 format, ChatML, Mistral-Instruct, etc.). Use Hugging Face's apply_chat_template() instead of writing them yourself.

  4. STEP 4

    Shadow test in production

    For 1-2 weeks, send each request to BOTH the closed model (serve to user) and the open model (log only). Compare logs. Find drift before users do.

  5. STEP 5

    Keep the closed model as fallback

    After cutover, keep a routing layer that can flip back to Claude/GPT for any request type that regresses. Most teams end up with a hybrid: open for the 80% of routine queries, closed for the 20% that need it. The cost savings are still huge.

Recommended models