
How to Self-Host a Coding Assistant with an Open LLM

Step-by-step setup of a private code assistant using DeepSeek Coder V3 or Llama 4 on your own GPU.

  1. Pick the model

    Choose DeepSeek Coder V3 33B for the best quality if you have ≥24 GB of VRAM, Llama 4 8B for any laptop GPU, or Qwen3.6 7B if you also need vision or multilingual support.
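These rules of thumb can be sketched as a small selector. The tag strings below are illustrative assumptions, not verified registry names; check `ollama list` or the model registry for the exact tags.

```python
def pick_model(vram_gb: float, need_vision: bool = False) -> str:
    """Map the rules of thumb above to a model tag.

    Tag names here are illustrative; verify the exact tags before pulling.
    """
    if need_vision:
        return "qwen3.6:7b"             # vision/multilingual option
    if vram_gb >= 24:
        return "deepseek-coder-v3:33b"  # best-quality tier
    return "llama4:8b"                  # fits laptop GPUs
```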

  2. Install Ollama

    Download the installer from ollama.com, then run `ollama pull deepseek-coder-v3:33b`. The 33B Q4 weights are ~20 GB, so expect roughly a 10-minute download on a fast connection.
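The 10-minute figure is simple bandwidth arithmetic. A quick check, assuming a 300 Mbit/s connection (adjust for yours):

```python
weights_gb = 20      # approximate size of the 33B Q4 weights
speed_mbps = 300     # assumed download speed, megabits per second
minutes = weights_gb * 8 * 1000 / speed_mbps / 60
print(f"{minutes:.1f} min")  # prints "8.9 min"
```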

  3. Wire up your editor

    VS Code: install the Continue extension and set the provider to Ollama. JetBrains: install Continue or Tabby. Both speak the OpenAI API spec, so point them at http://localhost:11434/v1.
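Anything else that speaks the OpenAI spec can hit the same endpoint. A minimal stdlib sketch, assuming Ollama's default port and the model tag pulled above:

```python
import json
from urllib import request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

def build_chat_request(model: str, prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON payload an OpenAI-spec client would send."""
    url = f"{BASE_URL}/chat/completions"
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return url, payload

def ask(model: str, prompt: str) -> str:
    """POST the request and return the reply (requires the server to be running)."""
    url, payload = build_chat_request(model, prompt)
    req = request.Request(url, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```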

  4. Tune the system prompt

    The default chat template is fine for conversational use. For inline completions, use the fill-in-the-middle (FIM) template specific to your model; DeepSeek Coder uses <|fim_begin|> / <|fim_hole|> / <|fim_end|>.
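Completing code at the cursor means sending everything before it as the prefix and everything after it as the suffix. A sketch using the DeepSeek Coder sentinels named above (other models spell their tokens differently, so check your model card):

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt; the model generates the hole."""
    return f"<|fim_begin|>{prefix}<|fim_hole|>{suffix}<|fim_end|>"

# Ask the model to fill in a function body:
prompt = fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```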

  5. Add codebase context

    Continue and Cursor support project-wide retrieval: index your repo once, and queries automatically pull in the relevant files. Combined with a 64K-context model like DeepSeek Coder V3, this gets you near-Cursor quality, fully local.
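The retrieval step can be illustrated with a toy keyword-overlap ranker. Real tools use embeddings, but the shape is the same: score every file against the query, then stuff the top hits into the context window.

```python
def score(query: str, text: str) -> int:
    """Count query tokens appearing in the file (toy stand-in for embeddings)."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split()))

def retrieve(query: str, files: dict[str, str], k: int = 3) -> list[str]:
    """Return the k file paths whose contents best match the query."""
    return sorted(files, key=lambda p: score(query, files[p]), reverse=True)[:k]

files = {"config.py": "load json config from disk",
         "ui.py": "render the html page"}
retrieve("where is the json config loaded", files, k=1)  # → ["config.py"]
```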

Recommended models