How to Self-Host a Coding Assistant with an Open LLM
Step-by-step setup of a private code assistant using DeepSeek Coder or Llama 3.1 on your own GPU.
- STEP 1
Pick the model
DeepSeek Coder 33B for best quality if you have ≥24 GB of VRAM. Llama 3.1 8B for any laptop GPU. Qwen2.5-VL 7B if you also need vision/multilingual support.
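A quick way to sanity-check a model against your card: a Q4 quantization needs roughly half a byte per parameter for the weights, plus a few GB for the KV cache and runtime buffers. A rough-estimate helper (the overhead figure is an assumption, not a measured number):

```python
def q4_vram_gb(params_billion: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a 4-bit (Q4) quantized model.

    ~0.5 bytes per parameter for the weights, plus a flat allowance
    for KV cache and runtime buffers. A coarse planning number only.
    """
    return params_billion * 0.5 + overhead_gb

print(q4_vram_gb(33))  # 33B model → 18.5 GB, fits a 24 GB card
print(q4_vram_gb(8))   # 8B model → 6.0 GB, fits most laptop GPUs
```

Long prompts grow the KV cache beyond this flat allowance, so leave headroom if you plan on large-context use.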
- STEP 2
Install Ollama
ollama.com → install → ollama pull deepseek-coder:33b. The 33B Q4 weights are ~19GB; expect roughly a 10-minute download on a fast connection.
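To confirm the pull worked, Ollama's local API lists installed models at /api/tags. A small sketch — the JSON-parsing helper is split out so it can be checked without a running server:

```python
import json
import urllib.request

def model_names(tags: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Query the local Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```

After the pull, `installed_models()` should include `deepseek-coder:33b`; a connection error means the Ollama server isn't running.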
- STEP 3
Wire up your editor
VS Code: install the Continue extension and set its provider to Ollama. JetBrains: install Continue or Tabby. Both can talk to an OpenAI-compatible endpoint, so point them at Ollama's http://localhost:11434/v1.
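For Continue, a minimal config for this setup might look like the following — field names follow Continue's config.json schema as of recent versions, so check the extension's docs if the layout has moved:

```json
{
  "models": [
    {
      "title": "DeepSeek Coder (local)",
      "provider": "ollama",
      "model": "deepseek-coder:33b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder FIM",
    "provider": "ollama",
    "model": "deepseek-coder:33b"
  }
}
```

Listing the model twice is deliberate: the `models` entry handles chat, while `tabAutocompleteModel` drives inline completions (tuned in the next step).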
- STEP 4
Tune the prompt templates
The default chat template is fine for conversational use. For inline completions, use the FIM (fill-in-the-middle) template specific to your model — DeepSeek Coder uses <|fim_begin|> / <|fim_hole|> / <|fim_end|> sentinels (exact sentinel spellings vary between releases, so verify against your model's tokenizer config).
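A FIM request is just a string with the cursor position marked by sentinels: the model generates the code that belongs in the "hole" between your prefix and suffix. A sketch using the sentinel spelling above (again, confirm the exact characters against your model's tokenizer config):

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt.

    The model completes the code that belongs between prefix
    (everything before the cursor) and suffix (everything after).
    """
    return f"<|fim_begin|>{prefix}<|fim_hole|>{suffix}<|fim_end|>"

# Ask the model to fill in the body of add():
prompt = fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```

Your editor plugin builds this string for you on every keystroke; you only need to touch it if the plugin's template doesn't match your model's sentinels.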
- STEP 5
Add codebase context
Continue and Tabby support project-wide retrieval: index your repo once, and queries automatically pull in relevant files. Combined with a 16K-context model like DeepSeek Coder 33B, this gets you close to hosted assistants like Cursor, fully local.
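Under the hood, retrieval is prompt stuffing: the client prepends the top-scoring files to your question before calling the OpenAI-compatible endpoint. A hand-rolled sketch of that request (helper and message wording are illustrative, not any tool's actual internals):

```python
def build_chat_request(question, snippets, model="deepseek-coder:33b"):
    """Build an OpenAI-style chat payload with retrieved files inlined.

    snippets: list of (path, code) pairs pulled from the repo index.
    """
    context = "\n\n".join(f"### {path}\n{code}" for path, code in snippets)
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using the repository context provided."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    }

# POST this payload as JSON to http://localhost:11434/v1/chat/completions
```

The practical limit is the context window: every retrieved file spends tokens, which is why a larger window lets the retriever include more of the repo per query.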