Using Ollama With Roo Code
Roo Code supports running models locally using Ollama. This provides privacy, offline access, and potentially lower costs, but requires more setup and a powerful computer.
Website: https://ollama.com/
Setting up Ollama
- Download and Install Ollama: Download the Ollama installer for your operating system from the Ollama website and follow the installation instructions. Make sure Ollama is running; you can start it with:
      ollama serve
- Download a Model: Ollama supports many different models. You can find a list of available models on the Ollama website. Some recommended models for coding tasks include:
  - codellama:7b-code (good starting point, smaller)
  - codellama:13b-code (better quality, larger)
  - codellama:34b-code (even better quality, very large)
  - qwen2.5-coder:32b
  - mistralai/Mistral-7B-Instruct-v0.1 (good general-purpose model)
  - deepseek-coder:6.7b-base (good for coding tasks)
  - llama3:8b-instruct-q5_1 (good for general tasks)
  To download a model, open your terminal and run:
      ollama pull <model_name>
  For example:
      ollama pull qwen2.5-coder:32b
- Configure the Model: Configure your model's context window in Ollama and save a copy.
  Default Context Behavior
  Roo Code automatically defers to the Modelfile's num_ctx setting by default. When you use a model with Ollama, Roo Code reads the model's configured context window and uses it automatically. You don't need to configure context size in Roo Code settings; it respects what's defined in your Ollama model.
  Option A: Interactive Configuration
  Load the model (we will use qwen2.5-coder:32b as an example):
      ollama run qwen2.5-coder:32b
  Change the context size parameter:
      /set parameter num_ctx 32768
  Save the model with a new name:
      /save your_model_name
  Option B: Using a Modelfile (Recommended)
  Create a Modelfile with your desired configuration:
      # Example Modelfile for reduced context
      FROM qwen2.5-coder:32b
      # Set context window to 32K tokens (reduced from default)
      PARAMETER num_ctx 32768
      # Optional: Adjust temperature for more consistent output
      PARAMETER temperature 0.7
      # Optional: Set repeat penalty
      PARAMETER repeat_penalty 1.1
  Then create your custom model:
      ollama create qwen-32k -f Modelfile
  Override Context Window
  If you need to override the model's default context window:
  - Permanently: Save a new model version with your desired num_ctx using either method above
  - Roo Code behavior: Roo automatically uses whatever num_ctx is configured in your Ollama model
  - Memory considerations: Reducing num_ctx helps prevent out-of-memory errors on limited hardware
- Configure Roo Code:
  - Open the Roo Code sidebar.
  - Click the settings gear icon.
  - Select "ollama" as the API Provider.
  - Enter the model tag or saved name from the previous step (e.g., your_model_name).
  - (Optional) Configure the base URL if you're running Ollama on a different machine. The default is http://localhost:11434.
  - (Optional) Enter an API Key if your Ollama server requires authentication.
  - (Advanced) Roo uses Ollama's native API by default for the "ollama" provider. An OpenAI-compatible /v1 handler also exists but isn't required for typical setups.
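Before entering a model name in Roo Code, it can help to confirm that the Ollama server at your base URL actually lists that model. The following is a minimal sketch, not part of Roo Code: it assumes Ollama's native `/api/tags` endpoint returns a JSON object with a `models` list whose entries have a `name` field, and that untagged names resolve to `:latest`. The helper names (`normalize_tag`, `model_is_available`, `fetch_tags`) are hypothetical.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://localhost:11434"  # default base URL from the steps above


def normalize_tag(name: str) -> str:
    """Append ':latest' when the name carries no tag (assumed Ollama convention)."""
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def model_is_available(tags_payload: dict, name: str) -> bool:
    """Return True if `name` appears in an /api/tags style payload."""
    wanted = normalize_tag(name)
    listed = {normalize_tag(m["name"]) for m in tags_payload.get("models", [])}
    return wanted in listed


def fetch_tags(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> dict:
    """Fetch the local model list from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, `model_is_available(fetch_tags(), "your_model_name")` indicates whether the name you plan to enter in Roo Code is known to the server. If you run Ollama on another machine, pass that machine's base URL to `fetch_tags`.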
 
Tips and Notes
- Resource Requirements: Running large language models locally can be resource-intensive. Make sure your computer meets the minimum requirements for the model you choose.
- Model Selection: Experiment with different models to find the one that best suits your needs.
- Offline Use: Once you've downloaded a model, you can use Roo Code offline with that model.
- Token Tracking: Roo Code tracks token usage for models run via Ollama, helping you monitor consumption.
- Ollama Documentation: Refer to the Ollama documentation for more information on installing, configuring, and using Ollama.
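To get a rough feel for the resource requirements mentioned above, you can estimate the size of a model's weights from its parameter count and quantization width. This is a back-of-the-envelope sketch only: it assumes weights take about `parameters × bits / 8` bytes, and it ignores the memory used by the context window and runtime overhead, so real usage will be higher. The function name `estimate_weight_gb` is hypothetical.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare a few parameter sizes at a nominal 4 bits per weight.
for params in (7, 13, 32):
    print(f"{params}B at 4-bit: about {estimate_weight_gb(params, 4):.1f} GB of weights")
```

Treat the result as a lower bound when deciding whether a model fits your hardware. Actual quantized files carry some extra overhead, and larger num_ctx values add to memory use on top of the weights.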
Troubleshooting
Out of Memory (OOM) on First Request
Symptoms
- First request from Roo fails with an out-of-memory error
- GPU/CPU memory usage spikes when the model first loads
- Works after you manually start the model in Ollama
Cause
If no model instance is running, Ollama spins one up on demand. During that cold start it may allocate a larger context window than expected. The larger context window increases memory usage and can exceed available VRAM or RAM. This is an Ollama startup behavior, not a Roo Code bug.
Fixes
- Preload the model:
      ollama run <model-name>
  Keep it running, then issue the request from Roo.
- Pin the context window (num_ctx):
  - Option A: interactive session, then save:
      # inside `ollama run <base-model>`
      /set parameter num_ctx 32768
      /save <your_model_name>
  - Option B: Modelfile (recommended for reproducibility):
      FROM <base-model>
      PARAMETER num_ctx 32768
      # Adjust based on your available memory:
      # 16384 for ~8GB VRAM
      # 32768 for ~16GB VRAM
      # 65536 for ~24GB+ VRAM
    Then create the model:
      ollama create <your_model_name> -f Modelfile
- Ensure the model's context window is pinned: Save your Ollama model with an appropriate num_ctx (via /set + /save, or preferably a Modelfile). Roo Code automatically detects and uses the model's configured num_ctx; there is no manual context size setting in Roo Code for the Ollama provider.
- Use smaller variants: If GPU memory is limited, use a smaller quant (e.g., q4 instead of q5) or a smaller parameter size (e.g., 7B/13B instead of 32B).
- Restart after an OOM:
      ollama ps
      ollama stop <model-name>
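The VRAM guidance in the Modelfile comments above can be turned into a small helper that picks a num_ctx value and writes out a matching Modelfile. This is a sketch under stated assumptions: the thresholds are copied from those comments, the guide gives no value for less than 8 GB so the helper returns `None` there, and the function names (`suggest_num_ctx`, `render_modelfile`) are hypothetical.

```python
from typing import Optional

# (minimum VRAM in GB, num_ctx) pairs taken from the Modelfile comments above
NUM_CTX_BY_VRAM = [(24, 65536), (16, 32768), (8, 16384)]


def suggest_num_ctx(vram_gb: float) -> Optional[int]:
    """Return the largest num_ctx whose VRAM threshold is met, else None."""
    for min_gb, num_ctx in NUM_CTX_BY_VRAM:
        if vram_gb >= min_gb:
            return num_ctx
    return None  # below 8 GB: no guidance given, choose a value manually


def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Build the text of a minimal Modelfile that pins the context window."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


# Example: print a Modelfile for a machine with roughly 16 GB of VRAM.
ctx = suggest_num_ctx(16)
if ctx is not None:
    print(render_modelfile("qwen2.5-coder:32b", ctx))
```

Save the printed text as Modelfile and run `ollama create <your_model_name> -f Modelfile` as described above. The thresholds are approximate; if you still hit out-of-memory errors, step down to the next smaller value.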
Quick checklist
- Model is running before Roo request
- num_ctx pinned (Modelfile or /set + /save)
- Model saved with appropriate num_ctx (Roo uses this automatically)
- Model fits available VRAM/RAM
- No leftover Ollama processes
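To check the num_ctx item on this list without opening an interactive session, you could read the saved model's parameters over Ollama's native API. The sketch below is an assumption-heavy illustration: it presumes a POST to `/api/show` with a `model` field returns JSON whose `parameters` field is newline-separated text such as `num_ctx 32768`, which may differ across Ollama versions. The helper names `parse_num_ctx` and `fetch_parameters` are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def parse_num_ctx(parameters_text: str) -> Optional[int]:
    """Extract num_ctx from newline-separated 'key value' parameter text."""
    for line in parameters_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "num_ctx":
            return int(parts[1])
    return None  # num_ctx is not pinned in this model


def fetch_parameters(model: str, base_url: str = "http://localhost:11434") -> str:
    """Ask a running Ollama server for a model's parameter text (assumed response shape)."""
    body = json.dumps({"model": model}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5.0) as resp:
        return json.load(resp).get("parameters", "")
```

With Ollama running, `parse_num_ctx(fetch_parameters("your_model_name"))` should return the pinned value, or `None` if the model was saved without one. A `None` result suggests revisiting the Modelfile or the /set + /save steps.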