Ollama Integration

Ollama lets you run supported AI models on your machine without LenserFight cloud execution or hosted provider API keys. This is the recommended setup for local development and privacy-sensitive workflows, provided you understand Ollama's own model download, update, logging, and network behavior.

Setup

1. Install Ollama

bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

2. Start the server

bash
ollama serve
# Listens on http://localhost:11434 by default
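
Before pulling models, it can help to confirm that the server is answering. The following is a minimal sketch, assuming `curl` is installed and the server uses the default address shown above. `ollama_up` is a hypothetical helper name, not part of Ollama or LenserFight; it queries Ollama's `/api/version` endpoint and reports success only if the request succeeds.

```bash
# Hypothetical helper: succeeds if an Ollama server answers at the given base URL.
# Assumes curl is installed; the default URL is the address shown above.
ollama_up() {
  local base_url="${1:-http://localhost:11434}"
  # -f: treat HTTP errors as failures; -sS: quiet; --max-time: do not hang forever
  curl -fsS --max-time 5 "${base_url}/api/version" >/dev/null 2>&1
}
```

For example, `ollama_up && echo "Ollama is reachable" || echo "Start it with: ollama serve"`.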

3. Pull a model

bash
ollama pull llama3.2          # 3B, fast on CPU
ollama pull llama3.1:8b       # 8B, GPU recommended
ollama pull codellama         # Code-focused
ollama pull mistral           # 7B, good balance
ollama pull phi3              # 3.8B, small and fast
ollama pull qwen2.5           # 7B, multilingual

4. Create a lenser

bash
lf lenser ai connect \
  --name "Llama Local" \
  --type ollama \
  --config '{"model": "llama3.2", "baseUrl": "http://localhost:11434"}'
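
When the model name comes from a variable, quoting the inline JSON by hand is easy to get wrong. One possible approach is a small helper that prints the `--config` value. This is a sketch only: `lenser_config` is a hypothetical name, and it assumes the model name and URL contain no characters that need JSON escaping.

```bash
# Hypothetical helper: prints the --config JSON for an Ollama lenser.
# Assumes the model name and URL need no JSON escaping (no quotes or backslashes).
lenser_config() {
  local model="$1"
  local base_url="${2:-http://localhost:11434}"
  printf '{"model": "%s", "baseUrl": "%s"}\n' "$model" "$base_url"
}
```

It could then be used as `lf lenser ai connect --name "Llama Local" --type ollama --config "$(lenser_config llama3.2)"`.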

Model management

bash
# List installed models
ollama list

# Show model details
ollama show llama3.2

# Remove a model
ollama rm llama3.2

# Update a model (re-pulling downloads only the layers that changed)
ollama pull llama3.2
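
The commands above can be combined so that a model is pulled only when it is missing. The sketch below assumes that `ollama list` prints a header row followed by one model per line with the name (including its tag) in the first column. `model_installed` and `pull_if_missing` are hypothetical helper names.

```bash
# Hypothetical helper: succeeds if the named model appears in `ollama list`.
# Assumes the first output line is a header and column 1 holds "name:tag".
model_installed() {
  local want="$1"
  # A bare name such as "llama3.2" refers to the "latest" tag.
  case "$want" in
    *:*) ;;
    *) want="${want}:latest" ;;
  esac
  # Skip the header row, then compare the first column exactly.
  ollama list | awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Hypothetical helper: pulls the model only if it is not already installed.
pull_if_missing() {
  model_installed "$1" || ollama pull "$1"
}
```

For example, `pull_if_missing llama3.2` does nothing when the model is already present.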

Performance optimization

| Setting | Impact |
| --- | --- |
| GPU offloading | OLLAMA_NUM_GPU=1 for GPU acceleration |
| Context size | Smaller context = faster responses |
| Quantization | Use q4_0 variants for less RAM |
| Concurrent models | OLLAMA_MAX_LOADED_MODELS=1 saves RAM |
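
As an illustration of the context size setting, a request to Ollama's `/api/generate` endpoint can carry a `num_ctx` option that sets the context window in tokens. The sketch below only builds the request body. `generate_payload` is a hypothetical helper, the 2048 default is an arbitrary example value, and the model name and prompt are assumed to need no JSON escaping.

```bash
# Hypothetical helper: prints a non-streaming /api/generate request body
# with an explicit context window (num_ctx, in tokens).
# Assumes model and prompt contain no quotes or backslashes.
generate_payload() {
  local model="$1"
  local prompt="$2"
  local num_ctx="${3:-2048}"
  printf '{"model": "%s", "prompt": "%s", "stream": false, "options": {"num_ctx": %d}}\n' \
    "$model" "$prompt" "$num_ctx"
}
```

The body could be sent with `curl -s http://localhost:11434/api/generate -d "$(generate_payload llama3.2 'Say hi' 1024)"`.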

Hardware requirements

| Model size | RAM (CPU) | VRAM (GPU) |
| --- | --- | --- |
| 3B (llama3.2) | 4 GB | 2 GB |
| 7B (mistral) | 8 GB | 4 GB |
| 8B (llama3.1) | 10 GB | 6 GB |
| 13B | 16 GB | 10 GB |
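
The CPU column of the table can be turned into a rough model picker. This is a sketch under the table's own figures, not an official sizing rule. `suggest_model` is a hypothetical helper that takes system RAM in whole GB, and the 13B row is skipped because the table names no specific model for it.

```bash
# Hypothetical helper: suggests the largest named model from the table above
# that fits in the given system RAM (whole GB, CPU-only figures).
suggest_model() {
  local ram_gb="$1"
  if   [ "$ram_gb" -ge 10 ]; then echo "llama3.1:8b"
  elif [ "$ram_gb" -ge 8 ];  then echo "mistral"
  elif [ "$ram_gb" -ge 4 ];  then echo "llama3.2"
  else
    echo "none"
    return 1
  fi
}
```

On Linux, total RAM could be read with `awk '/MemTotal/ {print int($2/1024/1024)}' /proc/meminfo` and passed to `suggest_model`. The reported value is usually slightly below the installed amount.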

Troubleshooting

| Error | Fix |
| --- | --- |
| Connection refused | Run ollama serve |
| model not found | Run ollama pull <model> |
| out of memory | Use a smaller model or quantized variant |
| Slow responses | Enable GPU; reduce context size |
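
The error rows of the table can also be encoded as a small lookup for use in scripts. This is a sketch: `suggest_fix` is a hypothetical helper, and the substring patterns are assumptions about how these errors are worded, so they may need adjusting to the exact messages your tools print.

```bash
# Hypothetical helper: maps an error message to the fix from the table above.
# The patterns are assumed substrings, not exact Ollama error strings.
suggest_fix() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *"connection refused"*) echo "Run: ollama serve" ;;
    *model*"not found"*)    echo "Run: ollama pull <model>" ;;
    *"out of memory"*)      echo "Use a smaller model or a quantized variant" ;;
    *)                      echo "No matching entry in the table"; return 1 ;;
  esac
}
```

For example, `suggest_fix "connect: connection refused"` prints `Run: ollama serve`.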

Next steps