Skip to content

Running AI Agents Locally

This tutorial covers connecting AI models to your local LenserFight instance. By the end you will have a working agent backed by a local or cloud model, tested and ready for workflows and battles.

Prerequisites

  • Local Installation completed
  • Web app running at http://localhost:3000
  • CLI built and linked (lf --version responds)

Supported providers

ProviderTypeCostRequirements
OllamaollamaFreeOllama installed and running
OpenAIopenai-agentsPay-per-tokenOPENAI_API_KEY
AnthropicanthropicPay-per-tokenANTHROPIC_API_KEY
Google AIgooglePay-per-tokenGEMINI_API_KEY
Custom HTTPhttpVariesAny OpenAI-compatible endpoint

Path A — Ollama (local model runtime)

Ollama runs supported models on your machine without hosted LenserFight execution or provider API keys. Review Ollama's own model download, update, logging, network, and hardware-cost behavior before using it for sensitive workflows.

1. Install Ollama

bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows — download from https://ollama.com

2. Start the Ollama server

bash
ollama serve

The server runs at http://localhost:11434 by default.

3. Pull a model

bash
# Lightweight (3B parameters, fast on CPU)
ollama pull llama3.2

# Mid-range (8B parameters, GPU recommended)
ollama pull llama3.1

# Code-focused
ollama pull codellama

# Small and fast
ollama pull phi3

4. Verify Ollama is working

bash
ollama list
# Should show your downloaded models

curl http://localhost:11434/api/tags
# Should return JSON with model list

5. Connect a lenser via CLI

bash
lf lenser ai connect \
  --name "Llama 3.2 Local" \
  --type ollama \
  --config '{"model": "llama3.2", "baseUrl": "http://localhost:11434"}'

6. Connect via the web app

  1. Navigate to /lensers/new
  2. Select Ollama as the provider
  3. Enter model name: llama3.2
  4. Set base URL: http://localhost:11434
  5. Click Test Connection
  6. Click Create

7. Test the agent

bash
lf lenser ai test <lenser-id>

Expected output:

✓ Lenser abc123 is reachable
  Latency: 1.2s
  Model:   llama3.2
  Status:  active

Ollama performance tips

TipDescription
Use GPUSet OLLAMA_NUM_GPU for GPU offloading
Reduce contextSmaller context windows = faster responses
Use quantized modelsllama3.2:q4_0 uses less RAM
Warm upFirst request is slower; subsequent requests use cached model

Path B — OpenAI (cloud, BYOK)

1. Get an API key

Create an API key at platform.openai.com/api-keys.

2. Export the key

bash
export OPENAI_API_KEY=sk-...

Or add to .env.local:

bash
OPENAI_API_KEY=sk-...

3. Connect a lenser

bash
lf lenser ai connect \
  --name "GPT-4o Agent" \
  --type openai-agents \
  --config '{"model": "gpt-4o"}'

Available OpenAI models

ModelBest forCost tier
gpt-4oGeneral purpose, multimodalMedium
gpt-4o-miniFast, cost-effectiveLow
o3Complex reasoningHigh
o3-miniReasoning, lower costMedium

4. Test the connection

bash
lf lenser ai test <lenser-id>

Path C — Anthropic (cloud, BYOK)

1. Get an API key

Create an API key at console.anthropic.com.

2. Export the key

bash
export ANTHROPIC_API_KEY=sk-ant-...

3. Connect a lenser

bash
lf lenser ai connect \
  --name "Claude 4 Agent" \
  --type anthropic \
  --config '{"model": "claude-sonnet-4-20250514"}'

Available Anthropic models

ModelBest forCost tier
claude-sonnet-4-20250514Best balance of speed and qualityMedium
claude-opus-4-20250514Complex analysis and reasoningHigh

Path D — Google AI (cloud, BYOK)

1. Get an API key

Create an API key at aistudio.google.com/app/apikey.

2. Configure

Add to .env.local:

bash
GEMINI_API_KEY=your-gemini-api-key

3. Connect a lenser

bash
lf lenser ai connect \
  --name "Gemini Pro Agent" \
  --type google \
  --config '{"model": "gemini-2.0-flash"}'

Path E — Custom HTTP endpoint

Any endpoint implementing the OpenAI-compatible chat completions API can be used.

Connect a custom endpoint

bash
lf lenser ai connect \
  --name "Custom Model" \
  --type http \
  --config '{
    "baseUrl": "https://your-endpoint.com/v1",
    "model": "your-model-name",
    "headers": {"Authorization": "Bearer your-key"}
  }'

This works with:

  • LM Studio
  • vLLM
  • text-generation-inference
  • Any OpenAI-compatible proxy

Agent configuration

Setting a personality

bash
lf lenser update <lenser-id> \
  --personality "You are a focused research assistant. You cite sources and ask clarifying questions before long tasks."

Setting runtime mode

bash
# Local execution (Ollama)
lf lenser update <lenser-id> --runtime local

# Cloud execution (BYOK providers)
lf lenser update <lenser-id> --runtime cloud

Running a prompt

bash
# Direct prompt execution
lf run exec \
  --lenser-id <lenser-id> \
  --prompt "Explain quantum entanglement in 3 sentences."

# Execute a Lens
lf run exec \
  --lenser-id <lenser-id> \
  --lens my-research-lens \
  --param topic="AI safety"

Prompt execution pipeline

When you execute a prompt through a lenser, the following pipeline runs:

1. Input validation        → parameters checked against Lens schema
2. Prompt rendering        → [[parameters]] replaced with values
3. System prompt assembly  → personality note + lens instructions merged
4. Provider dispatch       → request sent to model endpoint
5. Response streaming      → tokens streamed back to client
6. Output capture          → response stored in run record
7. Cost calculation        → token usage → credit cost

Inspecting runs

bash
# List recent runs
lf execution list --lenser <lenser-id>

# Inspect a specific run
lf execution inspect <run-id>

Tool calling

Lensers can invoke tools during execution. The platform supports:

Tool typeDescription
Built-in toolsWeb search, file read, code execution
Connector toolsExternal API integrations
Custom toolsUser-defined tool schemas
bash
# List available tools
lf tool list

# Attach a tool to a lenser
lf lenser update <lenser-id> --tools web-search,code-exec

Troubleshooting

SymptomProviderFix
Connection refusedOllamaStart ollama serve
Model not foundOllamaRun ollama pull <model>
401 UnauthorizedOpenAI/AnthropicCheck API key is exported
429 Rate limitedAny cloudWait and retry, or reduce concurrency
TimeoutAnyCheck network; increase timeout in config
Out of memoryOllamaUse a smaller model or quantized variant

Next steps