AI Models

LenserFight routes lens executions through a unified model registry (ai.models). Each model has a canonical key used in API calls, a set of capabilities, and declared input/output modalities.
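
As a rough illustration of that description, one registry entry can be pictured as a small record. This is a sketch of the shape only: the class name `ModelEntry` and its field names are hypothetical and are not taken from the LenserFight codebase. The values are copied from the GPT-4o entry on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical shape of one registry entry (illustrative field names)."""
    key: str                       # canonical key used in API calls
    capabilities: frozenset        # capability tags, e.g. {"chat", "tools"}
    input_modalities: frozenset    # e.g. {"text", "image"}
    output_modalities: frozenset   # e.g. {"text"}
    context_window: Optional[int] = None  # tokens; None where the page lists none


# Values copied from the GPT-4o entry on this page.
gpt_4o = ModelEntry(
    key="gpt-4o",
    capabilities=frozenset({"chat", "tools", "vision", "json_schema"}),
    input_modalities=frozenset({"text", "image", "document"}),
    output_modalities=frozenset({"text"}),
    context_window=128_000,
)

print(gpt_4o.key, sorted(gpt_4o.capabilities))
```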

Capability tags

| Tag | Meaning |
| --- | --- |
| chat | Conversational / instruction-following text generation |
| reasoning | Extended chain-of-thought reasoning (think-before-answer) |
| tools | Function / tool calling |
| vision | Accepts image inputs |
| json_schema | Provider-supported structured JSON output |
| image_generation | Produces images from text prompts |
| video_generation | Produces video clips from text/image prompts |
| audio_generation | Produces speech or general audio |
| music_generation | Produces music or soundtracks |
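
Capability tags lend themselves to simple set filtering when choosing a model for a lens. The following is a minimal sketch, assuming the capability sets have already been loaded into a dictionary; the helper `models_with` is hypothetical and is not a LenserFight API. The sample data is copied from a few entries on this page.

```python
# Capability sets copied from a few entries on this page.
CAPABILITIES = {
    "gpt-5.4": {"chat", "reasoning", "tools", "vision", "json_schema"},
    "gpt-5.4-mini": {"chat", "tools", "json_schema"},
    "claude-sonnet-4-6": {"chat", "reasoning", "tools"},
    "gemini-2.5-flash": {"chat", "tools", "vision"},
    "mistral-large-3": {"chat", "tools", "json_schema"},
}


def models_with(required, table=CAPABILITIES):
    """Return the sorted keys whose capability set includes every required tag."""
    required = set(required)
    return sorted(key for key, caps in table.items() if required <= caps)


print(models_with({"tools", "json_schema"}))
# ['gpt-5.4', 'gpt-5.4-mini', 'mistral-large-3']
print(models_with({"vision", "reasoning"}))
# ['gpt-5.4']
```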

OpenAI

GPT-5.4 Pro

| Field | Value |
| --- | --- |
| Key | gpt-5.4-pro |
| Capabilities | chat · reasoning · tools · vision · json_schema |
| Context window | 400 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

Most capable GPT-5.4 tier. Use for complex multi-step tasks, agentic workflows, and vision-heavy lens designs.

GPT-5.4

| Field | Value |
| --- | --- |
| Key | gpt-5.4 |
| Capabilities | chat · reasoning · tools · vision · json_schema |
| Context window | 400 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

GPT-5.4 Mini

| Field | Value |
| --- | --- |
| Key | gpt-5.4-mini |
| Capabilities | chat · tools · json_schema |
| Context window | 400 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Faster, cost-efficient variant. No vision or reasoning.

GPT-5.4 Nano

| Field | Value |
| --- | --- |
| Key | gpt-5.4-nano |
| Capabilities | chat · json_schema |
| Context window | 400 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Lowest-latency OpenAI text model. Best for simple classification or extraction pipelines.

GPT-5.2

| Field | Value |
| --- | --- |
| Key | gpt-5.2 |
| Capabilities | chat · reasoning · tools · vision · json_schema |
| Context window | 400 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

GPT-4o

| Field | Value |
| --- | --- |
| Key | gpt-4o |
| Capabilities | chat · tools · vision · json_schema |
| Context window | 128 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

OpenAI — Generative Media

DALL-E 4

| Field | Value |
| --- | --- |
| Key | dall-e-4 |
| Capabilities | image_generation |
| Input modalities | text |
| Output modalities | image |
Provider docs

Synchronous image generation. Returns a signed URL immediately after the API call resolves.

Sora 2.0

| Field | Value |
| --- | --- |
| Key | sora-2.0 |
| Capabilities | video_generation |
| Input modalities | text |
| Output modalities | video |
Provider docs

Async video generation. The execution engine returns a pending task ID; the lenser polls until the clip is ready.
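
The task-poll pattern described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the engine's actual client: the function `poll_until_ready`, the status values (`pending`, `ready`, `failed`), and the response shape are all hypothetical, and the status source is a stand-in rather than a real endpoint.

```python
import time


def poll_until_ready(fetch_status, task_id, interval_s=2.0, timeout_s=600.0,
                     sleep=time.sleep):
    """Poll fetch_status(task_id) until the task is ready, fails, or times out.

    fetch_status is a caller-supplied function (hypothetical here) returning a
    dict such as {"status": "pending"} or {"status": "ready", "url": "..."}.
    """
    waited = 0.0
    while True:
        result = fetch_status(task_id)
        if result["status"] == "ready":
            return result
        if result["status"] == "failed":
            raise RuntimeError(f"task {task_id} failed")
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} still pending after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Stand-in for a real status endpoint: pending twice, then ready.
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "ready", "url": "https://example.invalid/clip.mp4"},
])
clip = poll_until_ready(lambda task_id: next(_responses), "task-123",
                        sleep=lambda seconds: None)  # skip real sleeping in the demo
print(clip["url"])
```

A production client would likely add backoff between polls and distinguish transient network errors from a failed task.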


Anthropic

Claude Opus 4.6

| Field | Value |
| --- | --- |
| Key | claude-opus-4-6 |
| Capabilities | chat · reasoning · tools |
| Context window | 200 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Anthropic's most capable model. Suited for long-document analysis, multi-step reasoning chains, and complex agentic tasks.

Claude Sonnet 4.6

| Field | Value |
| --- | --- |
| Key | claude-sonnet-4-6 |
| Capabilities | chat · reasoning · tools |
| Context window | 200 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Balanced speed/quality. Default recommendation for most Claude-backed lenses.

Claude Sonnet 4.5

| Field | Value |
| --- | --- |
| Key | claude-sonnet-4-5 |
| Capabilities | chat · reasoning · tools |
| Context window | 200 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Claude Sonnet 4.0

| Field | Value |
| --- | --- |
| Key | claude-sonnet-4-0 |
| Capabilities | chat · reasoning · tools |
| Context window | 200 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Claude Haiku 4.5

| Field | Value |
| --- | --- |
| Key | claude-haiku-4-5 |
| Capabilities | chat |
| Context window | 200 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Fastest, cheapest Anthropic model. Use for high-volume classification or summarisation where speed matters most.

Claude Haiku 3.5

| Field | Value |
| --- | --- |
| Key | claude-haiku-3-5 |
| Capabilities | chat |
| Context window | 200 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Legacy. Prefer Haiku 4.5 for new lenses.


Google

Gemini 3.1 Pro Preview

| Field | Value |
| --- | --- |
| Key | gemini-3.1-pro-preview |
| Capabilities | chat · reasoning · tools · vision |
| Context window | 2 000 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

Gemini 3 Pro Preview

| Field | Value |
| --- | --- |
| Key | gemini-3-pro-preview |
| Capabilities | chat · reasoning · tools · vision |
| Context window | 2 000 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

Gemini 2.5 Pro

| Field | Value |
| --- | --- |
| Key | gemini-2.5-pro |
| Capabilities | chat · reasoning · tools · vision |
| Context window | 2 000 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

2M-token context window, the largest in this registry (shared with the Gemini 3 Pro and 3.1 Pro previews). Well suited to large-codebase analysis or full-book summarisation.
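
To decide whether a long input is a candidate for a given model, a rough pre-check against the context window can help. The sketch below is an estimate only: the 4-characters-per-token ratio and the reserved output budget are assumptions (real tokenizers vary by model and language, and providers differ on whether output counts against the window), and the helper `fits` is hypothetical. Window sizes are copied from this page.

```python
# Context windows (in tokens) copied from this page.
CONTEXT_WINDOW = {
    "gemini-2.5-pro": 2_000_000,
    "gemini-2.5-flash": 1_000_000,
    "gpt-5.4": 400_000,
    "claude-sonnet-4-6": 200_000,
}

CHARS_PER_TOKEN = 4  # rough assumption; real tokenizers vary by model and language


def fits(model_key, text_chars, reserved_output_tokens=8_000):
    """Estimate whether a prompt of text_chars characters fits the model's window."""
    estimated_tokens = text_chars // CHARS_PER_TOKEN
    return estimated_tokens + reserved_output_tokens <= CONTEXT_WINDOW[model_key]


book_chars = 3_000_000  # hypothetical long manuscript, roughly 750k estimated tokens
for key in CONTEXT_WINDOW:
    print(key, fits(key, book_chars))
```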

Gemini 3 Flash Preview

| Field | Value |
| --- | --- |
| Key | gemini-3-flash-preview |
| Capabilities | chat · tools · vision |
| Context window | 1 000 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

Gemini 2.5 Flash

| Field | Value |
| --- | --- |
| Key | gemini-2.5-flash |
| Capabilities | chat · tools · vision |
| Context window | 1 000 000 tokens |
| Input modalities | text · image · document |
| Output modalities | text |
Provider docs

Gemini 3.1 Flash Lite Preview

| Field | Value |
| --- | --- |
| Key | gemini-3.1-flash-lite-preview |
| Capabilities | chat · tools |
| Context window | 1 000 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Gemini 2.5 Flash Lite

| Field | Value |
| --- | --- |
| Key | gemini-2.5-flash-lite |
| Capabilities | chat · tools |
| Context window | 1 000 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Lowest-cost Google text model. Ideal for batch workloads or simple extraction lenses.


Google — Generative Media

Imagen 4

| Field | Value |
| --- | --- |
| Key | imagen-4 |
| Capabilities | image_generation |
| Input modalities | text |
| Output modalities | image |
Provider docs

Synchronous. High-fidelity photorealistic image generation.

Veo 3

| Field | Value |
| --- | --- |
| Key | veo-3 |
| Capabilities | video_generation |
| Input modalities | text |
| Output modalities | video |
Provider docs

Async video generation (task-poll pattern). Produces cinematic-quality clips.

Lyria 2

| Field | Value |
| --- | --- |
| Key | lyria-2 |
| Capabilities | audio_generation · music_generation |
| Input modalities | text |
| Output modalities | audio |
Provider docs

Async music synthesis. Outputs full instrumental tracks from a text prompt.


Mistral

Mistral Large 3

| Field | Value |
| --- | --- |
| Key | mistral-large-3 |
| Capabilities | chat · tools · json_schema |
| Context window | 128 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Mistral's flagship instruction model. Strong structured-output and tool-use performance.

Magistral Medium 1.2

| Field | Value |
| --- | --- |
| Key | magistral-medium-1.2 |
| Capabilities | chat · reasoning |
| Context window | 40 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Magistral Small 1.2

| Field | Value |
| --- | --- |
| Key | magistral-small-1.2 |
| Capabilities | chat · reasoning |
| Context window | 40 000 tokens |
| Input modalities | text |
| Output modalities | text |
Provider docs

Stability AI

Stable Diffusion 4

| Field | Value |
| --- | --- |
| Key | stable-diffusion-4 |
| Capabilities | image_generation |
| Input modalities | text · image |
| Output modalities | image |
Provider docs

Accepts an optional reference image for image-to-image workflows. Synchronous generation.


ElevenLabs

ElevenLabs v4

| Field | Value |
| --- | --- |
| Key | elevenlabs-v4 |
| Capabilities | audio_generation |
| Input modalities | text |
| Output modalities | audio |
Provider docs

High-quality text-to-speech with voice cloning. Returns an audio file via the task-poll pattern.


Kling

Kling 2.0

| Field | Value |
| --- | --- |
| Key | kling-2.0 |
| Capabilities | video_generation |
| Input modalities | text |
| Output modalities | video |
Provider docs

Async video generation. Strong at character-consistent motion.


Suno

Suno v5

| Field | Value |
| --- | --- |
| Key | suno-v5 |
| Capabilities | audio_generation · music_generation |
| Input modalities | text |
| Output modalities | audio |
Provider docs

Produces full songs (vocals + instrumentation) from a prompt. Async; uses the task-poll pattern.


Midjourney

Midjourney 7

| Field | Value |
| --- | --- |
| Key | midjourney-7 |
| Capabilities | image_generation |
| Input modalities | text |
| Output modalities | image |
Provider docs

Premium artistic image generation. Not yet active in the default registry; enable it by supplying your own provider key (BYOK, bring your own key).


Using a model in a lens

Reference a model by its key in the lens version's model_id field or pass it in the execution DTO:

```json
{
  "model_id": "gemini-2.5-pro",
  "input_snapshot": { "prompt": "Explain quantum entanglement simply." },
  "funding_source": "platform_credit"
}
```
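
The same payload can be assembled programmatically. This is a minimal sketch, assuming the three fields shown above are sufficient; the helper `build_execution` and the local `KNOWN_KEYS` check are hypothetical, and no endpoint or client library is implied. A real client would presumably validate keys against the registry itself.

```python
import json

# Subset of keys listed on this page; used here only for a local sanity check.
KNOWN_KEYS = {"gemini-2.5-pro", "gpt-5.4", "claude-sonnet-4-6"}


def build_execution(model_id, prompt, funding_source="platform_credit"):
    """Return the execution payload as a dict, rejecting unknown model keys."""
    if model_id not in KNOWN_KEYS:
        raise ValueError(f"unknown model key: {model_id!r}")
    return {
        "model_id": model_id,
        "input_snapshot": {"prompt": prompt},
        "funding_source": funding_source,
    }


payload = build_execution("gemini-2.5-pro", "Explain quantum entanglement simply.")
print(json.dumps(payload, indent=2))
```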

Generative media lenses declare their output in output_contract.kind (image, video, audio, music). The execution engine routes to the correct GenerativeMediaAdapter automatically.
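
Routing on `output_contract.kind` can be pictured as a lookup from kind to adapter. The sketch below is illustrative only: the name `GenerativeMediaAdapter` comes from this page, but the subclasses, the `describe` method, and the `route` function are hypothetical and say nothing about how the execution engine is actually implemented.

```python
class GenerativeMediaAdapter:
    """Hypothetical base class; only the name is taken from this page."""
    kind = ""

    def describe(self, model_id):
        return f"{type(self).__name__} handling {model_id}"


class ImageAdapter(GenerativeMediaAdapter):
    kind = "image"


class VideoAdapter(GenerativeMediaAdapter):
    kind = "video"


class AudioAdapter(GenerativeMediaAdapter):
    kind = "audio"


class MusicAdapter(GenerativeMediaAdapter):
    kind = "music"


# One adapter class per output kind named on this page.
ADAPTERS = {cls.kind: cls for cls in (ImageAdapter, VideoAdapter,
                                      AudioAdapter, MusicAdapter)}


def route(output_contract, model_id):
    """Pick the adapter for the declared output kind and describe the dispatch."""
    kind = output_contract["kind"]
    try:
        adapter_cls = ADAPTERS[kind]
    except KeyError:
        raise ValueError(f"no adapter for output kind {kind!r}") from None
    return adapter_cls().describe(model_id)


print(route({"kind": "video"}, "veo-3"))
# VideoAdapter handling veo-3
```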