type: decision
status: active
timestamp: 2026-06-30
tags: [ai, inference, ollama, cloudflare-workers-ai, puter-js, no-card, grill-decision, fallback-ladder]

Zero-cost inference backends — Ollama + Cloudflare Workers AI + Puter.js

Approved LLM endpoints when not using paid Claude/GPT keys. Local (Ollama) + serverless (Workers AI) + browser (Puter.js). Zero card, zero subscription. Grill-locked 2026-06-30 alongside gemini-cli-agent-addition.

Zero-cost inference backends (2026-06-30 grill)

Decision

Three zero-cost model backends are approved for oriz workflows, each covering a distinct deployment surface:

| Backend | Surface | Free tier | Card? | Role |
|---|---|---|---|---|
| Ollama | Local, dev machine | Unlimited (your GPU) | n/a | Primary dev runtime; offline; CI on workstation |
| Cloudflare Workers AI | Serverless, edge Worker | 10,000 neurons/day | No | Primary serverless runtime; prod-side inference |
| Puter.js | Browser, end user pays | Unlimited from our side | No (end user may optionally add one to their Puter account) | User-facing chat and on-page AI features |

All three pass the no-card-on-file hard rule, and all three already have service entries in knowledge/services/business/ai/. This decision codifies them into a single end-to-end fallback ladder.
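A minimal TypeScript sketch of the ladder semantics, assuming each backend is wrapped behind a hypothetical `ChatBackend` interface with a single `complete()` method (neither the interface nor `runLadder()` is an existing oriz module). Rungs are tried in order; the first success wins and earlier failures are recorded.

```typescript
// Minimal sketch of the zero-cost fallback ladder (hypothetical names).
// Each rung wraps one backend; the first rung that answers wins.

interface ChatBackend {
  name: string; // e.g. "ollama", "workers-ai", "puter"
  complete(prompt: string): Promise<string>;
}

interface LadderResult {
  backend: string;   // name of the rung that answered
  text: string;      // completion text
  skipped: string[]; // rungs that failed before it
}

async function runLadder(
  ladder: ChatBackend[],
  prompt: string,
): Promise<LadderResult> {
  const skipped: string[] = [];
  const errors: string[] = [];
  for (const backend of ladder) {
    try {
      const text = await backend.complete(prompt);
      return { backend: backend.name, text, skipped };
    } catch (err) {
      // Record the failure and drop to the next rung.
      skipped.push(backend.name);
      errors.push(`${backend.name}: ${String(err)}`);
    }
  }
  // Every rung failed: surface all causes at once.
  throw new Error(`all backends failed: ${errors.join("; ")}`);
}
```

With this shape, the per-workload policy reduces to the order of the array passed in.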

Why a ladder (not pick-one)

Routing rules

| Workload | Backend (first pick) | Fallback |
|---|---|---|
| Dev on laptop, no network / offline | Ollama (`http://localhost:11434/v1/chat/completions`) | Cloudflare Workers AI |
| Prod inference inside a Cloudflare Worker | Cloudflare Workers AI (native `AI` binding, `env.AI.run()`) | Puter.js, dispatched to the client |
| User-facing chat in the browser | Puter.js | Workers AI, for hard server-side steps |
| Open-source CLI agent failover (any of Aider, Cline, Kilo Code, OpenCode, gocode, Coddy) | Ollama at `localhost:11434` (all confirmed OpenAI-compatible) | Cloudflare Workers AI over HTTPS |
| Free-tier hosted Google model for general chat | Gemini CLI (see gemini-cli-agent-addition-2026-06-30) | n/a (no public REST) |
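The routing table can be kept as data so callers look up their rungs instead of hard-coding them. This sketch also shows the request shapes for the two first picks reachable from code: Ollama's OpenAI-compatible endpoint and the Workers AI binding's `run()` method. The workload keys, the `route()`, `ollamaChatRequest()` and `workersAiChat()` helpers, and the model ids (`llama3.2`, `@cf/meta/llama-3.1-8b-instruct`) are illustrative assumptions; the binding name `AI` assumes a matching AI binding in the Worker's wrangler config.

```typescript
// Sketch: routing table as data, plus request shapes for the two first picks
// reachable from code. Workload keys, helper names, and model ids are
// illustrative assumptions, not an existing oriz module.

type Backend = "ollama" | "workers-ai" | "puter" | "gemini-cli";
type Workload =
  | "dev-offline"
  | "worker-prod"
  | "browser-chat"
  | "cli-agent"
  | "google-chat";

// [first pick, fallback]; null where the routing table says n/a.
const ROUTES: Record<Workload, [Backend, Backend | null]> = {
  "dev-offline": ["ollama", "workers-ai"],
  "worker-prod": ["workers-ai", "puter"],
  "browser-chat": ["puter", "workers-ai"],
  "cli-agent": ["ollama", "workers-ai"],
  "google-chat": ["gemini-cli", null],
};

// Ordered rungs for a workload, fallback omitted when there is none.
function route(workload: Workload): Backend[] {
  const [first, fallback] = ROUTES[workload];
  return fallback === null ? [first] : [first, fallback];
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Ollama serves an OpenAI-compatible Chat Completions endpoint locally.
function ollamaChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: "http://localhost:11434/v1/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Minimal structural type for the Workers AI binding (named AI in wrangler).
interface AiBinding {
  run(model: string, inputs: { messages: ChatMessage[] }): Promise<unknown>;
}

async function workersAiChat(ai: AiBinding, messages: ChatMessage[]): Promise<unknown> {
  return ai.run("@cf/meta/llama-3.1-8b-instruct", { messages });
}
```

`fetch(req.url, req.init)` sends the Ollama request from Node 18+ or a Worker; inside a Worker, `workersAiChat(env.AI, messages)` needs no API token because the binding carries the account context.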

Per-backend details

Quota invariants (per never-hit-quotas)

| Backend | Soft alarm trip | Cap |
|---|---|---|
| Cloudflare Workers AI | 5,000 neurons/day (50%) | 10,000 neurons/day (hard cap) |
| Local Ollama | Disk space | No API-side cap |
| Puter.js | n/a (end user pays) | Per-user, at Puter’s discretion |
| Gemini CLI | 600 req/day (60%); 36 req/min (60%) | 1,000 req/day; 60 req/min |
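A sketch of the soft-alarm check using the thresholds in the quota table. The `QuotaRule` shape, `checkQuota()` and `alarms()` are hypothetical, and how the `used` counts are obtained (dashboard, logs, or a counter the caller keeps) is left as an assumption. Percentages are stored as integers so the thresholds compute exactly (5,000; 600; 36).

```typescript
// Sketch of the quota soft-alarm check (hypothetical names; figures from
// the quota table). `used` must come from whatever meter the caller trusts.

type QuotaState = "ok" | "soft-alarm" | "capped";

interface QuotaRule {
  name: string;
  cap: number;      // hard cap per window
  alarmPct: number; // integer percentage of cap that trips the soft alarm
}

const WORKERS_AI_DAILY: QuotaRule = { name: "workers-ai neurons/day", cap: 10_000, alarmPct: 50 };
const GEMINI_CLI_DAILY: QuotaRule = { name: "gemini-cli req/day", cap: 1_000, alarmPct: 60 };
const GEMINI_CLI_MINUTE: QuotaRule = { name: "gemini-cli req/min", cap: 60, alarmPct: 60 };

const RULES: QuotaRule[] = [WORKERS_AI_DAILY, GEMINI_CLI_DAILY, GEMINI_CLI_MINUTE];

function alarmThreshold(rule: QuotaRule): number {
  // Integer math keeps thresholds exact; floor so the alarm never trips late.
  return Math.floor((rule.cap * rule.alarmPct) / 100);
}

function checkQuota(rule: QuotaRule, used: number): QuotaState {
  if (used >= rule.cap) return "capped";
  if (used >= alarmThreshold(rule)) return "soft-alarm";
  return "ok";
}

// Report every rule whose usage has reached the soft alarm or the cap.
function alarms(usage: Record<string, number>): string[] {
  return RULES.filter((r) => checkQuota(r, usage[r.name] ?? 0) !== "ok").map(
    (r) => `${r.name}: ${checkQuota(r, usage[r.name] ?? 0)}`,
  );
}
```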

What this decision does NOT do

Cross-refs
