AI split — Puter.js (browser) + Cloudflare Workers AI (server)
AI split — Puter.js (browser) + Cloudflare Workers AI (server)
Decision
The family ships two AI providers, picked by surface:
| Surface | Service | Why this surface |
|---|---|---|
Browser (chat in oriz-me, on-page assistants, any client-side AI feature) |
Puter.js | User-pays free tier; we ship no API key, the family pays nothing per request |
Server (inside the umbrella Hono Worker at api.oriz.in) |
Cloudflare Workers AI | Native Worker binding, zero-egress within Cloudflare, 10K neurons / day free, no card |
OpenRouter remains rejected (already locked); Firebase AI Logic (Gemini) is available via the Firebase basics skill if a feature truly needs Google's specific model, but is not the default.
Why
A single AI provider can't cover both surfaces well:
- Browser-only providers (Puter.js) put auth + billing on the user, so the family ships nothing per request. But the API key can't ride into a Worker without exposing it.
- Server-only providers (Workers AI) bind natively to a Worker and run on the same edge node — best p50 of any inference path — but can't be called from a static Cloudflare Pages page without a relay.
Splitting by surface keeps each free tier reserved for its intended
workload, so neither cliff hits prematurely. Cloudflare Workers AI
also fits the family's stack-cohesion posture (same as
queue-cloudflare-native.md — same
account, same wrangler.toml, same no-card billing surface).
Implications
- Browser AI features import the Puter.js script + use its model IDs (DeepSeek R1, Llama, Moonshot). No API key in client code.
- Server AI features declare an
[ai]binding in the Hono Worker'swrangler.tomland callenv.AI.run("@cf/..."). Inference happens on the same edge node as the Worker. - Embeddings (e.g. for any future RAG over
oriz-me-data) go through Workers AI's BGE / Nomic models, cached by content hash so re-runs cost zero neurons. - Image generation (occasional og:image fallback alongside Satori) goes through Workers AI's Stable Diffusion XL Lightning when needed.
- Whisper for ASR (podcast / video transcription) goes through Workers AI server-side.
- Quota headroom per
rules/interaction/never-hit-quotas.md: the Hono Worker tracks neurons consumed per day in KV; trips a soft cap at 50% (5,000 neurons) to flag approach. Browser-side AI never burns the server budget — it goes through Puter.js. - Encapsulation: server-side AI calls live in
apps/api/src/ai/so the swap surface to OpenAI / Anthropic / Hugging Face is one file if Workers AI's catalog ever falls short. - Documentation must clarify the split — a future contributor must not pull the Puter.js SDK into the Worker (it won't work) or call Workers AI's binding from the browser (no binding exists).