Data lives in each app's own repo — no separate data repos for janaushdhi/ncert/financial-cards

Type: decision | Date: Mon Jun 22 2026 | Tags: decision, data, mono-app-repos, no-data-split, cron


Decision

Each data-driven app (oriz-janaushdhi-app, oriz-ncert-app, oriz-financial-cards-app, etc.) keeps its data in a data/ directory inside its own GitHub repo.

NO separate oriz-*-data repos are created for any of them.

Existing API service repos (oriz-flow-fii-dii-activity-api, oriz-mmi-tickertape-mmi-api) stay as they are: they are services with their own data/ directories, served via GitHub Pages.

Why

User mandate verbatim: "None of them require a separate data repo. All data in the repo of their creation. We are moving to the monorepo. I don't want to increase the number of repositories just for the sake of it."

The monorepo already has 51 submodules, which is enough. Adding 3-5 more oriz-*-data repos for the sake of architectural purity isn't worth the maintenance overhead.

How data updates work

Each app runs its own daily, weekly, or monthly cron workflow in .github/workflows/scrape.yml:

Scraper script runs (e.g. Playwright fetches the medicine CSV)
Output → data/<YYYY-MM-DD>.json + data/latest.json in the app repo
Workflow commits + pushes to main
CF Pages auto-redeploys on push
Site rebuilds with the fresh data baked in

The app's own GitHub Actions workflow handles everything; no external coordination is needed.
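The output step in the list above can be sketched as a pure function that turns one scrape result into the two files the workflow commits. This is a minimal TypeScript sketch under stated assumptions, not the apps' actual scraper: `buildSnapshotFiles`, `utcDateStamp` and the `Snapshot` shape are hypothetical names, and dating `<YYYY-MM-DD>` in UTC is an assumption. Writing the files to disk, committing and pushing remain the workflow's job.

```typescript
// Hypothetical shape of one scrape result; each app would define its own records.
interface Snapshot<T> {
  fetchedAt: string; // ISO-8601 timestamp of the scrape
  records: T[];
}

// Format a Date as YYYY-MM-DD in UTC (assumption: cron runs are dated in UTC).
function utcDateStamp(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Build the dated snapshot plus latest.json, keyed by repo-relative path.
// Both files get identical content, so latest.json always mirrors the newest dated file.
function buildSnapshotFiles<T>(records: T[], now: Date): Record<string, string> {
  const snapshot: Snapshot<T> = { fetchedAt: now.toISOString(), records };
  const body = JSON.stringify(snapshot, null, 2) + "\n";
  return {
    [`data/${utcDateStamp(now)}.json`]: body,
    "data/latest.json": body,
  };
}

// Usage with a placeholder record: the workflow would write each entry to disk,
// then commit and push to main, which triggers the Pages redeploy.
const files = buildSnapshotFiles([{ id: "example" }], new Date("2026-06-22T03:00:00Z"));
console.log(Object.keys(files)); // keys: data/2026-06-22.json and data/latest.json
```

Keeping the function free of file I/O makes it easy to test, and writing latest.json with the same bytes as the dated file means consumers can always read one fixed path.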

Runtime fetch for freshness

Where data MUST be live (intraday market data, live counters), apps lazy-fetch it at runtime from raw GitHub URLs:

paisa-finance fetches FII/DII data from raw.githubusercontent.com/chirag127/oriz-flow-fii-dii-activity-api/main/data/latest.json, and MMI data from a similar raw URL for the MMI repo
Lazy + SWR (stale-while-revalidate) + a 1-hour localStorage TTL: cached data is shown immediately while fresh data is fetched in the background
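The lazy + SWR pattern above can be sketched as a small TypeScript helper. This is a minimal illustration, not paisa-finance's actual code: `swrGet`, `rawUrl` and `KeyValueStore` are hypothetical names, and the sketch assumes the cached copy is always returned immediately while a background refetch starts only once the entry is older than the 1-hour TTL. Storage, fetcher and clock are passed in so `localStorage` and `fetch` can be swapped for in-memory stand-ins outside the browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Cached entry: the payload plus when it was stored (epoch ms).
interface CacheEntry<T> {
  storedAt: number;
  data: T;
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Build a raw.githubusercontent.com URL for a file on a given branch.
function rawUrl(owner: string, repo: string, branch: string, path: string): string {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${path}`;
}

// Stale-while-revalidate read:
// - returns the cached value immediately (possibly stale), or null on a cold cache;
// - if the cache is missing or older than ttlMs, starts a background fetch that
//   updates the cache and then calls onFresh with the new data.
function swrGet<T>(
  key: string,
  fetcher: () => Promise<T>,
  store: KeyValueStore,
  onFresh: (data: T) => void,
  ttlMs: number = ONE_HOUR_MS,
  now: () => number = Date.now,
): T | null {
  const raw = store.getItem(key);
  const entry: CacheEntry<T> | null = raw ? (JSON.parse(raw) as CacheEntry<T>) : null;
  const isFresh = entry !== null && now() - entry.storedAt < ttlMs;
  if (!isFresh) {
    fetcher()
      .then((data) => {
        const next: CacheEntry<T> = { storedAt: now(), data };
        store.setItem(key, JSON.stringify(next));
        onFresh(data);
      })
      .catch(() => {
        // Network failure: keep serving whatever is already cached.
      });
  }
  return entry ? entry.data : null;
}

// Browser usage (render is a hypothetical UI callback):
// const url = rawUrl("chirag127", "oriz-flow-fii-dii-activity-api", "main", "data/latest.json");
// const cached = swrGet(url, () => fetch(url).then((r) => r.json()), window.localStorage, render);
// if (cached !== null) render(cached);
```

Returning `null` on a cold cache lets the caller show a loading state and wait for `onFresh`; keying the cache by URL keeps FII/DII and MMI entries separate without extra bookkeeping.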

Cross-refs

Market data per-repo pattern → [[decisions/ops/market-data-per-repo]]
janaushdhi app scope → [[decisions/apps/janaushdhi-app-scope]]
ncert combined PDF directory → [[decisions/apps/ncert-combined-pdf-directory]]