Data lives in each app's own repo — no separate data repos for janaushdhi/ncert/financial-cards

Type: decision. Tags: decision, data, mono-app-repos, no-data-split, cron

Data lives in each app's own repo

Decision

Each data-driven app (oriz-janaushdhi-app, oriz-ncert-app, oriz-financial-cards-app, etc.) keeps its own data in its own data/ directory inside its own GitHub repo.

NO separate oriz-*-data repos are created for any of them.

Existing API service repos (oriz-flow-fii-dii-activity-api, oriz-mmi-tickertape-mmi-api) stay — they're services with data/ dirs of their own, served via GH Pages.

Why

User mandate verbatim: "None of them require a separate data repo. All data in the repo of their creation. We are moving to the monorepo. I don't want to increase the number of repositories just for the sake of it."

51 submodules is enough. Adding 3-5 more -data repos for the sake of architectural purity isn't worth the maintenance overhead.

How data updates work

Each app runs a daily, weekly, or monthly cron job defined in .github/workflows/scrape.yml:

  1. Scraper script runs (e.g. Playwright fetches the medicine CSV)
  2. Output → data/<YYYY-MM-DD>.json + data/latest.json in the app repo
  3. Workflow commits + pushes to main
  4. CF Pages auto-redeploys on push
  5. Site rebuilds with the fresh data baked in

App-level GH Action handles everything; zero external coordination.
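
Step 2 of the list above could look roughly like the following TypeScript sketch. It is an illustration only, not the scrapers' actual code: the function names snapshotPaths and writeSnapshot are hypothetical, and taking the run date in UTC is an assumption made so that one cron run always maps to one calendar day.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: the two files step 2 writes for a given run date.
// Assumption: the date is read in UTC, so a run maps to exactly one day.
function snapshotPaths(dataDir: string, date: Date): { dated: string; latest: string } {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return {
    dated: join(dataDir, `${day}.json`),
    latest: join(dataDir, "latest.json"),
  };
}

// Hypothetical helper: writes the same payload to the dated snapshot
// and to latest.json, creating the data directory if it is missing.
function writeSnapshot(dataDir: string, records: unknown, date: Date = new Date()): void {
  const { dated, latest } = snapshotPaths(dataDir, date);
  mkdirSync(dataDir, { recursive: true });
  const body = JSON.stringify(records, null, 2) + "\n";
  writeFileSync(dated, body);
  writeFileSync(latest, body);
}
```

A scraper would call writeSnapshot("data", records) as its last step; the workflow's commit step then picks up both files. Writing latest.json alongside the dated file gives the site build a stable path while the dated files keep the history.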

Runtime fetch for freshness

Where data MUST be live (intraday market data, live counters), apps lazy-fetch from raw URLs:
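
A minimal sketch of such a lazy fetch, written in TypeScript under stated assumptions: it reads "raw URLs" as GitHub raw content URLs, the URL in the usage note is a placeholder rather than a real endpoint, and the lazyJson helper with its maxAgeMs parameter is hypothetical. Nothing is requested until the data is first needed, and a result older than maxAgeMs is fetched again.

```typescript
// Minimal shape of a fetch response that this sketch relies on.
type JsonResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type Fetcher = (url: string) => Promise<JsonResponse>;

// Hypothetical helper: returns a loader that fetches on first use and
// re-fetches once the previous result is older than maxAgeMs.
// The fetcher parameter exists so the loader can be exercised without a network.
function lazyJson<T>(
  url: string,
  maxAgeMs: number,
  fetcher: Fetcher = (u) => fetch(u),
): () => Promise<T> {
  let cached: Promise<T> | undefined;
  let fetchedAt = 0;

  return () => {
    const fresh = cached !== undefined && Date.now() - fetchedAt < maxAgeMs;
    if (!fresh) {
      fetchedAt = Date.now();
      const request = fetcher(url).then(async (res) => {
        if (!res.ok) throw new Error(`fetch ${url} failed with status ${res.status}`);
        return (await res.json()) as T;
      });
      // Forget a failed request so the next call retries instead of
      // replaying the error for the rest of the freshness window.
      request.catch(() => {
        if (cached === request) cached = undefined;
      });
      cached = request;
    }
    return cached as Promise<T>;
  };
}
```

Usage might look like const loadLatest = lazyJson("https://raw.githubusercontent.com/<owner>/<repo>/main/data/latest.json", 60_000), followed by await loadLatest() wherever the live value is rendered. The 60-second window is an arbitrary example value, not a figure from this decision.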

Cross-refs