
Alternative free-forever backup channels for GitHub code and metadata

Type: decision. Tags: decision, backup, disaster-recovery, metadata, cloudflare-r2, backblaze-b2, huggingface, github-migration

Alternative free-forever backup channels

Context

Our primary backup strategy is the 6-host active git mirror (using GitHub Actions to push code). However, git mirroring only copies the source code and git history. It does NOT back up repository metadata such as issues and pull requests.

This decision documents alternative, free-forever, automated methods to back up both repositories and their associated metadata.


1. Native GitHub Migration API (Full Export)

GitHub offers a native migration API that generates a single, downloadable .tar.gz archive containing the entire repository structure and metadata.

Capabilities

Automated Flow (GH Actions + Target Storage)

  1. Trigger: A weekly GitHub Actions cron runs a script to start the migration:
    # Start migration for repository or organization
    curl -X POST -H "Authorization: token $GH_PAT" \
      -H "Accept: application/vnd.github+json" \
      "https://api.github.com/orgs/chirag127/migrations" \
      -d '{"repositories":["workspace"],"exclude_attachments":false}'
    
  2. Poll: The workflow polls the migration status endpoint, sleeping between requests, until the migration state becomes exported (or stops if it becomes failed).
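  3. Download: Fetch the archive from the migration's archive endpoint. The endpoint responds with a redirect to the actual download URL, so curl needs -L to follow it:
    curl -L -H "Authorization: token $GH_PAT" \
      -o backup.tar.gz \
      "https://api.github.com/orgs/chirag127/migrations/$MIGRATION_ID/archive"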
    
  4. Push to Storage: Stream/copy the archive to one of the free-forever storage layers below.
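The polling step (step 2) can be sketched as a small shell function. This is a minimal sketch, not the workflow's actual script: ORG and MIGRATION_ID are assumed to be set by the earlier steps, jq is assumed to be available on the runner, and the attempt count and delay are placeholders to tune.

```shell
# Sketch only: ORG, MIGRATION_ID and GH_PAT are assumed to be set by earlier steps.

# Print the current migration state (pending, exporting, exported, or failed).
fetch_state() {
  curl -fsS -H "Authorization: token $GH_PAT" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/orgs/$ORG/migrations/$MIGRATION_ID" | jq -r '.state'
}

# wait_for_export <max_attempts> <sleep_seconds> [state_command]
# Returns 0 once the state is "exported"; returns 1 on "failed", on a
# failed status request, or when max_attempts is exhausted.
wait_for_export() {
  max_attempts=$1
  delay=$2
  state_cmd=${3:-fetch_state}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$("$state_cmd") || return 1
    case "$state" in
      exported) return 0 ;;
      failed) echo "migration failed" >&2; return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "timed out waiting for export" >&2
  return 1
}
```

A workflow step could then call, for example, wait_for_export 60 30 before running the download command. The optional third argument exists only so the loop can be exercised with a stub instead of the real API.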

2. Cloudflare R2 (10 GB S3-Compatible Free Storage) — PREFERRED

Cloudflare R2 provides a free-tier object storage bucket with zero egress fees, which makes it well suited as a remote store for backup archives.

Limits

Automation via AWS CLI

Since R2 is S3-compatible, the pre-installed aws CLI on GitHub runners can copy backups directly:

- name: Upload to Cloudflare R2
  run: |
    aws s3 cp backup.tar.gz s3://oriz-backups/github-backup-$(date +%F).tar.gz \
      --endpoint-url ${{ secrets.R2_ENDPOINT }}
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
    AWS_DEFAULT_REGION: auto
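
Because weekly archives accumulate against the 10 GB free tier, some form of pruning is likely to be needed eventually. The following is a rough sketch under assumptions: BUCKET and R2_ENDPOINT are hypothetical variable names, the retention count is a placeholder, and it relies on the date-stamped key names used above sorting chronologically.

```shell
# Sketch only: BUCKET and R2_ENDPOINT are hypothetical variable names.

# keys_to_prune <keep>: read object keys on stdin, print all but the newest <keep>.
# Relies on github-backup-YYYY-MM-DD.tar.gz names sorting in date order.
keys_to_prune() {
  keep=$1
  sort -r | tail -n +"$((keep + 1))"
}

# prune_r2 <keep>: delete every dated backup except the newest <keep>.
prune_r2() {
  aws s3 ls "s3://$BUCKET/" --endpoint-url "$R2_ENDPOINT" \
    | awk '{print $4}' \
    | grep '^github-backup-' \
    | keys_to_prune "$1" \
    | while read -r key; do
        aws s3 rm "s3://$BUCKET/$key" --endpoint-url "$R2_ENDPOINT"
      done
}
```

For example, prune_r2 8 would keep roughly two months of weekly archives. Whether eight archives fit in 10 GB depends on the archive size, which this document does not state.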

3. Backblaze B2 (10 GB Free Storage) — SECONDARY

Backblaze B2 is an alternative object storage option offering a free-forever tier without requiring a credit card.

Limits

Automation via Rclone or B2 CLI

You can use rclone (installed by a workflow step if the runner image does not already provide it) to copy backups to a configured remote, here named b2:

rclone copy backup.tar.gz b2:oriz-backups-bucket/ -v
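
The command above assumes a remote named b2 already exists in the rclone configuration. On an ephemeral runner, one way to provide it is through rclone's RCLONE_CONFIG_<REMOTE>_<OPTION> environment variables. A minimal sketch, where B2_KEY_ID and B2_APP_KEY are hypothetical secret names:

```shell
# Sketch only: B2_KEY_ID and B2_APP_KEY are hypothetical names for the
# Backblaze application key ID and application key.

# upload_to_b2 <archive>: define the "b2" remote via environment variables
# and copy the archive. The body runs in a subshell so the exported
# credentials do not leak into the rest of the script.
upload_to_b2() (
  export RCLONE_CONFIG_B2_TYPE=b2
  export RCLONE_CONFIG_B2_ACCOUNT="$B2_KEY_ID"
  export RCLONE_CONFIG_B2_KEY="$B2_APP_KEY"
  rclone copy "$1" b2:oriz-backups-bucket/ -v
)
```

Called as upload_to_b2 backup.tar.gz, this avoids writing an rclone.conf file to the runner's disk.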

4. Hugging Face Datasets — AVOID for generic file backup

CAVEAT 2026-06-24: Hugging Face's "free unlimited" hosting is intended for public ML datasets and model weights. Using HF Datasets as a generic file backup target for tarballs / metadata archives falls outside that purpose and carries real ToS risk.

Use R2 or B2 instead for backup tarballs. Only use HF Datasets for things that are genuinely ML-related (e.g. the oriz-ai-providers-data repo, or training datasets if those ever exist).


5. Open-Source Metadata Backup Tools (CLI)

For custom metadata extraction (saving issues/PRs as readable JSON rather than binary tarballs), the following CLI tools can be executed inside GitHub Actions:

  1. github-backup (Python):
    • Command: github-backup --token $GH_PAT --output-directory ./backup --all --private chirag127
    • Backs up repositories, wikis, issues, pull requests, milestones, labels, and releases.
  2. gitbackup (Go):
    • Clones repositories from several hosting platforms (GitHub, GitLab, Bitbucket), which is useful when code lives on more than one host.
  3. rclone (Go):
    • For syncing local folders of downloaded metadata files to GDrive/OneDrive free tiers.
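
To illustrate what "issues/PRs as readable JSON" looks like without any of these tools, the GitHub REST API can be paged directly. This is a rough sketch rather than a replacement for the tools above: OWNER and REPO are hypothetical variables, and it relies on the issues endpoint returning pull requests alongside issues.

```shell
# Sketch only: OWNER, REPO and GH_PAT are assumed to be set by the caller.

# fetch_issues_page <page>: print one page of issues (a JSON array).
fetch_issues_page() {
  curl -fsS -H "Authorization: token $GH_PAT" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$OWNER/$REPO/issues?state=all&per_page=100&page=$1"
}

# dump_issues <out_dir> [page_command]: write issues-<page>.json files,
# stopping at the first empty page.
dump_issues() {
  out_dir=$1
  page_cmd=${2:-fetch_issues_page}
  page=1
  mkdir -p "$out_dir"
  while :; do
    body=$("$page_cmd" "$page") || return 1
    # An empty JSON array means there are no more results.
    if [ "$(printf '%s' "$body" | tr -d '[:space:]')" = "[]" ]; then
      break
    fi
    printf '%s\n' "$body" > "$out_dir/issues-$page.json"
    page=$((page + 1))
  done
}
```

Running dump_issues ./backup/issues leaves one JSON file per page, which can then be archived and uploaded like any other backup file.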

Recommendation

Default channel: Native GitHub Migration API → Cloudflare R2, weekly. Zero egress fees, 10 GB of free storage, and S3-compatible CLI access make R2 the cleanest path. Use B2 as a secondary mirror only if you've outgrown R2's 10 GB.

Skip HF Datasets for this purpose — not designed for it, real ToS risk.

Cross-refs