ADR 0005 · accepted · 2026-04-17
0005 — Infrastructure as code for Cloudflare Access
Context
The CoE site (coe.goosegroup.co, hosted on Cloudflare Pages) sits behind Cloudflare Access, gating traffic to emails ending in @balsambrands.com and @goosegroup.co. The initial setup was done through a mix of the Zero Trust dashboard and direct Cloudflare API calls during the issue #2 work.
The dashboard is a bad place to own this config:
- The Zero Trust UI is dense and changes often. Replaying the setup from memory in six months is a gamble.
- Settings drift when multiple people click through. There is no record of who changed what, or why.
- Anything layered on later — Google and Microsoft SSO (issue #1), group rules, session policies, service tokens for machine traffic — compounds that drift.
- The CoE site is expected to grow into something with real app surface area (MCP server, opportunity submission, cohort workspace). Auth config will grow with it.
We also want the repo to practice what the cohort methodology preaches: decisions recorded, changes as pull requests, work visible. Clickops violates that on the one piece of infrastructure that gates access to everything else.
Decision
Cloudflare resources for this site are managed with OpenTofu in infra/tofu/. State lives in Cloudflare R2 via the S3-compatible backend. Changes flow through PRs, with tofu plan running in CI and commenting on the PR; tofu apply runs manually via a workflow_dispatch trigger in GitHub Actions.
Scope of what’s managed:
- Cloudflare Pages custom domain coe.goosegroup.co
- Access application “Balsam CoE” covering coe.goosegroup.co, balsam-coe.pages.dev, and *.balsam-coe.pages.dev
- Access policy allowing @balsambrands.com and @goosegroup.co
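As a rough illustration of the scope listed above, the three managed resources might look like the following. This is a sketch only, assuming the 4.x line of the Cloudflare provider; resource and attribute names differ in 5.x and should be checked against whichever version the module pins. The variable name, the resource labels, the policy name, the session duration, and the Pages project name `balsam-coe` (inferred from the pages.dev hostname) are all assumptions, not taken from the actual module.

```hcl
terraform {
  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.40" # assumption: 4.x schema; 5.x renames several attributes
    }
  }
}

# Hypothetical variable; the real module may name or source this differently.
variable "cloudflare_account_id" {
  type = string
}

# Custom domain attached to the existing Pages project.
# The project itself is not managed here, only referenced by name.
resource "cloudflare_pages_domain" "coe" {
  account_id   = var.cloudflare_account_id
  project_name = "balsam-coe" # assumption: inferred from balsam-coe.pages.dev
  domain       = "coe.goosegroup.co"
}

# One Access application covering all three hostnames.
resource "cloudflare_zero_trust_access_application" "coe" {
  account_id = var.cloudflare_account_id
  name       = "Balsam CoE"
  type       = "self_hosted"
  domain     = "coe.goosegroup.co"

  self_hosted_domains = [
    "coe.goosegroup.co",
    "balsam-coe.pages.dev",
    "*.balsam-coe.pages.dev",
  ]

  session_duration = "24h" # assumption: illustrative value
}

# One policy allowing the two email domains.
resource "cloudflare_zero_trust_access_policy" "allow_email_domains" {
  account_id     = var.cloudflare_account_id
  application_id = cloudflare_zero_trust_access_application.coe.id
  name           = "Allow company email domains" # hypothetical name
  precedence     = 1
  decision       = "allow"

  include {
    email_domain = ["balsambrands.com", "goosegroup.co"]
  }
}
```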
Things explicitly not managed here (for now):
- The goosegroup.co DNS zone itself. Other records live there (Vercel-hosted marketing site, etc.). Narrow scope.
- The One-time PIN identity provider. Cloudflare provisions it with the Zero Trust org; referenced by ID, not managed. Google/Microsoft SSO IdPs added under issue #1 will be managed.
- The Pages project itself. Build and deploy config belongs with whatever front-end tooling lives in the main repo; this module covers the security perimeter around it.
Before first apply, we consolidated the two Access apps that existed (the one we created plus the auto-created *.balsam-coe.pages.dev app Cloudflare Pages generated when the “Access Policy” toggle was flipped) into a single app covering all three hostnames. One app, one policy, one source of truth.
Reasoning
Why OpenTofu and not Terraform. BSL-free, drop-in compatible, same Cloudflare provider. No meaningful downside. Any OSS-friendly posture we can take at the infrastructure layer is free to take now, expensive to retrofit later.
Why R2 and not local state. Local state means only one machine can apply, and if that machine is lost, the state is lost with it. R2 is native to the same account we’re managing, costs nothing at this volume, and gives CI and every operator one shared place to read and write state. It does not give us state locking yet; see Open items.
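For reference, pointing the S3-compatible backend at R2 typically looks something like the sketch below. The bucket name and state key are hypothetical, and `<ACCOUNT_ID>` is a placeholder for the Cloudflare account ID. The `skip_*` flags turn off AWS-specific checks that R2 does not implement. Credentials are assumed to come from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, holding an R2 API token's key pair.

```hcl
terraform {
  backend "s3" {
    bucket = "coe-tofu-state"               # hypothetical bucket name
    key    = "cloudflare/terraform.tfstate" # hypothetical state key
    region = "auto"                         # R2 does not use AWS regions

    endpoints = {
      # Placeholder: substitute the real Cloudflare account ID.
      s3 = "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"
    }

    # R2 is S3-compatible but not AWS, so skip the AWS-only checks.
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
    skip_s3_checksum            = true
    use_path_style              = true

    # No dynamodb_table is set, so this configuration does no state locking.
  }
}
```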
Why manual dispatch and not auto-apply-on-merge. Applying infra changes automatically on merge is fine once the team trusts the plan output. Today the team is small and the blast radius of a bad apply — locking everyone out of the CoE site — is non-trivial. Manual dispatch keeps the human confirmation step. We flip this when we’re bored of clicking it.
Why consolidate the two apps. The Pages-auto app was created by the Pages UI “Access Policy” toggle and re-syncs when that toggle is touched. If we leave it in place, Tofu and the dashboard are in tension forever. One app with a wildcard destination is simpler and has the same security properties.
Why the first iteration only covers Access and the Pages domain. Keep the module small enough that the first PR is reviewable. Grow it as SSO, future apps, and observability land.
Consequences
- The Cloudflare Zero Trust dashboard becomes a read-only surface for anything in scope. Changes in the dashboard will be detected as drift on the next plan. The README spells out which toggles to specifically avoid.
- Adding Google/Microsoft SSO (issue #1) is a PR to cloudflare.tf, not a dashboard click.
- Rotating the Cloudflare API token is a credential change in GitHub Actions secrets and the local .env. No code change.
- A new person joining the team gets the runbook in infra/tofu/README.md and can apply changes without being walked through the Zero Trust UI.
- We own a small operational surface (state bucket, API token, Actions workflows). The total cost is a ~6-file module and two short workflows, set up once.
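To make the SSO consequence concrete, a future PR adding Google as an identity provider could be as small as the fragment below. It is a sketch under the same assumption of a 4.x Cloudflare provider; the variable names and resource label are hypothetical, and the OAuth client would still have to be created on the Google side first.

```hcl
# Hypothetical variables; declared here so the fragment stands alone.
variable "cloudflare_account_id" {
  type = string
}

variable "google_oauth_client_id" {
  type = string
}

variable "google_oauth_client_secret" {
  type      = string
  sensitive = true # keep the secret out of plan output
}

# Google as an Access identity provider (assumes the 4.x provider schema).
resource "cloudflare_zero_trust_access_identity_provider" "google" {
  account_id = var.cloudflare_account_id
  name       = "Google"
  type       = "google"

  config {
    client_id     = var.google_oauth_client_id
    client_secret = var.google_oauth_client_secret
  }
}
```

The secret would be supplied the same way as the Cloudflare API token, through GitHub Actions secrets and the local .env, so it never lands in the repo.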
Open items
- Convert the manual-dispatch apply to auto-apply-on-merge after a few successful plan/apply cycles.
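- Add state locking once R2 supports a locking mechanism the S3 backend can use (the backend's DynamoDB table option needs a DynamoDB-compatible service, which R2 does not provide), or accept the risk of concurrent applies, which is unlikely while one person runs ops.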
- When the site gains an app layer (MCP server, submission, cohort workspace), extend this module or split it as scope demands.