Deploy

This runbook covers running A-Market on a Hetzner Cloud VPS, with staging and production side by side on one host, fronted by a single TLS reverse proxy and deployed from GitHub Actions. Day-to-day server tasks (SSH, env editing, pgAdmin, email, backups) live in Operations.

How it fits together

                         ┌──────────────────── one VPS ────────────────────┐
  Internet ── :443 ──►   │  edge Caddy (auto-TLS)                           │
  mcp.<d> auth.<d> app.<d> docs.<d>  │  proxies by container name over redline_edge
  mcp-staging.<d> … docs-staging.<d> ▼                                      │
                         ┌─ redline-prod ─┐   ┌─ redline-staging ─┐         │
                         │ mcp-server      │   │ mcp-server         │        │
                         │ keycloak        │   │ keycloak           │        │
                         │ postgres        │   │ postgres           │        │
                         │ webapp · docs   │   │ webapp · docs      │        │
                         └────────────────┘   └───────────────────┘         │
                         └──────────────────────────────────────────────────┘
  GitHub Actions ── build images ──► GHCR ──(ssh: git pull + compose pull + up)──► VPS

- Images are built once by CI and pushed to GHCR; both environments pull them (staging tracks :staging, prod runs an immutable :v<x.y.z>).
- Each environment is a self-contained compose project (docker-compose.yml + deploy/docker-compose.prod.yml), isolated by COMPOSE_PROJECT_NAME.
- The edge Caddy (deploy/docker-compose.edge.yml + deploy/Caddyfile) is the only internet-facing piece; it terminates TLS and reverse-proxies to each env's containers over the shared external redline_edge network.
- Secrets live in .env.<env> files on the server (never in git or CI).
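
To illustrate how these pieces combine, the sketch below prints (without running) the kind of docker compose command a per-environment deploy could use. The real deploy/deploy.sh is not reproduced here, so the exact flags are an assumption, as are the project names redline-staging and redline-prod (taken from the diagram) and the helper name compose_cmd.

```bash
# Hypothetical helper: print the compose command for one environment.
# The flags are a guess at what deploy/deploy.sh does, not a copy of it.
compose_cmd() {
  local env="$1" action="${2:-up -d}"   # env: "staging" or "prod"
  printf 'COMPOSE_PROJECT_NAME=redline-%s docker compose --env-file .env.%s -f docker-compose.yml -f deploy/docker-compose.prod.yml %s\n' \
    "$env" "$env" "$action"
}

compose_cmd staging        # base file + prod overlay, staging project name
compose_cmd prod pull      # same files, different project name and env file
```

Because only the project name and the env file differ, the two stacks can share one checkout of the repo without their containers or volumes colliding.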

Prerequisites

A domain, a Hetzner Cloud account + API token, an SSH keypair, this repo on GitHub with GHCR, and locally terraform + ssh + jq/curl for the smoke test.

1. Provision the server (Terraform)

bash

cd infra/terraform
cp terraform.tfvars.example terraform.tfvars   # hcloud_token + ssh_public_key
terraform init && terraform apply

Cloud-init installs Docker, creates the deploy user + the redline_edge network, clones the repo to /opt/redline, and hardens SSH. Lock SSH down via allowed_ssh_cidrs. A cx33 (8 GB) runs both stacks comfortably.

2. DNS records

Point these at the server IP (DNS-only / proxy off so Caddy can terminate TLS):

| Record | Purpose |
| --- | --- |
| `mcp.<domain>` | prod MCP server |
| `auth.<domain>` | prod Keycloak (issuer) |
| `app.<domain>` | prod webapp |
| `docs.<domain>` | prod docs site |
| `mcp-staging.<domain>` | staging MCP server |
| `auth-staging.<domain>` | staging Keycloak |
| `app-staging.<domain>` | staging webapp |
| `docs-staging.<domain>` | staging docs site |
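
Before starting Caddy it can help to confirm that all eight records resolve to the server. The following is a minimal sketch, assuming `dig` is installed locally (it is not in the prerequisites list); the function names are made up for this example.

```bash
# Print the eight hostnames from the table for a given domain.
expected_hosts() {
  local domain="$1" name
  for name in mcp auth app docs; do
    printf '%s.%s\n' "$name" "$domain"
    printf '%s-staging.%s\n' "$name" "$domain"
  done
}

# Compare each host's A record with the server IP (requires dig).
check_dns() {
  local domain="$1" ip="$2" host got rc=0
  for host in $(expected_hosts "$domain"); do
    got=$(dig +short A "$host" | tail -n 1)
    if [ "$got" = "$ip" ]; then
      echo "ok       $host"
    else
      echo "MISMATCH $host -> ${got:-<none>}"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_dns example.com 203.0.113.10
```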

3. One-time server setup

bash

ssh deploy@<server-ip>
cd /opt/redline
cp .env.prod.example    .env.prod
cp .env.staging.example .env.staging
cp .env.edge.example    .env.edge

In .env.prod and .env.staging, set the hostnames (replace example.com), strong per-environment secrets (POSTGRES_PASSWORD, KEYCLOAK_ADMIN_PASSWORD), and the image refs (including DOCS_IMAGE). In .env.edge, set ACME_EMAIL and the domains (including DOCS_DOMAIN / DOCS_STAGING_DOMAIN). Leave OAUTH_AUDIENCE at its default (https://api.redline.app); it must match the realm audience mapper. Then start the edge proxy:

bash

./deploy/deploy.sh edge
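
Two small helpers can make the env-file step less error-prone: one generates a random secret, the other flags files that still contain the example.com placeholder. This is a sketch, not part of the repo; both function names are invented here.

```bash
# Generate a 32-character alphanumeric secret from /dev/urandom.
gen_secret() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
  echo
}

# Return non-zero if any given env file still mentions example.com.
check_placeholders() {
  local file rc=0
  for file in "$@"; do
    if grep -n 'example\.com' "$file"; then
      echo "placeholder left in $file" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the server:
#   gen_secret        # paste the output into POSTGRES_PASSWORD=...
#   check_placeholders .env.prod .env.staging .env.edge
```

Alphanumeric output avoids characters that would need quoting inside an env file.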

Optional, per environment: to let account erasure also delete the user's Keycloak login (not just their app data), provision the redline-account-admin service account — see Enable Keycloak login deletion on account erasure.

4. GitHub setup (CI/CD)

Create two Environments (staging, production). In each set VPS_HOST, VPS_USER (deploy), and VPS_SSH_KEY (the private key). Make the GHCR packages public, or set GHCR_USER + GHCR_TOKEN (a read:packages PAT). Optionally add a required reviewer to production to gate prod deploys (this repo runs them ungated).
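
The same values can be set from a terminal with the GitHub CLI instead of the web UI. The sketch below assumes the three values are stored as environment secrets (the text above does not say whether they are secrets or variables) and that `gh` is installed and authenticated; the function name and the key path in the usage lines are placeholders.

```bash
# Set the three deploy values on one GitHub Environment via the gh CLI.
# Assumption: they are environment secrets, not environment variables.
set_deploy_secrets() {
  local gh_env="$1" host="$2" key_file="$3"
  gh secret set VPS_HOST    --env "$gh_env" --body "$host"
  gh secret set VPS_USER    --env "$gh_env" --body "deploy"
  gh secret set VPS_SSH_KEY --env "$gh_env" < "$key_file"   # value read from stdin
}

# Usage (inside a clone of the repo, after `gh auth login`):
#   set_deploy_secrets staging    203.0.113.10 ~/.ssh/deploy_key
#   set_deploy_secrets production 203.0.113.10 ~/.ssh/deploy_key
```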

The workflow (.github/workflows/deploy.yml):

- push to main → builds :staging images → deploys staging (app + docs).
- push tag v* → builds :prod (+ :<version>) → full release: deploys staging then production, stamps the version into /version, and publishes the release's docs.
- Run workflow (manual) → pick the environment.
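
The trigger rules above amount to a small mapping from git ref to image tags and target environments. The function below restates that mapping as an illustration only; the real logic lives in .github/workflows/deploy.yml and may be structured differently.

```bash
# Map a git ref to the image tags CI builds and the environments it deploys.
# Illustrative restatement of the rules above, not the workflow's own code.
plan_for_ref() {
  local version
  case "$1" in
    refs/heads/main)
      echo "tags=staging deploy=staging" ;;
    refs/tags/v*)
      version="${1#refs/tags/v}"          # v0.3.1 -> 0.3.1
      echo "tags=prod,$version deploy=staging,production" ;;
    *)
      echo "tags= deploy=" ;;             # nothing is deployed automatically
  esac
}

plan_for_ref refs/heads/main      # tags=staging deploy=staging
plan_for_ref refs/tags/v0.3.1     # tags=prod,0.3.1 deploy=staging,production
```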

5. Releasing

Staging redeploys on every push to main (app + docs):

bash

git push origin main
node scripts/auth-smoke.mjs https://mcp-staging.<domain> https://auth-staging.<domain>

Cutting a release

A release is a v* tag. Apart from the version bump, pushing the tag is the only manual step; CI does the rest.

Bump the version. Set "version" in package.json to the new number and commit it to main (e.g. a chore(release): v0.3.1 commit/PR). Keep it in sync with the tag.

Tag and push from an up-to-date main:

bash

git checkout main && git pull
git tag -a v0.3.1 -m "release: v0.3.1"
git push origin v0.3.1

CI runs the full release (.github/workflows/deploy.yml, no approval gate):
- builds & pushes :prod + :0.3.1 images for mcp-server, webapp, and docs;
- deploys staging then production (serialized so they don't race on the VPS);
- stamps the tag version into APP_VERSION on both environments, so /version + get_version report it (continuous main-push staging deploys keep the last released version);
- publishes the release's docs to docs.<domain> (part of the prod-stack deploy — there's no separate docs job).

Verify:

bash

curl https://mcp.<domain>/v1/version       # → {"version":"0.3.1"}
node scripts/auth-smoke.mjs https://mcp.<domain> https://auth.<domain>
# app.<domain> + docs.<domain> now serve the released build
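
Since package.json and the tag have to agree, a pre-flight check before tagging can catch a forgotten version bump. This is a minimal sketch with a hypothetical function name; it parses package.json with sed and therefore assumes the usual one-field-per-line formatting.

```bash
# Return 0 only if "version" in package.json equals the tag minus its "v".
check_release_version() {
  local tag="$1" pkg="${2:-package.json}" pkg_version
  pkg_version=$(sed -n 's/^[[:space:]]*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$pkg" | head -n 1)
  if [ "v$pkg_version" = "$tag" ]; then
    echo "ok: package.json and tag agree on $pkg_version"
  else
    echo "mismatch: package.json=$pkg_version tag=$tag" >&2
    return 1
  fi
}

# Usage: check_release_version v0.3.1 && git tag -a v0.3.1 -m "release: v0.3.1"
```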

Rollback: point the *_IMAGE refs in .env.prod at an earlier :<version> and run ./deploy/deploy.sh prod.
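
A rollback edit of this kind can be scripted. The sketch below rewrites the tag on every *_IMAGE line of an env file and keeps a .bak copy; the function name is invented, and DOCS_IMAGE is the only image variable name confirmed by this page.

```bash
# Point every *_IMAGE=<repo>:<tag> line in an env file at one version.
# Lines without a tag, and all other variables, are left untouched.
pin_images() {
  local file="$1" version="$2"
  cp "$file" "$file.bak"
  sed 's#^\([A-Z_]*_IMAGE=.*\):[^:/]*$#\1:'"$version"'#' "$file.bak" > "$file"
}

# Usage on the server:
#   pin_images .env.prod 0.3.0 && ./deploy/deploy.sh prod
```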

Set the reported version without a release

APP_VERSION is auto-stamped on every release; to set it ad-hoc (wraps deploy/set-app-version.sh over SSH):

bash

VPS_HOST=<vps-ip> npm run set-app-version -- 0.3.1                 # prod, recreates the mcp-server
VPS_HOST=<vps-ip> npm run set-app-version -- 0.3.1 --env staging

Once a host's DNS resolves, uncomment its block in deploy/Caddyfile (if commented), push, and run ./deploy/deploy.sh edge on the server to reload Caddy.

The docs site (docs.amrkt.ch)

The documentation site you're reading is apps/docs (VitePress), served by the edge Caddy on docs.<domain> / docs-staging.<domain> from a redline-docs container that CI builds and pushes to GHCR like the webapp.

Docs ship with their environment — the docs container is just another service in each stack, so it follows the same cadence as the app:

- Staging (docs-staging.amrkt.ch) updates on every push to main: deploy-staging redeploys the staging stack, pulling the freshly built redline-docs:staging (DOCS_IMAGE=…:staging in .env.staging).
- Production (docs.amrkt.ch) updates only when you cut a release (v* tag): deploy-production redeploys the prod stack, pulling redline-docs:prod (DOCS_IMAGE=…:prod in .env.prod, pinned like the app images).

The edge Caddy serves docs.<domain> / docs-staging.<domain> out of the box (deploy/Caddyfile). To bring it up on a deployment that doesn't have it yet:

1. DNS: add A records for docs.<domain> and docs-staging.<domain> pointing at the VPS IP.
2. Server env: set DOCS_IMAGE=ghcr.io/<owner>/redline-docs:staging in /opt/redline/.env.staging and …:prod in .env.prod, plus DOCS_DOMAIN / DOCS_STAGING_DOMAIN in .env.edge.
3. Apply: run ./deploy/deploy.sh staging (and prod on a release), then ./deploy/deploy.sh edge so Caddy provisions the docs host certs.

Observability (Sentry + logging)

Backend logging is structured JSON via pino (apps/mcp-server/src/logger.ts) and always goes to stderr, because stdout is the stdio JSON-RPC stream. Auth headers, tokens, and passwords are redacted. The level is set by LOG_LEVEL (default info).

Sentry is opt-in (errors only, no tracing). Set SENTRY_DSN on the mcp-server via .env.<env>. The webapp bakes in VITE_SENTRY_DSN and uploads source maps at build time, which needs SENTRY_AUTH_TOKEN, SENTRY_ORG, and SENTRY_PROJECT, plus SENTRY_URL=https://de.sentry.io for the EU org amrkt. With no DSN the SDKs stay disabled.
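
Because the logs are JSON on stderr, they can be filtered with jq. The sketch below keeps only error and fatal entries, assuming the logger uses pino's default numeric levels (30 info, 40 warn, 50 error, 60 fatal); the function name and the container placeholder in the usage line are illustrative.

```bash
# Keep only error/fatal entries from a pino JSON stream on stdin.
# Lines that are not JSON objects, or lack a numeric level, are skipped.
errors_only() {
  jq -cR 'fromjson? | objects | select((.level | numbers) >= 50)'
}

# Usage (2>&1 because the server logs to stderr):
#   docker logs <mcp-server-container> 2>&1 | errors_only
```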

Production hardening checklist

[ ] Strong, unique POSTGRES_PASSWORD / KEYCLOAK_ADMIN_PASSWORD per environment.
[ ] allowed_ssh_cidrs locked to your IP/VPN.
[ ] In the prod realm: registrationAllowed=false, tighten webOrigins to your domains, remove the demo dealer user once real dealers exist.
[ ] Database backups scheduled (see Operations).
[ ] Uptime monitor on /healthz and Keycloak /health.
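
If no hosted uptime service is in place yet, a cron job on another machine can serve as a stopgap. This is a minimal sketch: the function name is made up, and which hostname serves /healthz is an assumption (the usage line guesses the MCP server).

```bash
# Probe each URL once; report failures and return non-zero if any is down.
probe_all() {
  local url rc=0
  for url in "$@"; do
    if ! curl -fsS --max-time 10 -o /dev/null "$url"; then
      echo "DOWN $url" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage, e.g. from cron:
#   probe_all https://mcp.<domain>/healthz https://auth.<domain>/health
```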

Splitting staging onto its own VPS

Provision a second server with terraform apply, point the production Environment's VPS_HOST at the new box, move the prod DNS records to it, and run the prod stack plus its own edge proxy there; staging then has the original VPS to itself. Each environment is already a self-contained stack, so nothing else changes.

Troubleshooting

"Client not found" on sign-in → a client was added to keycloak/realm-export.json after the realm was created. Provision it into the running realm: bash keycloak/provision-webapp-client.sh staging.
401 even with a token → audience mismatch; the token's aud must equal OAUTH_AUDIENCE (both default to https://api.redline.app).
invalid issuer / JWKS errors → KEYCLOAK_ISSUER_URL must equal the token iss (https://auth.<domain>).
Keycloak redirect/HTTPS loops behind Caddy → ensure KC_PROXY_HEADERS=xforwarded.
Caddy can't get a certificate → the host's DNS must resolve to this server and ports 80/443 must be open; don't enable a host's Caddy block before its DNS exists.
network redline_edge not found → docker network create redline_edge (cloud-init and deploy.sh do this automatically).
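
For the 401 and invalid-issuer cases above, it helps to look at the aud and iss claims the token actually carries. The sketch below decodes the payload segment of a JWT with standard tools. It does not verify the signature, and the function name jwt_payload is invented for this example.

```bash
# Print the decoded JSON payload (second segment) of a JWT.
# Inspection only: the signature is NOT checked.
jwt_payload() {
  local payload pad
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')   # base64url -> base64
  pad=$(( (4 - ${#payload} % 4) % 4 ))                       # restore stripped padding
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}

# Usage: jwt_payload "$TOKEN" | jq '{iss, aud}'
# Compare iss with KEYCLOAK_ISSUER_URL and aud with OAUTH_AUDIENCE.
```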
