kennethreitz/mercury.kennethreitz.org


kennethreitz 9ecc56e733 ACME doctrine: HTTP-01 permanent (no account tokens on plan); DNS-first ordering

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 09:38:42 -04:00


Server: mercury

Last verified: 2026-06-04

Access


| | |
| --- | --- |
| Hostname | `mercury` |
| Domain | `mercury.kennethreitz.org` |
| IP | `5.161.122.181` (Hetzner) |
| SSH | `ssh root@mercury.kennethreitz.org` (key auth) |
| Dokploy UI | https://mercury.kennethreitz.org |

Specs


| | |
| --- | --- |
| OS | Ubuntu 26.04 LTS |
| Kernel | 7.0.0-15-generic |
| Server type | Hetzner Cloud CPX41 (id `136742397`, dc `ash-dc1`); upgraded from CPX31 on 2026-06-05, root disk kept at 150 GB for downgrade flexibility |
| CPU | 8 vCPU |
| RAM | 15.6 GiB |
| Disk | 150 GB (`/dev/sda1`) |
| Volume | `mercury-objects` (id `105925944`), 750 GB ext4 at `/mnt/objects` (fstab, nofail); holds MinIO data |
| Swap | 4 GB swapfile (`/swapfile`, swappiness 10). Added 2026-06-05 after Immich's arrival OOM-killed the Dokploy service (exit 137 loop, then bad gateway). The Immich server and ML containers now carry `mem_limit` values of 2g and 1.5g respectively. |
| Firewall | Hetzner Cloud Firewall `mercury-web` (id `11085164`): inbound TCP 22, 80, 443, 2222; UDP 443; and ICMP only. Blocks the otherwise-public Dokploy `:3000` and Traefik dashboard `:8080` (`api.insecure`). Manage via the Hetzner API. |
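
The firewall row above claims that ports 3000 and 8080 are unreachable from outside. A minimal sketch of how that could be spot-checked from a machine that is not mercury itself; the helper names `port_is_open` and `exposed_ports` are hypothetical, and a TCP connect test only shows what is reachable from wherever the script runs.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (filtered ports), and DNS errors.
        return False


def exposed_ports(host, ports, probe=port_is_open):
    """Return the subset of ports that accept a connection on host."""
    return [port for port in ports if probe(host, port)]


if __name__ == "__main__":
    # Ports the firewall is supposed to hide; run this from outside the server.
    leaked = exposed_ports("mercury.kennethreitz.org", [3000, 8080])
    print("reachable but should be blocked:", leaked)
```

An empty list would suggest the firewall is doing its job; any port in the list deserves a look at the `mercury-web` rules.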

Stack

Docker 29.5.3 running in single-node Swarm mode (node mercury, manager/leader).

Core services (Dokploy platform)

| Service | Image | Notes |
| --- | --- | --- |
| `dokploy` | `dokploy/dokploy:v0.29.7` | Swarm service, port 3000 |
| `dokploy-postgres` | `postgres:16` | Swarm service, Dokploy's own DB |
| `dokploy-redis` | `redis:7` | Swarm service |
| `dokploy-traefik` | `traefik:v3.6.7` | Plain container; ports 80/443 (+443/udp for HTTP/3), 8080 |

Traefik terminates TLS for mercury.kennethreitz.org and proxies to the Dokploy UI.

Deployed applications

See inventory.md. Currently:

- httpbin: https://httpbin.kennethreitz.org (kennethreitz/httpbin)
- poemsbysarah: https://poemsbysarah.com (built from kennethreitz/sarah-poems)
- kjvstudy: https://kjvstudy.org (built from kennethreitz/kjvstudy.org)
- kennethreitz.org: https://kennethreitz.org (built from kennethreitz/kennethreitz.org)
- interpretations: https://interpretations.kennethreitz.org (built from kennethreitz/interpretations)
- photos: https://photos.kennethreitz.org (compose: web + celery worker + postgres17 db, from kennethreitz/photos.kennethreitz.org)

Deploys

All Swarm applications use start-first update ordering with rollback on failure (set via `application.update` → `updateConfigSwarm`). kennethreitz.org and kjvstudy additionally have Swarm healthchecks on `/health` with a 60s start period, so traffic only moves to a warmed container. Deploys are zero-downtime, verified by probing during a live deploy. Durations in these API fields are expressed in nanoseconds.
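
Because the duration fields are nanoseconds, a 60s start period is written as 60000000000, which is easy to get wrong by hand. The sketch below shows one way to build such a payload. It assumes the field names follow Docker's Swarm `UpdateConfig` and `HealthConfig` shapes, and that a `healthCheckSwarm` key sits alongside `updateConfigSwarm`; the delay, monitor, interval, timeout, retry values, the port 8000, and the use of `curl` are illustrative assumptions, not settings taken from this server.

```python
import json

NS_PER_SECOND = 1_000_000_000


def seconds_to_ns(seconds: float) -> int:
    """Convert seconds to the integer nanoseconds the Swarm fields expect."""
    return int(round(seconds * NS_PER_SECOND))


# From this document: start-first ordering and rollback on failure.
# Assumed for illustration: Parallelism, Delay, Monitor.
update_config_swarm = {
    "Parallelism": 1,
    "Order": "start-first",
    "FailureAction": "rollback",
    "Delay": seconds_to_ns(5),
    "Monitor": seconds_to_ns(15),
}

# From this document: the /health path and the 60s start period.
# Assumed for illustration: the port, curl in the image, interval, timeout, retries.
health_check_swarm = {
    "Test": ["CMD", "curl", "-f", "http://localhost:8000/health"],
    "Interval": seconds_to_ns(10),
    "Timeout": seconds_to_ns(5),
    "StartPeriod": seconds_to_ns(60),
    "Retries": 3,
}

payload = {
    "updateConfigSwarm": update_config_swarm,
    "healthCheckSwarm": health_check_swarm,
}
print(json.dumps(payload, indent=2))
```

Keeping the conversion in one helper means the values can be reviewed in seconds while the API still receives nanoseconds.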

TLS / ACME

Traefik's letsencrypt resolver uses the HTTP-01 challenge. All certs issued.

Lessons from the 2026-06-05 Fly migration (cost ~1.5h of cert warnings):

- While DNS still pointed at Fly, every validation failed. Five failed authorizations per hour per domain trips Let's Encrypt's rate limiter, and each retry during the stale window extends it. When this happens, stop retrying and wait out the window (the exact expiry is in the 429 response in the Traefik logs).
- After a rate-limit stall, Traefik does not retry on its own. Restart the `dokploy-traefik` container to trigger fresh orders.
- DNS-01 via DNSimple is not possible on this account: lego requires an account token (dnsimple_a_…), and those aren't available at the current DNSimple plan level. HTTP-01 is the permanent strategy.
- Doctrine for new domains (avoids every cert failure we've had): create the DNS record FIRST, verify that all four ns*.dnsimple-edge.* nameservers serve it (their edge propagation can lag by many minutes), and only THEN attach the domain in Dokploy. If a validation still fails, wait out any rate-limit window and restart `dokploy-traefik` exactly once.
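
The "verify all four nameservers" step of the doctrine can be scripted. The following is a minimal sketch, assuming `dig` is installed locally; the function names are hypothetical, and the nameserver hostnames are deliberately left as an input (for example, taken from the output of `dig +short NS kennethreitz.org`) rather than hard-coded.

```python
import subprocess


def query_nameserver(nameserver: str, name: str, rtype: str = "A") -> set[str]:
    """Ask one authoritative nameserver directly via dig and return its answers."""
    result = subprocess.run(
        ["dig", "+short", f"@{nameserver}", name, rtype],
        capture_output=True, text=True, timeout=30, check=False,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def lagging_nameservers(answers: dict[str, set[str]], expected: str) -> list[str]:
    """Return the nameservers whose answer does not include the expected value."""
    return sorted(ns for ns, records in answers.items() if expected not in records)


def ready_to_attach(nameservers, name, expected, query=query_nameserver) -> bool:
    """True only when every listed nameserver already serves the expected record."""
    answers = {ns: query(ns, name) for ns in nameservers}
    lagging = lagging_nameservers(answers, expected)
    for ns in lagging:
        print(f"{ns} does not serve {name} -> {expected} yet")
    return not lagging
```

Calling `ready_to_attach(nameservers, "newsite.kennethreitz.org", "5.161.122.181")` with the four edge nameservers (the hostname here is a made-up example) and attaching the domain in Dokploy only once it returns `True` keeps the first HTTP-01 validation from running against stale DNS.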
