Server: mercury
Last verified: 2026-06-04
Access
| Hostname | mercury |
| Domain | mercury.kennethreitz.org |
| IP | 5.161.122.181 (Hetzner) |
| SSH | ssh root@mercury.kennethreitz.org (key auth) |
| Dokploy UI | https://mercury.kennethreitz.org |
Specs
| OS | Ubuntu 26.04 LTS |
| Kernel | 7.0.0-15-generic |
| Server type | Hetzner Cloud CPX41 (id 136742397, dc ash-dc1); upgraded from CPX31 on 2026-06-05 to 8 vCPU / 15.6 GiB, with the root disk kept at 150 GB for downgrade flexibility |
| CPU | 8 vCPU |
| RAM | 15.6 GiB |
| Disk | 150 GB (/dev/sda1) |
| Volume | mercury-objects (id 105925944), 750 GB ext4 at /mnt/objects (fstab, nofail) — MinIO data |
| Swap | 4 GB swapfile (/swapfile, swappiness 10), added 2026-06-05 after deploying Immich got the Dokploy service OOM-killed (exit 137 restart loop, surfacing as bad gateway). The Immich server and ML containers now carry mem_limits of 2g and 1.5g respectively. |
| Firewall | Hetzner Cloud Firewall mercury-web (id 11085164): inbound 22, 80, 443 (TCP+UDP), 2222, and ICMP only. Blocks the otherwise-public Dokploy UI (:3000) and Traefik dashboard (:8080, api.insecure). Manage via the Hetzner API. |
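
The firewall row above can be spot-checked from a machine outside the server. The following is a minimal sketch, not part of the existing tooling: the function names `is_open`, `probe`, and `leaked_ports` are hypothetical, and the allowed/blocked port sets are simply copied from the table (TCP only; the 443/udp rule and ICMP are not probed).

```python
import socket

# TCP ports the mercury-web firewall is documented to allow inbound.
ALLOWED_TCP = {22, 80, 443, 2222}
# TCP ports the firewall is documented to block (Dokploy UI, Traefik dashboard).
BLOCKED_TCP = {3000, 8080}


def is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


def probe(host: str) -> dict:
    """Map every documented port to whether it is reachable from here."""
    return {p: is_open(host, p) for p in sorted(ALLOWED_TCP | BLOCKED_TCP)}


def leaked_ports(results: dict, allowed: set = ALLOWED_TCP) -> list:
    """Given {port: reachable}, list reachable ports that are not allowed."""
    return sorted(p for p, reachable in results.items()
                  if reachable and p not in allowed)
```

Running `leaked_ports(probe("mercury.kennethreitz.org"))` from an external host should return an empty list if the firewall behaves as documented; run from the server itself it would bypass the firewall and prove nothing.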
Stack
Docker 29.5.3 running in single-node Swarm mode (node mercury, manager/leader).
Core services (Dokploy platform)
| Service | Image | Notes |
|---|---|---|
| dokploy | dokploy/dokploy:v0.29.7 | Swarm service, port 3000 |
| dokploy-postgres | postgres:16 | Swarm service, Dokploy's own DB |
| dokploy-redis | redis:7 | Swarm service |
| dokploy-traefik | traefik:v3.6.7 | Plain container; ports 80/443 (+443/udp for HTTP/3), 8080 |
Traefik terminates TLS for mercury.kennethreitz.org and proxies to the Dokploy UI.
Deployed applications
See inventory.md. Currently:
- httpbin: https://httpbin.kennethreitz.org (kennethreitz/httpbin)
- poemsbysarah: https://poemsbysarah.com (built from kennethreitz/sarah-poems)
- kjvstudy: https://kjvstudy.org (built from kennethreitz/kjvstudy.org)
- kennethreitz.org: https://kennethreitz.org (built from kennethreitz/kennethreitz.org)
- interpretations: https://interpretations.kennethreitz.org (built from kennethreitz/interpretations)
- photos: https://photos.kennethreitz.org (compose: web + celery worker + postgres17 db, from kennethreitz/photos.kennethreitz.org)
Deploys
All Swarm applications use start-first update ordering with rollback on failure
(set via application.update → updateConfigSwarm). kennethreitz.org and kjvstudy
additionally have Swarm healthchecks on /health (60s start period) so traffic only
moves to a warmed container — deploys are zero-downtime (verified by probing during
a live deploy). Durations in these API fields are nanoseconds.
TLS / ACME
Traefik's letsencrypt resolver uses the HTTP-01 challenge. All certs issued.
Lessons from the 2026-06-05 Fly migration (cost ~1.5h of cert warnings):
- While DNS still pointed at Fly, every validation failed; 5 failed authorizations/hour/domain trips Let's Encrypt's rate limiter, and each retry during the stale window extends it. When this happens, stop retrying and wait out the window (the exact expiry is in the 429 in the Traefik logs).
- After a rate-limit stall, Traefik does not retry on its own; restart the dokploy-traefik container to trigger fresh orders.
- DNS-01 via DNSimple is not possible on this account: lego requires an account token (dnsimple_a_…) and those aren't available at the current DNSimple plan level. HTTP-01 is the permanent strategy.
- Doctrine for new domains (avoids every cert failure we've had): create the DNS record FIRST, verify all four ns*.dnsimple-edge.* nameservers serve it (their edge propagation can lag many minutes), and only THEN attach the domain in Dokploy. If a validation still fails, wait out any rate-limit window and restart dokploy-traefik exactly once.
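
The "verify all four nameservers" step of the doctrine can be scripted. This is a minimal sketch under stated assumptions: `dig` is installed locally, the record is an A record, and the helper names `dig_lookup`, `lagging_nameservers`, and `ready_to_attach` are hypothetical. The full nameserver hostnames are deliberately not filled in here; pass the real ones for the zone.

```python
import subprocess
from typing import Callable, Iterable, List, Set


def dig_lookup(nameserver: str, name: str) -> Set[str]:
    """Ask one specific nameserver for the A records of `name` via dig."""
    out = subprocess.run(
        ["dig", "+short", f"@{nameserver}", name, "A"],
        capture_output=True, text=True, timeout=10,
    )
    return {line.strip() for line in out.stdout.splitlines() if line.strip()}


def lagging_nameservers(
    name: str,
    expected_ip: str,
    nameservers: Iterable[str],
    lookup: Callable[[str, str], Set[str]] = dig_lookup,
) -> List[str]:
    """Return the nameservers that do not yet serve expected_ip for name."""
    return [ns for ns in nameservers if expected_ip not in lookup(ns, name)]


def ready_to_attach(
    name: str,
    expected_ip: str,
    nameservers: Iterable[str],
    lookup: Callable[[str, str], Set[str]] = dig_lookup,
) -> bool:
    """True only when every authoritative nameserver serves the record."""
    return not lagging_nameservers(name, expected_ip, nameservers, lookup)
```

Attach the domain in Dokploy only once `ready_to_attach(...)` returns True for all four nameservers with the server's IP (5.161.122.181). The `lookup` parameter is injectable so the check can be exercised without network access.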