2026-06-05 09:38:42 -04:00

Server: mercury

Last verified: 2026-06-04

Access

Hostname mercury
Domain mercury.kennethreitz.org
IP 5.161.122.181 (Hetzner)
SSH ssh root@mercury.kennethreitz.org (key auth)
Dokploy UI https://mercury.kennethreitz.org

Specs

OS Ubuntu 26.04 LTS
Kernel 7.0.0-15-generic
Server type Hetzner Cloud CPX41 (id 136742397, dc ash-dc1); upgraded from CPX31 on 2026-06-05 to 8 vCPU and 15.6 GiB RAM; root disk kept at 150 GB for downgrade flexibility
CPU 8 vCPU
RAM 15.6 GiB
Disk 150 GB (/dev/sda1)
Volume mercury-objects (id 105925944), 750 GB ext4 at /mnt/objects (fstab, nofail) — MinIO data
Swap 4 GB swapfile (/swapfile, swappiness 10), added 2026-06-05 after Immich's arrival got the Dokploy service OOM-killed (exit 137 restart loop → bad gateway). The Immich server and ML containers now carry mem_limits of 2g and 1.5g respectively.
Firewall Hetzner Cloud Firewall mercury-web (id 11085164): inbound 22, 80, 443 (tcp+udp), and 2222, plus ICMP, only. Blocks the otherwise-public Dokploy :3000 and Traefik dashboard :8080 (api.insecure). Manage via Hetzner API.
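
The firewall rule above can be spot-checked from a machine outside the server. The following is a minimal sketch, not part of the existing tooling: `tcp_open` and `exposed_ports` are hypothetical helper names, and the 3-second timeout is an arbitrary assumption. It only probes the two ports the firewall is documented to block, since an allowed port with nothing listening would look closed as well.

```python
import socket

# Ports the firewall is documented to block from the public internet.
BLOCKED_TCP = (3000, 8080)


def tcp_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def exposed_ports(host: str, ports=BLOCKED_TCP, probe=tcp_open) -> list[int]:
    """Return the ports that should be blocked but are reachable.

    `probe` is injectable so the logic can be exercised without a network.
    """
    return [port for port in ports if probe(host, port)]
```

Run from outside Hetzner's network, `exposed_ports("mercury.kennethreitz.org")` should return an empty list while the firewall is attached; any port in the result means the rule set is no longer protecting it.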

Stack

Docker 29.5.3 running in single-node Swarm mode (node mercury, manager/leader).

Core services (Dokploy platform)

Service            Image                     Notes
dokploy            dokploy/dokploy:v0.29.7   Swarm service, port 3000
dokploy-postgres   postgres:16               Swarm service, Dokploy's own DB
dokploy-redis      redis:7                   Swarm service
dokploy-traefik    traefik:v3.6.7            Plain container; ports 80/443 (+443/udp for HTTP/3), 8080

Traefik terminates TLS for mercury.kennethreitz.org and proxies to the Dokploy UI.

Deployed applications

See inventory.md. Currently:

Deploys

All Swarm applications use start-first update ordering with rollback on failure (set via application.update → updateConfigSwarm). kennethreitz.org and kjvstudy additionally have Swarm healthchecks on /health (60s start period), so traffic only moves to a warmed container; deploys are zero-downtime (verified by probing during a live deploy). Durations in these API fields are in nanoseconds.
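
Because the durations are in nanoseconds, a 60s start period is written as 60000000000. The sketch below builds the two settings described above using the Docker Engine API field names (`Order`, `FailureAction`, `Monitor`, `StartPeriod`, and so on). It assumes the Dokploy fields accept that same shape. The curl-based test command, container port 8000, and the interval, timeout, retry, and monitor values are illustrative assumptions, not the values deployed on mercury.

```python
import json

NS_PER_SECOND = 1_000_000_000


def seconds_to_ns(seconds: float) -> int:
    """Convert seconds to the integer nanoseconds the Swarm API expects."""
    return int(round(seconds * NS_PER_SECOND))


def update_config(monitor_s: float = 10) -> dict:
    """Start the new task before stopping the old one; roll back on failure."""
    return {
        "Parallelism": 1,
        "Order": "start-first",
        "FailureAction": "rollback",
        "Monitor": seconds_to_ns(monitor_s),  # assumed value
    }


def health_check(path: str = "/health", port: int = 8000,
                 start_period_s: float = 60) -> dict:
    """Healthcheck on `path` with a start period for warm-up."""
    return {
        # Assumes curl exists in the image and the app listens on `port`.
        "Test": ["CMD-SHELL", f"curl -fsS http://localhost:{port}{path} || exit 1"],
        "Interval": seconds_to_ns(10),   # assumed value
        "Timeout": seconds_to_ns(5),     # assumed value
        "StartPeriod": seconds_to_ns(start_period_s),
        "Retries": 3,                    # assumed value
    }


def as_payload() -> str:
    """Serialize both blocks as JSON for pasting into an API request."""
    return json.dumps(
        {"updateConfig": update_config(), "healthCheck": health_check()},
        indent=2,
    )
```

Keeping the conversion in one helper avoids the most likely mistake here, which is entering a value in seconds or milliseconds and getting a healthcheck that effectively never waits.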

TLS / ACME

Traefik's letsencrypt resolver uses the HTTP-01 challenge. All certs issued.

Lessons from the 2026-06-05 Fly migration (cost ~1.5h of cert warnings):

  • While DNS still pointed at Fly, every validation failed. Five failed authorizations per hour per domain trip Let's Encrypt's rate limiter, and each retry during the stale window extends it. When this happens, stop retrying and wait out the window (the exact expiry is in the 429 response in the Traefik logs).
  • After a rate-limit stall, Traefik does not retry on its own — restart the dokploy-traefik container to trigger fresh orders.
  • DNS-01 via DNSimple is not possible on this account: lego requires an account token (dnsimple_a_…) and those aren't available at the current DNSimple plan level. HTTP-01 is the permanent strategy.
  • Doctrine for new domains (avoids every cert failure we've had): create the DNS record FIRST, verify all four ns*.dnsimple-edge.* nameservers serve it (their edge propagation can lag many minutes), and only THEN attach the domain in Dokploy. If a validation still fails, wait out any rate-limit window and restart dokploy-traefik exactly once.
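
The "verify all nameservers first" step of the doctrine above can be turned into a small preflight check. This is a sketch under stated assumptions: it shells out to `dig`, which must be installed, and `dig_a` and `ready_to_attach` are hypothetical helper names. The caller supplies the nameserver hostnames, since they are not spelled out in full here.

```python
import subprocess


def dig_a(nameserver: str, domain: str) -> set[str]:
    """Ask one nameserver directly for the A records of `domain`."""
    result = subprocess.run(
        ["dig", "+short", f"@{nameserver}", domain, "A"],
        capture_output=True, text=True, timeout=15, check=False,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def ready_to_attach(domain: str, expected_ip: str,
                    nameservers: list[str], query=dig_a) -> bool:
    """True only if every listed nameserver already serves `expected_ip`.

    `query` is injectable so the check can be tested without DNS access.
    """
    if not nameservers:
        return False  # nothing verified, so do not proceed
    return all(expected_ip in query(ns, domain) for ns in nameservers)
```

Only attach the domain in Dokploy once `ready_to_attach(domain, "5.161.122.181", nameservers)` returns True for the full nameserver list. Querying each server directly, rather than a recursive resolver, is what catches the edge-propagation lag noted above.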