# Server: mercury
*Last verified: 2026-06-04*
## Access
| | |
|---|---|
| Hostname | `mercury` |
| Domain | `mercury.kennethreitz.org` |
| IP | `5.161.122.181` (Hetzner) |
| SSH | `ssh root@mercury.kennethreitz.org` (key auth) |
| Dokploy UI | https://mercury.kennethreitz.org |
## Specs
| | |
|---|---|
| OS | Ubuntu 26.04 LTS |
| Kernel | 7.0.0-15-generic |
| Server type | Hetzner Cloud CPX41 (id `136742397`, dc `ash-dc1`). Upgraded from CPX31 on 2026-06-05 (now 8 vCPU, 15.6 GiB); root disk kept at 150 GB for downgrade flexibility. |
| CPU | 8 vCPU |
| RAM | 15.6 GiB |
| Disk | 150 GB (`/dev/sda1`) |
| Volume | `mercury-objects` (id `105925944`), 750 GB ext4 at `/mnt/objects` (fstab, nofail) — MinIO data |
| Swap | 4 GB swapfile (`/swapfile`, swappiness 10). Added 2026-06-05 after Immich's arrival caused the Dokploy service to be OOM-killed (exit 137 loop → bad gateway). The Immich server and ML containers now carry `mem_limit`s of 2g and 1.5g respectively. |
| Firewall | Hetzner Cloud Firewall `mercury-web` (id `11085164`): inbound 22, 80, 443 (tcp+udp), 2222, and ICMP only. Blocks the otherwise-public Dokploy UI on :3000 and the Traefik dashboard on :8080 (`api.insecure`). Manage via the Hetzner API. |
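
The firewall row above can be spot-checked from outside the host. The following is a minimal sketch, not the procedure actually used here: it only tests TCP reachability (so it says nothing about 443/udp or ICMP), and the function names `is_port_open` and `audit_firewall` are made up for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (firewall drops) and DNS errors.
        return False


def audit_firewall(host, should_be_open, should_be_closed, probe=is_port_open):
    """Return a list of problems; an empty list means the rules look as expected."""
    problems = []
    for port in should_be_open:
        if not probe(host, port):
            problems.append(f"port {port} should be reachable but is not")
    for port in should_be_closed:
        if probe(host, port):
            problems.append(f"port {port} should be blocked but is reachable")
    return problems
```

Run from a machine outside the server, a call such as `audit_firewall("mercury.kennethreitz.org", [22, 80, 443], [3000, 8080])` should return an empty list if the firewall behaves as described. The `probe` parameter exists only so the logic can be exercised without touching the network.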
## Stack
Docker **29.5.3** running in single-node **Swarm** mode (node `mercury`, manager/leader).
### Core services (Dokploy platform)
| Service | Image | Notes |
|---|---|---|
| `dokploy` | `dokploy/dokploy:v0.29.7` | Swarm service, port 3000 |
| `dokploy-postgres` | `postgres:16` | Swarm service, Dokploy's own DB |
| `dokploy-redis` | `redis:7` | Swarm service |
| `dokploy-traefik` | `traefik:v3.6.7` | Plain container; ports 80/443 (+443/udp for HTTP/3), 8080 |

Traefik terminates TLS for `mercury.kennethreitz.org` and proxies to the Dokploy UI.
## Deployed applications
See [inventory.md](inventory.md). Currently:
- **httpbin** — https://httpbin.kennethreitz.org (`kennethreitz/httpbin`)
- **poemsbysarah** — https://poemsbysarah.com (built from `kennethreitz/sarah-poems`)
- **kjvstudy** — https://kjvstudy.org (built from `kennethreitz/kjvstudy.org`)
- **kennethreitz.org** — https://kennethreitz.org (built from `kennethreitz/kennethreitz.org`)
- **interpretations** — https://interpretations.kennethreitz.org (built from `kennethreitz/interpretations`)
- **photos** — https://photos.kennethreitz.org (compose: web + celery worker + postgres17 db, from `kennethreitz/photos.kennethreitz.org`)
## Deploys
All Swarm applications use `start-first` update ordering with rollback on failure
(set via the `updateConfigSwarm` field of `application.update`). kennethreitz.org and kjvstudy
additionally have Swarm healthchecks on `/health` (60s start period), so traffic only
moves to a warmed container and deploys are zero-downtime (verified by probing during
a live deploy). Durations in these API fields are in nanoseconds.
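
Because the duration fields are nanoseconds, a 60s start period has to be written as `60000000000`. The sketch below shows the conversion. It assumes the payloads follow the shape of Docker Engine's `UpdateConfig` and `HealthConfig` objects. The `healthCheckSwarm` key, the healthcheck command and port, and every value marked "placeholder" are assumptions for illustration, not the settings actually deployed.

```python
import json

NS_PER_SECOND = 1_000_000_000


def seconds_to_ns(seconds: float) -> int:
    """Convert seconds to the integer nanoseconds used by Docker-style duration fields."""
    return int(round(seconds * NS_PER_SECOND))


# Only start-first, rollback and the 60s start period come from this document.
update_config = {
    "Parallelism": 1,                 # placeholder
    "Order": "start-first",
    "FailureAction": "rollback",
    "Delay": seconds_to_ns(5),        # placeholder
    "Monitor": seconds_to_ns(30),     # placeholder
}

health_check = {
    # Hypothetical command and port; the real check only needs to hit /health.
    "Test": ["CMD", "curl", "-f", "http://localhost:8000/health"],
    "Interval": seconds_to_ns(10),    # placeholder
    "Timeout": seconds_to_ns(5),      # placeholder
    "Retries": 3,                     # placeholder
    "StartPeriod": seconds_to_ns(60),
}

print(json.dumps({"updateConfigSwarm": update_config,
                  "healthCheckSwarm": health_check}, indent=2))
```

Keeping the conversion in one helper avoids the easy mistake of passing seconds or milliseconds, which Docker would read as a near-zero duration.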
## TLS / ACME
Traefik's `letsencrypt` resolver uses the **HTTP-01 challenge**. All certs issued.
Lessons from the 2026-06-05 Fly migration (cost ~1.5h of cert warnings):
- While DNS still pointed at Fly, every validation failed; 5 failed
authorizations/hour/domain trips Let's Encrypt's rate limiter, and **each
retry during the stale window extends it** — when this happens, stop
retrying and wait out the window (exact expiry is in the 429 in Traefik logs).
- After a rate-limit stall, Traefik does not retry on its own — restart the
`dokploy-traefik` container to trigger fresh orders.
- DNS-01 via DNSimple is **not possible on this account**: lego requires an
account token (`dnsimple_a_…`) and those aren't available at the current
DNSimple plan level. HTTP-01 is the permanent strategy.
- **Doctrine for new domains** (avoids every cert failure we've had): create
the DNS record FIRST, verify all four `ns*.dnsimple-edge.*` nameservers
serve it (their edge propagation can lag many minutes), and only THEN
attach the domain in Dokploy. If a validation still fails, wait out any
rate-limit window and restart `dokploy-traefik` exactly once.
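
The "verify all four nameservers" step of the doctrine can be scripted. This is a rough sketch that assumes the `dig` CLI is installed. The helper names are invented for illustration, and the nameserver hostnames are deliberately left to the caller, since this document only gives them as `ns*.dnsimple-edge.*`.

```python
import subprocess


def dig_lookup(nameserver: str, hostname: str, record_type: str = "A") -> set:
    """Ask one specific nameserver for a record via `dig +short`."""
    result = subprocess.run(
        ["dig", "+short", f"@{nameserver}", hostname, record_type],
        capture_output=True, text=True, timeout=15, check=False,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def lagging_nameservers(hostname, expected, nameservers, lookup=dig_lookup):
    """Return the nameservers that do not yet serve `expected` for `hostname`."""
    return [ns for ns in nameservers if expected not in lookup(ns, hostname)]
```

Attach the domain in Dokploy only once `lagging_nameservers(new_domain, "5.161.122.181", nameservers)` returns an empty list, where `nameservers` holds the four authoritative hostnames for the zone. The `lookup` parameter lets the comparison logic be tested without real DNS queries.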