# Server: mercury

*Last verified: 2026-06-04*

## Access
| | |
|---|---|
| Hostname | `mercury` |
| Domain | `mercury.kennethreitz.org` |
| IP | `5.161.122.181` (Hetzner) |
| SSH | `ssh root@mercury.kennethreitz.org` (key auth) |
| Dokploy UI | https://mercury.kennethreitz.org |

## Specs
| | |
|---|---|
| OS | Ubuntu 26.04 LTS |
| Kernel | 7.0.0-15-generic |
| Server type | Hetzner Cloud CPX41 (id `136742397`, dc `ash-dc1`). Upgraded from CPX31 on 2026-06-05 (8 vCPU, 15.6 GiB); root disk kept at 150 GB for downgrade flexibility. |
| CPU | 8 vCPU |
| RAM | 15.6 GiB |
| Disk | 150 GB (`/dev/sda1`) |
| Volume | `mercury-objects` (id `105925944`), 750 GB ext4 at `/mnt/objects` (fstab, nofail); holds MinIO data |
| Swap | 4 GB swapfile (`/swapfile`, swappiness 10), added 2026-06-05 after Immich's arrival OOM-killed the Dokploy service (exit 137 loop → bad gateway). The Immich server and ML containers now carry `mem_limit`s of 2g and 1.5g. |
| Firewall | Hetzner Cloud Firewall `mercury-web` (id `11085164`): inbound 22, 80, 443 (TCP+UDP), 2222, and ICMP only. Blocks the otherwise-public Dokploy UI (:3000) and Traefik dashboard (:8080, `api.insecure`). Manage via the Hetzner API. |

## Stack
Docker **29.5.3** running in single-node **Swarm** mode (node `mercury`, manager/leader).

### Core services (Dokploy platform)
| Service | Image | Notes |
|---|---|---|
| `dokploy` | `dokploy/dokploy:v0.29.7` | Swarm service, port 3000 |
| `dokploy-postgres` | `postgres:16` | Swarm service, Dokploy's own DB |
| `dokploy-redis` | `redis:7` | Swarm service |
| `dokploy-traefik` | `traefik:v3.6.7` | Plain container; ports 80/443 (+443/udp for HTTP/3), 8080 |

Traefik terminates TLS for `mercury.kennethreitz.org` and proxies to the Dokploy UI.
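The firewall rule and port layout described above can be spot-checked from a machine outside the server. The following is a minimal sketch, not part of the existing tooling: the function names are illustrative, only TCP is probed (443/udp and ICMP are not covered), and a port the firewall allows will still show as unreachable if nothing is listening on it.

```python
import socket

# Expected TCP exposure, taken from the Firewall row in the Specs table.
EXPECTED_OPEN = (22, 80, 443, 2222)
EXPECTED_BLOCKED = (3000, 8080)  # Dokploy UI and Traefik dashboard


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


def audit_firewall(host, probe=tcp_reachable):
    """Return a list of mismatches; an empty list means the rules look right.

    `probe` is injectable so the logic can be exercised without network access.
    """
    problems = []
    for port in EXPECTED_OPEN:
        if not probe(host, port):
            problems.append(f"port {port} should be reachable but is not")
    for port in EXPECTED_BLOCKED:
        if probe(host, port):
            problems.append(f"port {port} should be blocked but is reachable")
    return problems
```

Calling `audit_firewall("mercury.kennethreitz.org")` from an external host should return an empty list if the firewall matches this document. Running it on the server itself is not meaningful, since local traffic does not pass through the Hetzner firewall.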

## Deployed applications
See [inventory.md](inventory.md). Currently:

- **httpbin** — https://httpbin.kennethreitz.org (`kennethreitz/httpbin`)
- **poemsbysarah** — https://poemsbysarah.com (built from `kennethreitz/sarah-poems`)
- **kjvstudy** — https://kjvstudy.org (built from `kennethreitz/kjvstudy.org`)
- **kennethreitz.org** — https://kennethreitz.org (built from `kennethreitz/kennethreitz.org`)
- **interpretations** — https://interpretations.kennethreitz.org (built from `kennethreitz/interpretations`)
- **photos** — https://photos.kennethreitz.org (compose: web + celery worker + postgres17 db, from `kennethreitz/photos.kennethreitz.org`)

## Deploys
All Swarm applications use `start-first` update ordering with rollback on failure
(set via `application.update` → `updateConfigSwarm`). kennethreitz.org and kjvstudy
additionally have Swarm healthchecks on `/health` (60s start period), so traffic only
moves to a warmed container. Deploys are zero-downtime (verified by probing during
a live deploy). Durations in these API fields are expressed in nanoseconds.
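Because durations are nanoseconds, a 60s start period has to be sent as `60000000000`, which is easy to mistype. The sketch below shows one way to build such values. The key names (`Order`, `FailureAction`, `StartPeriod`, `Test`) follow Docker's `UpdateConfig` and `HealthConfig` objects and are an assumption about what Dokploy expects. The healthcheck command and port are hypothetical placeholders.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def seconds_to_ns(seconds):
    """Convert seconds (int or float) to integer nanoseconds."""
    return int(round(seconds * NANOSECONDS_PER_SECOND))


# Assumed shape, modelled on Docker's UpdateConfig object.
update_config = {
    "Order": "start-first",       # start the new task before stopping the old one
    "FailureAction": "rollback",  # roll back if the update fails
}

# Assumed shape, modelled on Docker's HealthConfig object.
health_check = {
    # Hypothetical command and port; only the /health path comes from this document.
    "Test": ["CMD-SHELL", "curl -fsS http://localhost:8000/health || exit 1"],
    "StartPeriod": seconds_to_ns(60),  # the 60s start period, in nanoseconds
}
```

Keeping the conversion in a helper means the human-readable number (60) stays visible in whatever script sends the update.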

## TLS / ACME
Traefik's `letsencrypt` resolver uses the **HTTP-01 challenge**. All certificates have been issued.

Lessons from the 2026-06-05 Fly migration (cost ~1.5h of cert warnings):
- While DNS still pointed at Fly, every validation failed; 5 failed
  authorizations/hour/domain trips Let's Encrypt's rate limiter, and **each
  retry during the stale window extends it**. When this happens, stop
  retrying and wait out the window (exact expiry is in the 429 in Traefik logs).
- After a rate-limit stall, Traefik does not retry on its own; restart the
  `dokploy-traefik` container to trigger fresh orders.
- DNS-01 via DNSimple is **not possible on this account**: lego requires an
  account token (`dnsimple_a_…`) and those aren't available at the current
  DNSimple plan level. HTTP-01 is the permanent strategy.
- **Doctrine for new domains** (avoids every cert failure we've had): create
  the DNS record FIRST, verify all four `ns*.dnsimple-edge.*` nameservers
  serve it (their edge propagation can lag many minutes), and only THEN
  attach the domain in Dokploy. If a validation still fails, wait out any
  rate-limit window and restart `dokploy-traefik` exactly once.
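The "verify all four nameservers" step of the doctrine can be scripted. This is a rough sketch under stated assumptions: it shells out to `dig`, which must be installed; the function names are illustrative; and the authoritative nameserver hostnames are passed in by the caller because this document does not spell them out.

```python
import subprocess


def dig_a_records(nameserver, domain):
    """Ask one nameserver directly for the domain's A records (requires `dig`)."""
    result = subprocess.run(
        ["dig", "+short", "+time=2", "+tries=1", f"@{nameserver}", domain, "A"],
        capture_output=True,
        text=True,
        check=False,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def lagging_nameservers(domain, expected_ip, nameservers, query=dig_a_records):
    """Return the nameservers that do not yet serve expected_ip for domain."""
    return [ns for ns in nameservers if expected_ip not in query(ns, domain)]


def safe_to_attach(domain, expected_ip, nameservers, query=dig_a_records):
    """True only when every listed nameserver already answers with expected_ip."""
    if not nameservers:
        raise ValueError("pass the authoritative nameservers explicitly")
    return not lagging_nameservers(domain, expected_ip, nameservers, query)
```

The domain should only be attached in Dokploy once `safe_to_attach(new_domain, "5.161.122.181", nameservers)` returns `True` for the full nameserver list. Until then, `lagging_nameservers` shows which edges are still behind.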