‹ back to Docker Platform overview

The Docker Platform - Deep Dive

The mechanics behind the platform overview: how the tiers are laid out on disk, how systemd sequences them, and the cert-store failure that came out of running the proxy unprivileged.

The tiers

On disk the tiers are just directories of compose files that systemd brings up in order:

~/docker/
├── tier0/                 # ingress
│   ├── docker-compose.yml   Traefik (host networking)
│   ├── traefik.yml          static config
│   ├── dynamic.yml          routers + LAN/VPN allowlist middleware
│   └── certs/acme.json      the wildcard cert store (root:600)
└── tier1/                 # stateful, each service on its own bridge
    ├── gitlab/              GitLab CE + the two CI runners
    └── apps/                public-facing static site(s)

Tier 0 - ingress

The reverse proxy (Traefik) runs in tier 0 with host networking and comes up first. It is the single TLS termination point and the only cert authority in the lab: it runs the one ACME (Let's Encrypt) client, holds the wildcard cert, and fans the renewed cert out to the other TLS consumers (the DoH resolver and the VPN server) rather than have each run its own ACME - see Network. It runs unprivileged (cap_drop: ALL); that choice has a sharp edge, recorded in Gotchas below.

Tier 1 - stateful services

Tier 1 holds the things that depend on tier 0 being up, each on its own isolated bridge:

Because the tiers are separate bridges, the apps tier and the Git/CI tier cannot reach each other directly; both are reached only through tier 0.

Boot ordering

systemd units sequence the platform so it comes up the same way every time:

network/topology setup
        |
        v
docker-tier0  (Traefik, ingress)
        |
        v
docker-tier1-apps  +  docker-tier1-git   (start in parallel once tier 0 is up)

This matters because the proxy and DNS have to exist before the apps try to resolve names or register routes. The whole chain is Wants=/After=-ordered and survives a cold reboot (verified by booting the box and watching the tiers come up in order).

Self-healing the start path

Deterministic ordering only helps if each unit can actually start - and a leftover container from an earlier layout can dead-end the whole chain (see Gotchas). So each tier unit runs an idempotent preflight before compose up: it looks for any container squatting a name the tier manages and, only if that container belongs to a different compose project (or none), removes it. The legitimate tier-owned container is never touched, and a second run with nothing to clean is a silent no-op.

A tier-aware circuit breaker stops the preflight fighting a persistent recreator forever. It retries a bounded number of times; if a squatter keeps coming back, it stops and logs a CRITICAL breadcrumb (the offending project and image). For the foundational tier it then exits cleanly rather than hard-failing the box - LAN internet is protected independently by DNS failover, so a hard tier-0 failure would be needless and harmful; the CRITICAL log is the escalation signal. Leaf tiers (apps, Git/CI) fail loud instead, since they cannot take the network down with them.

Stable endpoints via loopback aliases

The reverse proxy and the DNS resolvers bind to dedicated loopback aliases (e.g. 127.0.0.10, 127.0.0.11, 127.0.0.12) rather than to a NIC address.

The reasoning is lean, not clever. On boot the physical NIC may not be up yet when something first needs DNS, and binding to a NIC IP makes a service fragile across link changes. Loopback is always up and it is there to be used - so rather than write logic to track the active NIC or react to link changes, I bind to something that never moves. resolv.conf points at the loopback-alias resolver, the proxy advertises a loopback-alias endpoint, and none of it cares which NIC is active or whether the LAN is up. Why re-implement behaviour the OS already hands you for free?

Host services (outside Docker)

Not everything is a container. Running directly on the host:

Service Role
Unbound Recursive DNS resolver (LAN/VPN/containers), blocklisting
dnsproxy DNS-over-HTTPS frontend
OpenVPN The one inbound service (remote access)
Suricata Network IDS, feeds the SIEM
VirtualBox Hosts the VMs (SIEM, Windows, Linux labs)
Ollama + Gemma Local LLM for first-line SIEM alert triage (GPU) - see Security Stack
Scheduled jobs (systemd timers) Threat-intel feed refresh, backups, audits, housekeeping

Gotchas

The cert store nobody could read

Symptom. TLS certificates stopped renewing. The obvious cause was a renewal script failing - but it failed silently, and fixing the script did not bring the certs back.

What it looked like from the floor. Not a cert error - a DNS one. The DoH resolver clients use is served over that same wildcard cert, so when the cert went bad the resolver stopped answering: phones on wi-fi could not resolve anything, while the wired PCs kept limping along on internet access purely because their DNS cache had not been purged yet. "Down for some devices, fine for others" is a long way from "a cert file has the wrong owner" - and that gap is what makes this kind of failure slow to spot.

Root cause. Two failures stacked. Going into the renewal path to debug the script, I found acme.json (the proxy's cert store) was owned by my user, not root. Traefik runs unprivileged (cap_drop: ALL), so it has no CAP_DAC_OVERRIDE - which means even root inside the container honours file ownership. It could not read its own cert store and was quietly falling back to a self-signed cert. It would have choked there even if the script had worked.

Fix. acme.json owned root, mode 600. ("TLS is up but every client screams" is the tell for a self-signed fallback.)

Lesson. A silent failure can hide a second, latent one - the script error masked a permission problem that was going to bite regardless. And capabilities you drop are capabilities you then have to honour: once CAP_DAC_OVERRIDE is gone, file ownership is load-bearing and has to be exactly right.

The reboot that ate itself

Symptom. After a routine maintenance reboot the whole stack came up broken: tier 0 failed to start, and both tier 1 units cascaded into dependency failure - no proxy, no site, no Git/CI.

Root cause. Two faults at once. A leftover container from a previous layout, kept alive by Docker's restart: unless-stopped policy, had grabbed the name tier 0's proxy wanted - so compose up hit a name conflict, the tier-0 unit failed, and everything that Requires= it refused to start. Separately, an unpinned CI runner had raced in and taken the static IP the forge container expected - an IPAM collision on the shared bridge.

Fix. Two layers. Immediately: remove the squatter, pin the runner IPs, bring the tiers up by hand. Durably: the tier units now self-heal on start (see above) - the preflight clears a foreign squatter before compose up, and the circuit breaker stops it fighting a persistent recreator - and every container IP that must be stable is pinned, so nothing can race for it.

Lesson. Deterministic boot ordering is not the same as resilient boot ordering - ordering assumes every step can actually start. A single stale artifact from an old layout can dead-end an otherwise-correct dependency graph, so the start path has to clean up after the past, not just sequence the present. And on a shared bridge, "it got the right address last time" is luck until it is pinned.

platform · the boot chain, made explicit
systemctl cat docker-tier1-apps
[Unit] Description=Docker Tier 1 - Apps (cv-site) After=tier0.target Requires=tier0.target PartOf=docker.service [Service] Type=oneshot RemainAfterExit=yes WorkingDirectory=/home/bogdan/docker/tier1/apps ExecStart=/usr/bin/docker compose up -d ExecStop=/usr/bin/docker compose down
# tier1 Requires=tier0.target: the apps and GitLab stacks cannot start until Traefik (tier0) is proven up. The dependency graph is the boot order - tier1-gitlab carries the same two lines.