What Docker healthchecks do
A Docker healthcheck runs a command inside the container to report whether the service is working. Exit code 0 counts as a success and any non-zero code as a failure; after the configured number of consecutive failures (`retries`), Docker marks the container unhealthy. Orchestrators and docker compose can gate dependencies on health.
- Health states: starting → healthy or unhealthy
- Health is separate from running: a container can be running but unhealthy
- Healthcheck runs in the container’s namespace, using its filesystem, network, and user
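The state transitions above can be modeled in a few lines of shell. This is a simplified sketch, not Docker's implementation: `health_after` is a hypothetical helper that takes a `retries` value followed by a sequence of probe exit codes, and it ignores the start period entirely.

```shell
#!/bin/sh
# Simplified model of the health state transitions (illustrative only).
# health_after RETRIES CODE...  prints the status after the given probe results.
# Assumption: start-period handling is left out for brevity.
health_after() {
  retries=$1; shift
  status=starting
  streak=0                      # consecutive failures so far
  for code in "$@"; do
    if [ "$code" -eq 0 ]; then
      status=healthy            # a single success is enough to become healthy
      streak=0
    else
      streak=$((streak + 1))
      if [ "$streak" -ge "$retries" ]; then
        status=unhealthy        # only after RETRIES consecutive failures
      fi
    fi
  done
  echo "$status"
}

health_after 3 1 1          # starting  (two failures, threshold not reached)
health_after 3 0 1 1        # healthy   (still below three consecutive failures)
health_after 3 0 1 1 1      # unhealthy
```

The point of the model is that one failed probe does not flip the state; only an unbroken run of `retries` failures does.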
Quickstart: verify and debug fast
- Check status: `docker ps` shows STATUS with health.
- Inspect details: `docker inspect --format '{{json .State.Health}}' <container>`
- Read recent health logs: `docker inspect <container> | jq '.State.Health.Log[-5:]'`
- Run the health command manually inside the container: `docker exec -it <container> sh -lc "<health command>"`
- Check application logs for clues: `docker logs <container>`
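When scripting these checks, it can help to poll the status instead of re-running `docker inspect` by hand. The following is a minimal sketch; `wait_healthy` is a hypothetical helper (not a Docker command), and the default of 30 polls with a 2 second delay is an arbitrary assumption.

```shell
#!/bin/sh
# wait_healthy CONTAINER [TRIES] [DELAY_SECONDS]
# Polls the health status until it is "healthy". Returns 1 on "unhealthy"
# or when TRIES polls have been used up. Hypothetical helper for illustration.
wait_healthy() {
  ctr=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Empty output (for example, no healthcheck defined) is treated as "not yet".
    state=$(docker inspect --format '{{.State.Health.Status}}' "$ctr" 2>/dev/null)
    case $state in
      healthy)   echo "$ctr is healthy"; return 0 ;;
      unhealthy) echo "$ctr is unhealthy" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$ctr did not become healthy after $tries polls" >&2
  return 1
}
```

A possible use: `wait_healthy <container> 15 2 || docker inspect --format '{{json .State.Health}}' <container>` prints the health details only when the wait fails.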
Minimal working example (MWE)
This container serves a static file via Python’s HTTP server and uses wget to verify it.
# Dockerfile
FROM python:3.12-alpine
WORKDIR /app
RUN printf 'ok' > index.html
EXPOSE 8000
# Healthcheck: succeed only if index contains 'ok'
HEALTHCHECK --interval=10s --timeout=3s --start-period=5s --retries=3 \
    CMD wget -qO- http://127.0.0.1:8000/ | grep -q ok || exit 1
CMD ["python", "-m", "http.server", "8000"]
Build and run:
docker build -t mwe-health .
docker run --name mwe --rm -p 8000:8000 mwe-health
In another terminal:
docker ps
# Wait a few seconds, then:
docker inspect --format '{{.State.Health.Status}}' mwe # should print: healthy
docker compose example:
version: "3.9"
services:
  app:
    build: .
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://127.0.0.1:8000/ | grep -q ok || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 5s
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
  api:
    build: ./api
    depends_on:
      db:
        condition: service_healthy
Common symptoms and likely fixes
| Symptom | Likely cause | Practical fix |
|---|---|---|
| Healthcheck exits 127 | Command not found | Install tool (apk add curl), or use busybox wget; verify PATH |
| Times out | App slow to start or probe too heavy | Increase timeout/start-period; reduce work in probe |
| Works manually, fails in healthcheck | Shell features not available | Use CMD-SHELL; quote properly; avoid bash-only syntax on sh |
| HTTP 200 but unhealthy | Probe does not exit 0 | Ensure the final command returns 0 on success |
| DB-dependent app unhealthy | Dependency not ready | Add start-period; or make app probe only its own readiness; use depends_on: service_healthy |
| Port connection refused | App binds to wrong interface | Bind app to 0.0.0.0 inside container; probe 127.0.0.1 or localhost |
| TLS probe fails | Missing CA or self-signed | Use http for local probe; add ca-certificates; or --insecure if acceptable |
Step-by-step diagnosis
- Inspect the health command
  - Find it in the Dockerfile or in compose under healthcheck.test
  - Confirm it uses the right form: an exec array for a single binary, or CMD-SHELL when it needs pipes or redirection
- Validate command availability
  - `docker exec <ctr> which curl wget nc` and ensure the tool exists
  - If missing, install during build, or switch to a tool you have
- Run the probe verbatim
  - `docker exec -it <ctr> sh -lc "<probe>"; echo $?` and confirm the exit code
- Check app binding and endpoints
  - Inside the container: `ss -lntp` or `netstat -lnt` to verify ports
  - Test: `wget -S -O- http://127.0.0.1:<port>/health`
- Tune timing
  - If the app needs warmup, raise `--start-period` and `--timeout`, lower frequency (`--interval`)
- Make the probe cheap and deterministic
  - Use a fast readiness endpoint; avoid full DB queries or migrations
- Watch the logs
  - `docker inspect` includes `.State.Health.Log` with start and end timestamps, ExitCode, and Output for each run; fix based on the errors
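The "run the probe verbatim" step can be wrapped in a small script that also interprets the exit code. This is a sketch under assumptions: `explain_exit` and `run_probe` are hypothetical helper names, and the hints only cover the conventional shell codes 126 and 127.

```shell
#!/bin/sh
# explain_exit CODE: map a probe exit code to a likely cause (hypothetical helper).
explain_exit() {
  case $1 in
    0)   echo "ok: probe succeeded" ;;
    126) echo "found but not executable: check file permissions" ;;
    127) echo "command not found: install the tool or fix PATH" ;;
    *)   echo "probe failed with exit code $1: read its output" ;;
  esac
}

# run_probe COMMAND: run the probe string through sh -c, the way a
# CMD-SHELL healthcheck is run, then explain the result.
run_probe() {
  sh -c "$1" >/dev/null 2>&1
  explain_exit "$?"
}
```

Because `docker exec` passes the command's exit code through, the same mapping works from the host: `docker exec <ctr> sh -c "<probe>"; explain_exit $?`.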
Patterns for robust healthchecks
- Keep probes local: target localhost or a UNIX socket if applicable
- Exit 0 only when the service can handle traffic; non-zero otherwise
- Use a stable, fast endpoint like `/healthz` or `/readyz`
- Prefer exec arrays for simple binaries; use CMD-SHELL for pipes and redirection
- Avoid heavy dependencies; busybox `wget` or `nc` often suffice
Examples (alternatives: only the last HEALTHCHECK instruction in a Dockerfile takes effect):
# Simple TCP port probe
HEALTHCHECK CMD nc -z 127.0.0.1 8080 || exit 1
# HTTP probe with curl (ensure curl is installed)
HEALTHCHECK CMD curl -fsS http://127.0.0.1/healthz >/dev/null || exit 1
# App-provided check script
COPY healthcheck.sh /usr/local/bin/
HEALTHCHECK --interval=15s CMD ["/usr/local/bin/healthcheck.sh"]
healthcheck.sh should be small, fast, and end with exit 0 on success.
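One possible shape for such a script is sketched below. The `HEALTH_URL` variable, its default, and the `ok` marker are assumptions borrowed from the minimal example earlier in this guide; `check_http` is a hypothetical function name.

```shell
#!/bin/sh
# healthcheck.sh: minimal sketch. HEALTH_URL and the 'ok' marker are assumptions;
# adjust them to whatever the service actually exposes.
HEALTH_URL=${HEALTH_URL:-http://127.0.0.1:8000/}

check_http() {
  # -q: quiet, -T 2: network timeout in seconds, -O-: write the body to stdout
  body=$(wget -q -T 2 -O- "$1") || return 1   # request failed
  case $body in
    *ok*) return 0 ;;                          # expected marker present
    *)    return 1 ;;                          # unexpected content
  esac
}

# Run the check only when executed as healthcheck.sh, so the function can
# also be sourced and exercised on its own.
case $0 in
  *healthcheck.sh) check_http "$HEALTH_URL" || exit 1 ;;
esac
```

The script does no disk writes and exits with 0 or 1 only, which keeps the probe cheap and its result unambiguous. Remember that the file must be executable inside the image.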
Pitfalls to avoid
- Using hostnames/ports that are only valid from other containers; the healthcheck runs inside the target container
- Depending on external services for health; prefer checking only what this container controls
- Returning success on partial failures; keep semantics strict
- Using bashisms on alpine busybox `sh` (e.g., `[[`); either install bash or rewrite for POSIX sh
- Forgetting `start-period` for apps with long cold starts
- Not failing explicitly; ensure the command ends with `|| exit 1` when using pipes
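The pipe pitfall is easy to reproduce outside Docker. In the sketch below, `probe_status` is a hypothetical helper that runs a probe string in a subshell and prints its exit status; the path `/no/such/file` is a deliberately missing file used to force an error from `grep`.

```shell
#!/bin/sh
# In POSIX sh, a pipeline's exit status is that of its LAST command.
# probe_status PROBE: run the probe string in a subshell, print its exit status.
probe_status() {
  ( eval "$1" ) >/dev/null 2>&1
  echo "$?"
}

probe_status 'false | cat'                         # 0: the failure of `false` is hidden
probe_status 'printf ok | grep -q ok'              # 0: the last command decides
probe_status 'printf bad | grep -q ok'             # 1: grep found no match
# grep reports an error with a status above 1; `|| exit 1` normalizes it.
probe_status 'grep -q ok /no/such/file || exit 1'  # 1
```

This is why the examples in this guide end a piped probe with a command that decides success (`grep -q`) followed by `|| exit 1`.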
Performance notes
- Every healthcheck spawns a process; too-frequent checks waste CPU and I/O
- Reasonable starting values: interval 10–30s, timeout 2–5s, retries 3
- Use lightweight tools (`wget -q`, `nc -z`) and cheap endpoints
- Avoid disk writes, large payloads, or TLS handshakes unless necessary
- In high-density hosts, stagger intervals across services to reduce bursts
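The trade-off between probe frequency and detection time can be estimated with simple arithmetic. The helpers below are hypothetical and give rough bounds only: they assume each check starts one interval after the previous one completes, that a failing probe may run until its timeout, and they ignore the start period.

```shell
#!/bin/sh
# Rough arithmetic for tuning. All values are whole seconds.

# worst_case_detection INTERVAL TIMEOUT RETRIES
# Approximate upper bound on the time until "unhealthy" once the service breaks,
# assuming every failing probe runs into its timeout.
worst_case_detection() {
  echo $(( $3 * ($1 + $2) ))
}

# probes_per_hour INTERVAL CONTAINERS
# Approximate number of probe processes spawned per hour on one host.
probes_per_hour() {
  echo $(( 3600 / $1 * $2 ))
}

worst_case_detection 10 3 3   # 39
probes_per_hour 10 50         # 18000
```

Doubling the interval halves the process count but roughly doubles the time a broken container stays marked healthy, so pick values from how quickly a failure needs to be noticed.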
When to disable healthchecks
- For pure batch/one-shot jobs (they exit anyway)
- When an orchestrator already performs equivalent checks externally
Disable with:
HEALTHCHECK NONE
Tiny FAQ
My container runs but is unhealthy. What is the difference?
- Running means the process is alive. Healthy means your probe says it is ready. They are independent.
Should the probe check dependencies (DB, cache)?
- Prefer checking the container’s own readiness. If you must check dependencies, make it fast and resilient.
How do I gate startup on a healthy dependency in compose?
- Use `depends_on: condition: service_healthy` and define a healthcheck for the dependency.
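Why does a piped probe report an unexpected status?
- Without `set -o pipefail`, only the last command's exit code is used, so a failure earlier in the pipe can go unnoticed. Use CMD-SHELL, make the last command the one that decides success (for example `grep -q`), and add `|| exit 1`.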
How do I view the last failures?
- `docker inspect <ctr> | jq '.State.Health.Log'` shows timestamped entries with exit code and output.
Checklist
- Command exists and exits 0 on success
- Probe targets the right host/port (usually 127.0.0.1)
- Timing tuned: start-period, interval, timeout, retries
- Logs inspected; failures are actionable
- Probe is lightweight and deterministic