Overview
In Docker, the error "Cannot start service …: bpf: operation not permitted" means the container called the bpf() syscall (to load or attach eBPF programs or create maps) without the necessary Linux capabilities, or the call was blocked by seccomp, AppArmor, or SELinux, or a required mount or ulimit was missing. This is common when running Cilium, bcc/bpftrace, XDP/TC programs, or eBPF exporters.
Root cause categories:
- Missing capabilities: CAP_BPF (>= 5.8), CAP_PERFMON, CAP_SYS_RESOURCE, and sometimes CAP_SYS_ADMIN (older kernels or special ops).
- Seccomp denial: Docker’s default seccomp profile blocks bpf() unless the container holds CAP_SYS_ADMIN.
- LSM policy: AppArmor or SELinux can confine bpf.
- Missing mounts/limits: bpffs not mounted; memlock too low; missing /lib/modules.
- Namespace scope: some attachments require host namespaces (cgroupns, pid, network).
Quickstart (fastest fix)
Grant required capabilities, relax seccomp, and mount bpffs. Example using docker run (Ubuntu image installs bpftool on the fly to verify):
# WARNING: Adjust for your environment; review security impact before using in prod.
docker run --rm \
  --cap-add BPF --cap-add PERFMON --cap-add SYS_RESOURCE \
  --security-opt seccomp=unconfined \
  --mount type=bind,src=/sys/fs/bpf,dst=/sys/fs/bpf \
  --mount type=bind,src=/lib/modules,dst=/lib/modules,ro \
  --cgroupns=host --pid=host --network host \
  ubuntu:22.04 sh -lc 'apt-get update && apt-get install -y bpftool iproute2 >/dev/null && bpftool feature probe kernel'
If you run an older kernel (< 5.8) or see further permission errors, add:
--cap-add SYS_ADMIN
On SELinux systems that still deny access, append:
--security-opt label=disable
On AppArmor systems, if confined, append:
--security-opt apparmor=unconfined
Minimal working example (Compose)
Dockerfile to provide bpftool:
# Dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        bpftool iproute2 && \
    rm -rf /var/lib/apt/lists/*
CMD ["bpftool", "feature", "probe", "kernel"]
docker-compose.yml with the typical settings to allow eBPF work:
version: "3.8"
services:
  ebpf-tool:
    build: .
    # Capabilities needed for eBPF
    cap_add:
      - BPF
      - PERFMON
      - SYS_RESOURCE
      # For older kernels or certain attach types, uncomment:
      # - SYS_ADMIN
      # For TC/XDP networking work, uncomment:
      # - NET_ADMIN
    security_opt:
      - seccomp:unconfined
      # If AppArmor blocks, uncomment:
      # - apparmor:unconfined
      # If SELinux blocks, uncomment:
      # - label=disable
    volumes:
      - /sys/fs/bpf:/sys/fs/bpf
      - /lib/modules:/lib/modules:ro
    ulimits:
      memlock: -1
    # Join the host cgroup namespace (the Compose key is "cgroup", not "cgroupns_mode")
    cgroup: host
    pid: host
    network_mode: host
Run it:
docker compose up --build
You should see a feature probe output rather than "bpf: operation not permitted".
Step-by-step: diagnose and resolve
- Verify kernel and features
  - Check kernel: uname -r. eBPF is robust from 4.14+; CAP_BPF exists from 5.8.
  - Probe features (on host): bpftool feature probe kernel (or use the container above).
  - Ensure bpffs is available: /sys/fs/bpf should exist and be a bpf filesystem on the host. If not, mount it on the host: mount -t bpf bpf /sys/fs/bpf.
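The bpffs check above can be scripted by looking for a bpf-type entry in /proc/mounts. The following is a minimal sketch that assumes the usual /proc/mounts column order (device, mount point, filesystem type); the function name is_bpffs_mounted and its optional file argument are illustrative and not part of any tool.

```shell
#!/bin/sh
# Return 0 if a filesystem of type "bpf" is mounted at the given mount point.
# Usage: is_bpffs_mounted [mount_point] [mounts_file]
# Both arguments are illustrative; mounts_file exists mainly to ease testing.
is_bpffs_mounted() {
    mount_point="${1:-/sys/fs/bpf}"
    mounts_file="${2:-/proc/mounts}"
    # /proc/mounts columns: device, mount point, fs type, options, dump, pass
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "bpf" { found = 1 } END { exit !found }' \
        "$mounts_file"
}
```

Run it on the host before starting the container, for example: is_bpffs_mounted || echo "bpffs is not mounted at /sys/fs/bpf".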
- Choose capabilities appropriate to your kernel
  - 5.8 and newer: Prefer CAP_BPF + CAP_PERFMON + CAP_SYS_RESOURCE.
  - Older kernels (< 5.8): Use CAP_SYS_ADMIN (bpf() is gated behind it) and often CAP_SYS_RESOURCE.
  - Add NET_ADMIN when attaching TC/XDP to interfaces.
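The version rule above can be turned into a small helper that prints the matching docker run flags. This is a sketch only: the function name caps_for_kernel is hypothetical, it looks at the upstream version number alone, and vendor kernels that backport CAP_BPF to older version numbers will not be detected.

```shell
#!/bin/sh
# Print docker run capability flags for a kernel release string.
# Usage: caps_for_kernel [release]   (defaults to the output of uname -r)
caps_for_kernel() {
    release="${1:-$(uname -r)}"
    major="${release%%.*}"       # "5.15.0-91-generic" -> "5"
    rest="${release#*.}"         # "5.15.0-91-generic" -> "15.0-91-generic"
    minor="${rest%%[!0-9]*}"     # "15.0-91-generic"   -> "15"
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 8 ]; }; then
        # 5.8 and newer: fine-grained capabilities
        echo "--cap-add BPF --cap-add PERFMON --cap-add SYS_RESOURCE"
    else
        # Older kernels: bpf() is gated behind CAP_SYS_ADMIN
        echo "--cap-add SYS_ADMIN --cap-add SYS_RESOURCE"
    fi
}
```

NET_ADMIN is left out on purpose; append --cap-add NET_ADMIN yourself when attaching TC/XDP programs.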
- Relax seccomp for bpf syscall
  - Docker’s default seccomp profile blocks bpf() for containers without CAP_SYS_ADMIN. Use --security-opt seccomp=unconfined or provide a custom seccomp profile that allows bpf and perf_event_open.
- Address LSM policies
  - AppArmor: run with --security-opt apparmor=unconfined or a profile that allows bpf.
  - SELinux: use --security-opt label=disable or craft a policy to permit bpf/perf_event.
- Mount required host paths
  - bpffs: bind-mount /sys/fs/bpf to the same path in the container.
  - Kernel headers/modules (some loaders/tools): bind-mount /lib/modules:/lib/modules:ro.
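- Raise memlock limits
  - On kernels before 5.11, eBPF maps and programs are charged against the locked-memory limit (RLIMIT_MEMLOCK), so a low limit makes bpf() fail. Set memlock to unlimited: in Compose, ulimits: memlock: -1; with docker run, --ulimit memlock=-1.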
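To confirm that the memlock setting reached the container, the value reported by ulimit -l (in KiB, or the word unlimited) can be checked from inside it. A minimal sketch follows; the function name memlock_ok and the 64 MiB default threshold are assumptions chosen for illustration, not recommended values.

```shell
#!/bin/sh
# Return 0 if a memlock limit is "unlimited" or at least min_kib.
# Usage: memlock_ok [value] [min_kib]
#   value   defaults to the current shell's ulimit -l output
#   min_kib hypothetical threshold in KiB (default 65536 = 64 MiB)
memlock_ok() {
    value="${1:-$(ulimit -l)}"
    min_kib="${2:-65536}"
    if [ "$value" = "unlimited" ]; then
        return 0
    fi
    [ "$value" -ge "$min_kib" ]
}
```

Inside the container, memlock_ok || echo "memlock limit is low: $(ulimit -l) KiB" gives a quick signal before any eBPF program is loaded.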
- Use host namespaces when needed
  - Some cgroup or tracing attachments require host namespaces. Use --cgroupns=host, --pid=host, and optionally --network host.
- Rootless Docker consideration
  - Rootless Docker cannot grant CAP_BPF/SYS_ADMIN to containers. For eBPF workloads, use the rootful Docker daemon.
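One way to tell whether the daemon you are talking to is rootless is to inspect the security options reported by docker info, which include a name=rootless entry in rootless mode. The helper below is a sketch; the function name is_rootless is hypothetical, and it only does string matching on the value you pass in.

```shell
#!/bin/sh
# Return 0 if a docker info SecurityOptions string indicates rootless mode.
# Usage: is_rootless "$(docker info --format '{{.SecurityOptions}}')"
is_rootless() {
    case "$1" in
        *name=rootless*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If the check succeeds, switch the Docker context or DOCKER_HOST to the rootful daemon before retrying the eBPF container.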
- Kernel sysctl note
  - kernel.unprivileged_bpf_disabled=1 (common default) is fine when you grant the capabilities above; it only blocks truly unprivileged bpf.
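For the sysctl note, the current value can be read from /proc/sys/kernel/unprivileged_bpf_disabled. The sketch below maps the three documented values to a short description; the function name describe_unpriv_bpf and the wording of the messages are illustrative.

```shell
#!/bin/sh
# Describe the kernel.unprivileged_bpf_disabled setting.
# Usage: describe_unpriv_bpf [value]   (defaults to reading the sysctl file)
describe_unpriv_bpf() {
    value="${1:-$(cat /proc/sys/kernel/unprivileged_bpf_disabled)}"
    case "$value" in
        0) echo "unprivileged bpf() allowed" ;;
        1) echo "unprivileged bpf() disabled until reboot" ;;
        2) echo "unprivileged bpf() disabled, admin may change it" ;;
        *) echo "unknown value: $value" ;;
    esac
}
```

None of these values blocks a container that holds the capabilities listed above, so a value of 1 or 2 is not by itself the cause of the error.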
Capability matrix (cheat sheet)
| Kernel version | Preferred caps | Fallback/notes |
|---|---|---|
| >= 5.8 | BPF, PERFMON, SYS_RESOURCE | Add SYS_ADMIN if specific ops still fail |
| < 5.8 | SYS_ADMIN, SYS_RESOURCE | BPF/PERFMON may not exist; NET_ADMIN as needed |
Notes:
- For TC/XDP networking, add NET_ADMIN regardless of kernel.
- Specific attach types (e.g., LSM probes) may need additional privileges.
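To verify which of the capabilities in the matrix a container actually holds, you can decode the CapEff bitmask from /proc/self/status. The sketch below relies on the capability numbers from the kernel headers (NET_ADMIN 12, SYS_ADMIN 21, SYS_RESOURCE 24, PERFMON 38, BPF 39); the function names has_cap and report_bpf_caps are hypothetical.

```shell
#!/bin/sh
# Return 0 if capability bit number $2 is set in the hex mask $1.
has_cap() {
    _mask="$1"
    _bit="$2"
    [ "$(( (0x${_mask} >> ${_bit}) & 1 ))" -eq 1 ]
}

# Print "NAME yes" or "NAME no" for each eBPF-related capability.
# Usage: report_bpf_caps [hex_mask]   (defaults to this process's CapEff)
report_bpf_caps() {
    caps_mask="${1:-$(awk '/^CapEff:/ { print $2 }' /proc/self/status)}"
    for entry in NET_ADMIN:12 SYS_ADMIN:21 SYS_RESOURCE:24 PERFMON:38 BPF:39; do
        name="${entry%%:*}"
        bit="${entry##*:}"
        if has_cap "$caps_mask" "$bit"; then
            echo "$name yes"
        else
            echo "$name no"
        fi
    done
}
```

Running report_bpf_caps inside the container shows at a glance whether the cap_add settings took effect.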
Performance notes
- eBPF itself is fast, but over-broad attachments (many kprobes/tracepoints) add overhead. Scope filters narrowly.
- Unlimited memlock (-1) avoids allocation failures but permits higher resident memory; monitor map sizes and counts.
- Using host namespaces (pid/network) reflects real host activity and can increase event volume; ensure consumers can keep up.
- Avoid --privileged in production; grant only the minimal set of capabilities to reduce attack surface.
Common pitfalls
- Relying only on CAP_BPF on older kernels (< 5.8) — use SYS_ADMIN there.
- Forgetting seccomp: capability adds alone won’t help if bpf() is blocked by the default seccomp profile.
- Missing bpffs mount: tools expect /sys/fs/bpf inside the container.
- SELinux/AppArmor silently denying: check dmesg or LSM audit logs if errors persist.
- Rootless Docker: cannot provide required capabilities.
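When hunting for silent LSM denials, it helps to filter kernel log output down to lines that look like AppArmor or SELinux denials. The filter below is a sketch that matches the usual apparmor="DENIED" and avc: denied markers; the function name filter_lsm_denials is hypothetical, and on systems running auditd the SELinux records may be in the audit log rather than in dmesg.

```shell
#!/bin/sh
# Keep only lines that look like AppArmor or SELinux denials.
# Reads from stdin; exits non-zero when nothing matches (grep semantics).
# Usage: dmesg | filter_lsm_denials
filter_lsm_denials() {
    grep -E 'apparmor="DENIED"|avc: +denied'
}
```

Run it on the host right after reproducing the error so the relevant entries are still near the end of the log.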
Tiny FAQ
Q: Is --privileged required?
- No. Prefer targeted cap_add, seccomp=unconfined, and specific mounts. Use --privileged only as a last resort for debugging.
Q: Which exact capabilities do I need?
- Start with BPF, PERFMON, SYS_RESOURCE (>= 5.8). Add NET_ADMIN for TC/XDP. Add SYS_ADMIN for older kernels or if specific operations still fail.
Q: Can I avoid seccomp=unconfined?
- Yes, by using a custom seccomp profile that allows the bpf and perf_event_open syscalls while keeping other filters intact.
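One possible way to build such a profile is to start from a local copy of Docker's default seccomp profile and append a single allow rule. The sketch below assumes jq is installed and that the input file has the usual top-level syscalls array; the function name allow_bpf_syscalls and the file names in the usage comment are illustrative.

```shell
#!/bin/sh
# Append an unconditional allow rule for bpf and perf_event_open to a
# seccomp profile and print the result on stdout. Requires jq.
# Usage: allow_bpf_syscalls default.json > bpf-seccomp.json
allow_bpf_syscalls() {
    jq '.syscalls += [{"names": ["bpf", "perf_event_open"], "action": "SCMP_ACT_ALLOW"}]' "$1"
}
```

The resulting file can then be passed with --security-opt seccomp=/path/to/bpf-seccomp.json in place of seccomp=unconfined, so every other filter in the default profile stays in force.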
Q: Do I need kernel headers in the container?
- Often not, but some loaders compile BPF at runtime and need /lib/modules (and sometimes headers) from the host.