KhueApps
Fix Docker 'Cannot start service: bpf: operation not permitted'

Last updated: October 07, 2025

Overview

In Docker, the error "Cannot start service …: bpf: operation not permitted" means a process in the container called the bpf() syscall (to load or attach eBPF programs or to create maps) and the kernel refused it. The usual causes are missing Linux capabilities, a seccomp, AppArmor, or SELinux policy that blocks the call, or missing mounts and resource limits. The error is common when running Cilium, bcc/bpftrace, XDP/TC programs, or eBPF exporters.

Root cause categories:

  • Missing capabilities: CAP_BPF and CAP_PERFMON (both available from kernel 5.8), CAP_SYS_RESOURCE, and sometimes CAP_SYS_ADMIN (older kernels or special operations).
  • Seccomp denial: Docker’s default seccomp profile blocks bpf() for containers running with the default capability set.
  • LSM policy: AppArmor or SELinux can confine bpf.
  • Missing mounts/limits: bpffs not mounted; memlock too low; missing /lib/modules.
  • Namespace scope: some attachments require host namespaces (cgroupns, pid, network).
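To tell a capability problem apart from the other categories, it can help to look at what the process actually holds. The following is a minimal sketch, assuming bash and a Linux /proc filesystem; the helper names has_cap and report_caps are hypothetical, and the bit numbers are the ones defined in linux/capability.h. Run it inside the failing container.

```shell
#!/usr/bin/env bash
# Sketch: report which eBPF-related capabilities the current shell holds.
# Bit numbers follow linux/capability.h; helper names are illustrative.

# has_cap MASK_HEX BIT -> exit status 0 if BIT is set in MASK_HEX
has_cap() {
  local mask="$1" bit="$2"
  (( (0x$mask >> bit) & 1 ))
}

# report_caps MASK_HEX -> one "CAP_NAME yes|no" line per capability of interest
report_caps() {
  local mask="$1" entry name bit
  for entry in NET_ADMIN:12 SYS_ADMIN:21 SYS_RESOURCE:24 PERFMON:38 BPF:39; do
    name="${entry%%:*}"
    bit="${entry##*:}"
    if has_cap "$mask" "$bit"; then
      echo "CAP_$name yes"
    else
      echo "CAP_$name no"
    fi
  done
}

# CapEff is the effective capability mask the kernel reports for this shell.
if [ -r "/proc/$$/status" ]; then
  report_caps "$(awk '/^CapEff:/ {print $2}' "/proc/$$/status")"
fi
```

If every line reads "yes" and the error persists, the cause is more likely seccomp, an LSM, or a missing mount than a missing capability.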

Quickstart (fastest fix)

Grant required capabilities, relax seccomp, and mount bpffs. Example using docker run (the Debian image installs bpftool on the fly to verify):

# WARNING: Adjust for your environment; review the security impact before using in production.
# debian:bookworm-slim is used because bpftool is a standalone package there.
docker run --rm \
  --cap-add BPF --cap-add PERFMON --cap-add SYS_RESOURCE \
  --security-opt seccomp=unconfined \
  --mount type=bind,src=/sys/fs/bpf,dst=/sys/fs/bpf \
  --mount type=bind,src=/lib/modules,dst=/lib/modules,ro \
  --cgroupns=host --pid=host --network host \
  debian:bookworm-slim sh -lc 'apt-get update && apt-get install -y bpftool iproute2 >/dev/null && bpftool feature probe kernel'

If you run an older kernel (< 5.8) or see further permission errors, add:

--cap-add SYS_ADMIN

On SELinux systems that still deny access, append:

--security-opt label=disable

On AppArmor systems, if confined, append:

--security-opt apparmor=unconfined
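Which of the two LSM flags applies depends on the host. As a rough sketch, assuming bash and a kernel that exposes /sys/kernel/security/lsm, the hypothetical helper lsm_flags below prints the matching flag from the options above. An LSM being active does not prove it is the one denying bpf(), so treat the output as a hint.

```shell
#!/usr/bin/env bash
# lsm_flags LSM_LIST -> prints the quickstart --security-opt flags that
# correspond to the LSMs named in a comma-separated list.
# The helper name is illustrative, not part of Docker.
lsm_flags() {
  local list=",$1,"
  case "$list" in *,apparmor,*) echo "--security-opt apparmor=unconfined" ;; esac
  case "$list" in *,selinux,*)  echo "--security-opt label=disable" ;; esac
}

# On kernels that expose it, this file lists the active LSMs.
if [ -r /sys/kernel/security/lsm ]; then
  lsm_flags "$(cat /sys/kernel/security/lsm)"
fi
```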

Minimal working example (Compose)

Dockerfile to provide bpftool:

# Dockerfile
# Debian ships bpftool as a standalone package.
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
      bpftool iproute2 && \
    rm -rf /var/lib/apt/lists/*
CMD ["bpftool", "feature", "probe", "kernel"]

docker-compose.yml with the typical settings to allow eBPF work:

version: "3.8"
services:
  ebpf-tool:
    build: .
    # Capabilities needed for eBPF
    cap_add:
      - BPF
      - PERFMON
      - SYS_RESOURCE
      # For older kernels or certain attach types, uncomment:
      # - SYS_ADMIN
      # For TC/XDP networking work, uncomment:
      # - NET_ADMIN
    security_opt:
      - seccomp:unconfined
      # If AppArmor blocks, uncomment:
      # - apparmor:unconfined
      # If SELinux blocks, uncomment:
      # - label=disable
    volumes:
      - /sys/fs/bpf:/sys/fs/bpf
      - /lib/modules:/lib/modules:ro
    ulimits:
      memlock: -1
    # Compose Specification key for the cgroup namespace (host or private)
    cgroup: host
    pid: host
    network_mode: host

Run it:

docker compose up --build

You should see feature probe output rather than "bpf: operation not permitted".

Step-by-step: diagnose and resolve

  1. Verify kernel and features
  • Check the kernel version with uname -r. eBPF is broadly usable from 4.14 onward; CAP_BPF exists from 5.8.
  • Probe features on the host with bpftool feature probe kernel (or use the container above).
  • Ensure bpffs is available: /sys/fs/bpf should exist and be a bpf filesystem on the host. If it is not, mount it on the host: mount -t bpf bpf /sys/fs/bpf.
  2. Choose capabilities appropriate to your kernel
  • 5.8 and newer: prefer CAP_BPF + CAP_PERFMON + CAP_SYS_RESOURCE.
  • Older kernels (< 5.8): use CAP_SYS_ADMIN (bpf() is gated behind it) and often CAP_SYS_RESOURCE.
  • Add NET_ADMIN when attaching TC/XDP programs to interfaces.
  3. Relax seccomp for the bpf syscall
  • If Docker’s default seccomp profile blocks bpf(), use --security-opt seccomp=unconfined or provide a custom seccomp profile that allows bpf and perf_event_open.
  4. Address LSM policies
  • AppArmor: run with --security-opt apparmor=unconfined or a profile that allows bpf.
  • SELinux: use --security-opt label=disable or craft a policy that permits bpf/perf_event.
  5. Mount required host paths
  • bpffs: bind-mount /sys/fs/bpf to the same path in the container.
  • Kernel headers/modules (needed by some loaders and tools): bind-mount /lib/modules:/lib/modules:ro.
  6. Raise memlock limits
  • On kernels before 5.11, eBPF maps and programs are charged against the locked-memory limit (RLIMIT_MEMLOCK), so set memlock to unlimited: in Compose, ulimits: memlock: -1; with docker run, --ulimit memlock=-1.
  7. Use host namespaces when needed
  • Some cgroup or tracing attachments require host namespaces. Use --cgroupns=host, --pid=host, and optionally --network host.
  8. Rootless Docker consideration
  • Rootless Docker cannot give containers an effective CAP_BPF or CAP_SYS_ADMIN: capabilities granted inside its user namespace do not satisfy the kernel's bpf() permission checks. For eBPF workloads, use the rootful Docker daemon.
  9. Kernel sysctl note
  • kernel.unprivileged_bpf_disabled=1 (a common default) is fine when you grant the capabilities above; it blocks bpf() only for processes that hold neither CAP_BPF nor CAP_SYS_ADMIN.
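Several of the checks in these steps can be gathered into one host-side script. The sketch below assumes bash on a Linux host; the helper names bpffs_mounted and is_rootless are hypothetical, and the rootless check relies on docker info listing name=rootless among its security options.

```shell
#!/usr/bin/env bash
# Sketch of host-side checks: kernel version, bpffs, memlock, sysctl, rootless.
# Helper names are illustrative.

# bpffs_mounted [MOUNTS_FILE] -> status 0 if a bpf filesystem is at /sys/fs/bpf
bpffs_mounted() {
  awk '$2 == "/sys/fs/bpf" && $3 == "bpf" { found = 1 } END { exit !found }' \
    "${1:-/proc/mounts}"
}

# is_rootless SECURITY_OPTIONS -> status 0 if the daemon reports rootless mode
is_rootless() {
  case "$1" in *name=rootless*) return 0 ;; *) return 1 ;; esac
}

echo "kernel:  $(uname -r)"
if bpffs_mounted; then
  echo "bpffs:   mounted"
else
  echo "bpffs:   missing (mount -t bpf bpf /sys/fs/bpf)"
fi
echo "memlock: $(ulimit -l)"
if [ -r /proc/sys/kernel/unprivileged_bpf_disabled ]; then
  echo "unprivileged_bpf_disabled: $(cat /proc/sys/kernel/unprivileged_bpf_disabled)"
fi
if command -v docker >/dev/null 2>&1; then
  if is_rootless "$(docker info --format '{{.SecurityOptions}}' 2>/dev/null)"; then
    echo "docker:  rootless (use the rootful daemon for eBPF workloads)"
  else
    echo "docker:  rootful, or daemon not reachable"
  fi
fi
```

The memlock value printed here is the limit of the calling shell, not of a container; the container's limit comes from the --ulimit or ulimits setting.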

Capability matrix (cheat sheet)

| Kernel version | Preferred caps | Fallback/notes |
|---|---|---|
| >= 5.8 | BPF, PERFMON, SYS_RESOURCE | Add SYS_ADMIN if specific ops still fail |
| < 5.8 | SYS_ADMIN, SYS_RESOURCE | BPF/PERFMON may not exist; NET_ADMIN as needed |

Notes:

  • For TC/XDP networking, add NET_ADMIN regardless of kernel.
  • Specific attach types (e.g., LSM probes) may need additional privileges.
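The matrix can be turned into a small helper that picks --cap-add flags from a kernel release string. This is a sketch assuming bash; caps_for_kernel is a hypothetical name, and it compares only the major and minor version, so distribution kernels with backported capabilities are treated by their nominal version.

```shell
#!/usr/bin/env bash
# caps_for_kernel RELEASE [net] -> prints --cap-add flags per the matrix.
# Pass "net" as the second argument to add NET_ADMIN for TC/XDP work.
# The function name is illustrative.
caps_for_kernel() {
  local release="$1" net="${2:-}" major rest minor flags
  major="${release%%.*}"        # text before the first dot
  rest="${release#*.}"          # text after the first dot
  minor="${rest%%[!0-9]*}"      # leading digits of the remainder
  if (( major > 5 || (major == 5 && minor >= 8) )); then
    flags="--cap-add BPF --cap-add PERFMON --cap-add SYS_RESOURCE"
  else
    flags="--cap-add SYS_ADMIN --cap-add SYS_RESOURCE"
  fi
  if [ "$net" = "net" ]; then
    flags="$flags --cap-add NET_ADMIN"
  fi
  echo "$flags"
}

# Flags for the kernel this script runs on.
caps_for_kernel "$(uname -r)"
```

Because containers share the host kernel, the result is the same whether the script runs on the host or in a container.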

Performance notes

  • eBPF itself is fast, but over-broad attachments (many kprobes/tracepoints) add overhead. Scope filters narrowly.
  • Unlimited memlock (-1) avoids allocation failures but permits higher resident memory; monitor map sizes and counts.
  • Using host namespaces (pid/network) reflects real host activity and can increase event volume; ensure consumers can keep up.
  • Avoid --privileged in production; grant only the minimal set of capabilities to reduce attack surface.

Common pitfalls

  • Relying only on CAP_BPF on older kernels (< 5.8): use SYS_ADMIN there.
  • Forgetting seccomp: capability adds alone won’t help if bpf() is blocked by the default seccomp profile.
  • Missing bpffs mount: tools expect /sys/fs/bpf inside the container.
  • SELinux/AppArmor silently denying: check dmesg or LSM audit logs if errors persist.
  • Rootless Docker: cannot provide required capabilities.
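For the silent LSM denials mentioned above, a filter over the kernel log narrows things down. The sketch below assumes bash and the usual denial markers (apparmor="DENIED" for AppArmor, "avc: denied" for SELinux); lsm_denials is a hypothetical helper name. On hosts running auditd, SELinux denials may land in the audit log instead of dmesg.

```shell
#!/usr/bin/env bash
# lsm_denials -> reads log lines on stdin, keeps AppArmor/SELinux denials.
# Always exits 0 so an empty result does not abort a calling script.
# The function name is illustrative.
lsm_denials() {
  grep -E 'apparmor="DENIED"|avc: +denied' || true
}

# Usage (reading the kernel log typically needs root):
#   dmesg | lsm_denials
```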

Tiny FAQ

Q: Is --privileged required?

  • No. Prefer targeted cap_add, seccomp=unconfined, and specific mounts. Use --privileged only as a last resort for debugging.

Q: Which exact capabilities do I need?

  • Start with BPF, PERFMON, SYS_RESOURCE (>= 5.8). Add NET_ADMIN for TC/XDP. Add SYS_ADMIN for older kernels or if specific operations still fail.

Q: Can I avoid seccomp=unconfined?

  • Yes, by using a custom seccomp profile that allows the bpf and perf_event_open syscalls while keeping other filters intact.
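One way to build such a profile is to start from an existing one and append a single allow rule. The sketch below assumes bash and jq are installed and that the base file is a Docker seccomp profile with a top-level "syscalls" array (Docker's default profile has this shape). The helper name allow_bpf_syscalls and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# allow_bpf_syscalls BASE_PROFILE OUT_PROFILE
# Writes a copy of BASE_PROFILE with one extra rule that allows the bpf and
# perf_event_open syscalls; all other rules are left as they are.
# Requires jq. The function name is illustrative.
allow_bpf_syscalls() {
  jq '.syscalls += [{"names": ["bpf", "perf_event_open"], "action": "SCMP_ACT_ALLOW"}]' \
    "$1" > "$2"
}

# Usage (file names are placeholders):
#   allow_bpf_syscalls default.json bpf-seccomp.json
#   docker run --security-opt seccomp=bpf-seccomp.json <other flags> <image>
```

The appended rule is unconditional, so it allows both syscalls regardless of which capabilities the container holds; the capabilities still decide whether the kernel accepts the call.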

Q: Do I need kernel headers in the container?

  • Often not, but some loaders compile BPF at runtime and need /lib/modules (and sometimes headers) from the host.
