Fix Operation not permitted for SYS_ADMIN in Docker containers

Last updated: October 07, 2025

Overview

Docker isolates processes with namespaces, capabilities, seccomp, and LSMs (AppArmor/SELinux). Privileged operations (mount, loop devices, iptables, FUSE, eBPF) often fail with: Operation not permitted. This guide shows practical fixes using the least privilege necessary.

Key causes:

  • Missing capabilities (e.g., CAP_SYS_ADMIN, CAP_NET_ADMIN)
  • Seccomp profile blocks syscalls (e.g., mount)
  • AppArmor/SELinux confinement
  • Rootless mode or user namespaces
  • Missing device nodes (/dev/fuse, /dev/net/tun, /dev/loop*)
  • Read-only filesystem, locked sysctls, cgroup restrictions
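
The missing-device cause is quick to rule out from inside the container. A minimal sketch, assuming a POSIX shell; `missing_devices` is a hypothetical helper name used only for illustration, not a Docker command:

```shell
# missing_devices: print each given path that is not a character or block device.
# Hypothetical helper; run it inside the container you are debugging.
missing_devices() {
  for dev in "$@"; do
    # -c tests for a character device, -b for a block device
    if [ ! -c "$dev" ] && [ ! -b "$dev" ]; then
      echo "$dev"
    fi
  done
}

# Example call (prints only the paths that are absent):
# missing_devices /dev/fuse /dev/net/tun /dev/loop-control
```

An empty result means every listed node exists; anything printed is a candidate for a `--device` mapping.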

Quickstart: fix the common mount failure

When a container tries to mount, it needs CAP_SYS_ADMIN. On hosts that enforce AppArmor (many Ubuntu hosts), Docker's default AppArmor profile also denies mount, so the container needs AppArmor unconfined as well. Docker's default seccomp profile permits mount once CAP_SYS_ADMIN is granted, so seccomp=unconfined is only needed when a stricter custom profile is in effect.

  1. Reproduce the failure:
# Fails with Operation not permitted
docker run --rm alpine sh -c 'mkdir -p /mnt && mount -t tmpfs tmpfs /mnt'
  2. Minimal fix with least privilege:
# Add CAP_SYS_ADMIN; AppArmor unconfined is needed on many Ubuntu hosts
# If mount is still blocked by a custom seccomp profile, also add:
#   --security-opt seccomp=unconfined
docker run --rm \
  --cap-add SYS_ADMIN \
  --security-opt apparmor=unconfined \
  alpine sh -c 'mkdir -p /mnt && mount -t tmpfs tmpfs /mnt && mount | grep /mnt && umount /mnt'
  3. If you must, use the sledgehammer:
# Broadest access; prefer targeted caps
docker run --rm --privileged alpine sh -c 'mount -t tmpfs tmpfs /mnt && umount /mnt'
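
Whether the AppArmor option matters depends on the host. As a rough check, assuming a Docker CLI that supports `docker info --format`, the daemon's security options can be inspected. `daemon_has` below is a hypothetical helper that only parses that string:

```shell
# daemon_has: succeed if the security-options string mentions the given feature.
# $1 = output of: docker info --format '{{.SecurityOptions}}'
# $2 = feature name, e.g. apparmor, seccomp, selinux, rootless
daemon_has() {
  case "$1" in
    *"name=$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (requires a running Docker daemon):
# opts=$(docker info --format '{{.SecurityOptions}}')
# daemon_has "$opts" apparmor && echo "AppArmor active: apparmor=unconfined may be needed"
```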

Minimal working example (Dockerfile + run)

# Dockerfile
FROM alpine:3.20
RUN mkdir -p /mnt
CMD ["sh", "-c", "mount -t tmpfs tmpfs /mnt && echo 'mounted' && umount /mnt"]

Build and run:

docker build -t mount-test .
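# Fails without extra privileges
docker run --rm mount-test
# Works with targeted privileges
# (add --security-opt seccomp=unconfined only if a custom seccomp profile blocks mount;
#  Docker's default profile allows mount once SYS_ADMIN is granted)
docker run --rm \
  --cap-add SYS_ADMIN \
  --security-opt apparmor=unconfined \
  mount-test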
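
To find the narrowest option set that works for an image, the flags can be tried in order of increasing privilege. This is a sketch under stated assumptions: `flags_for_level` and `find_min_level` are hypothetical helper names, the five levels are one possible ordering rather than a standard, and the example image is the `mount-test` image built above.

```shell
# flags_for_level: print the docker run flags for a privilege level (0 = none).
flags_for_level() {
  case "$1" in
    0) echo "" ;;
    1) echo "--cap-add SYS_ADMIN" ;;
    2) echo "--cap-add SYS_ADMIN --security-opt apparmor=unconfined" ;;
    3) echo "--cap-add SYS_ADMIN --security-opt apparmor=unconfined --security-opt seccomp=unconfined" ;;
    4) echo "--privileged" ;;
    *) return 1 ;;
  esac
}

# find_min_level: run the image at each level and report the first that succeeds.
find_min_level() {
  image=$1
  for level in 0 1 2 3 4; do
    # Intentionally unquoted so the flag string splits into separate arguments.
    if docker run --rm $(flags_for_level "$level") "$image" >/dev/null 2>&1; then
      echo "level $level works: $(flags_for_level "$level")"
      return 0
    fi
  done
  echo "no level worked" >&2
  return 1
}

# Example (requires Docker): find_min_level mount-test
```

Stopping at the first successful level keeps the final command as close to least privilege as this ordering allows.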

Common fixes by operation

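  • Mount (tmpfs, bind, overlay):
    • --cap-add SYS_ADMIN
    • --security-opt apparmor=unconfined (host dependent; needed where AppArmor is enforced)
    • --security-opt seccomp=unconfined (only if a custom seccomp profile blocks mount; Docker's default profile allows it with SYS_ADMIN)
  • FUSE (rclone, sshfs):
    • --device /dev/fuse
    • --cap-add SYS_ADMIN (often required for mount)
    • --security-opt apparmor=unconfined and --security-opt seccomp=unconfined if needed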
  • iptables, tc, routes:
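    • --cap-add NET_ADMIN --cap-add NET_RAW (NET_RAW is in Docker's default capability set; adding it only matters if it was dropped)
    • For sysctls: --sysctl net.ipv4.ip_forward=1 (host must allow)
  • Loop devices, mkfs:
    • --cap-add SYS_ADMIN --cap-add MKNOD (MKNOD is in Docker's default capability set; adding it only matters if it was dropped)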
    • --device /dev/loop-control --device /dev/loop0 --device /dev/loop1 ...
  • ptrace, debuggers (gdb, strace):
    • --cap-add SYS_PTRACE
    • --security-opt seccomp=unconfined for broader ptrace
  • eBPF or perf:
    • Often requires --privileged or a tuned seccomp profile plus CAP_BPF/CAP_SYS_ADMIN depending on kernel

Diagnose systematically (least-to-most invasive)

  1. Check Docker mode and isolation:
  • docker info | grep -i rootless
  • If rootless: some privileged ops (mounting block devices, creating device nodes) will never work. Consider rootful Docker.
  2. Verify the attempted operation and syscall:
  • Look at the exact failing command and its error. For mount, start with CAP_SYS_ADMIN and, on AppArmor hosts, apparmor=unconfined; Docker's default seccomp profile allows mount once that capability is granted.
  • Optional: strace the command in a test container to confirm which syscall returns EPERM.
  3. Add the minimal capability:
  • docker run --cap-add <CAP> ...
  • List current caps inside a running container: grep CapEff /proc/1/status (the value is a hex bitmask; capsh --decode=<mask> decodes it where capsh is installed)
  4. Address seccomp:
  • Default profile blocks some syscalls. Try --security-opt seccomp=unconfined as a test.
  • If that fixes it, adopt a custom, minimally permissive seccomp profile later.
  5. Address AppArmor/SELinux:
  • AppArmor: --security-opt apparmor=unconfined
  • SELinux (on Fedora/CentOS hosts): prefer correct labels on bind mounts (:z or :Z). If still blocked, test with --security-opt label=disable (privileged implies this).
  6. Map required devices:
  • --device /dev/fuse, /dev/net/tun, /dev/loop* as needed
  • For TUN: also CAP_NET_ADMIN
  7. Consider namespaces and userns:
  • If the Docker daemon uses userns-remap, root in the container is unprivileged on the host and some host resources are inaccessible. Test with --userns=host for the container, or disable remapping for that service.
  8. As a last resort, use --privileged:
  • Validate the operation works, then iterate to least privilege (caps + device + profiles).
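
The capability and seccomp checks in the steps above can be read straight from /proc. A minimal sketch, assuming a POSIX shell with awk available in the container; `has_cap` and `seccomp_mode` are hypothetical helper names, and the bit numbers come from the kernel's linux/capability.h:

```shell
# has_cap: succeed if capability bit $2 is set in the hex mask $1 (a CapEff value).
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# seccomp_mode: translate the Seccomp field of /proc/<pid>/status.
seccomp_mode() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}

# Capability bit numbers from linux/capability.h
CAP_NET_ADMIN=12
CAP_SYS_PTRACE=19
CAP_SYS_ADMIN=21

# Example, run inside the container:
# mask=$(awk '/^CapEff:/ {print $2}' /proc/1/status)
# has_cap "$mask" "$CAP_SYS_ADMIN" && echo "CAP_SYS_ADMIN present"
# seccomp_mode "$(awk '/^Seccomp:/ {print $2}' /proc/1/status)"
```

If the capability is present and seccomp reports disabled, the remaining suspects are the LSM, a missing device, or user namespaces.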

Kubernetes equivalents

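For Pods, add capabilities in the container's securityContext. The AppArmor profile is set per container with an annotation whose key ends in the container name; newer Kubernetes releases also accept a securityContext.appArmorProfile field.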

Minimal pod with mount inside container:

apiVersion: v1
kind: Pod
metadata:
  name: mount-test
  annotations:
    container.apparmor.security.beta.kubernetes.io/mount: unconfined
spec:
  containers:
  - name: mount
    image: alpine:3.20
    command: ["sh", "-c", "mkdir -p /mnt && mount -t tmpfs tmpfs /mnt && sleep 5"]
    securityContext:
      privileged: false
      allowPrivilegeEscalation: true
      capabilities:
        add: ["SYS_ADMIN"]

Notes:

  • Admission policies may forbid privileged or SYS_ADMIN.
  • On SELinux hosts, use proper volume labels; cluster defaults may still block mounts.
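  • For FUSE, expose /dev/fuse to the container and add the capability as above. A hostPath volume alone may not grant device access to a non-privileged container; a device plugin or privileged mode is typically required.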
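
Because admission policy can change or reject the spec, it helps to confirm what the API server actually admitted. A sketch assuming kubectl access and the `mount-test` pod above; `list_has` is a hypothetical helper that only checks a space-separated list:

```shell
# list_has: succeed if word $2 appears in the space-separated list $1.
list_has() {
  for item in $1; do
    [ "$item" = "$2" ] && return 0
  done
  return 1
}

# Example (requires kubectl access to the cluster that runs the pod):
# added=$(kubectl get pod mount-test \
#   -o jsonpath='{.spec.containers[0].securityContext.capabilities.add[*]}')
# list_has "$added" SYS_ADMIN || echo "SYS_ADMIN missing from the admitted spec"
```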

Pitfalls

  • Relying on --privileged hides missing device mappings and policy issues; prefer targeted caps.
  • Rootless Docker cannot perform many privileged operations regardless of caps.
  • Read-only rootfs or masked paths can mimic permission errors; check docker run --read-only and masked paths.
  • Bind-mounting host paths with restrictive SELinux labels causes EPERM; use :z or :Z on volumes on SELinux hosts.
  • Kubernetes admission controls (Pod Security admission, OPA/Gatekeeper, and PodSecurityPolicy on clusters older than 1.25) may reject the Pod or mutate its securityContext; verify the effective securityContext at runtime.
  • Default seccomp profiles vary by engine version; an upgrade can change behavior.

Performance notes

  • FUSE filesystems incur user-space context switch overhead versus kernel mounts; expect higher CPU and lower throughput.
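  • --privileged does not lift cgroup CPU or memory limits, but it lets the container change host-wide settings (devices, sysctls, kernel parameters), so a misbehaving workload can degrade other workloads on the same node.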
  • eBPF/perf inside containers can contend for system-wide resources; limit scope and sampling.
  • Excess capabilities and unconfined seccomp have negligible direct performance cost but increase attack surface; security incidents have far greater operational impact.
  • Loop devices and dm-crypt in containers add I/O overhead; consider host-managed storage instead.

Tiny FAQ

Q: What does CAP_SYS_ADMIN cover? A: It is a broad capability used for many operations (mount, namespace control, device mgmt). Use it sparingly; prefer specific alternatives when available.

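Q: Why does --cap-add SYS_ADMIN still fail? A: The operation may be blocked by an LSM (AppArmor/SELinux) or by seccomp, or the needed device is missing. Test with apparmor=unconfined first, since Docker's default AppArmor profile denies mount; then try seccomp=unconfined if a custom seccomp profile is in use, and map any required devices.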

Q: Is --privileged the same as adding a few caps? A: No. --privileged grants all capabilities, disables seccomp/AppArmor constraints, and gives broad device access. Use only as a last resort.

Q: Does this work in rootless Docker? A: Many privileged operations (mounting block devices, creating device nodes) are not possible in rootless mode. Use rootful Docker or move the operation to the host.
