Frequently Asked Question
What are Linux capabilities and why split up root?
Historically the kernel had one big switch: either a process was root (effective UID 0)
and could do anything, or it was not and could do almost nothing privileged. That meant
every program that needed a single privileged trick (binding to a port below 1024 such
as 80, sending an ICMP echo, changing its UID) had to be setuid root and thereby inherit
the keys to the kingdom. Capabilities, introduced in Linux 2.2 and refined ever since,
break that monolith into about forty distinct privileges (CAP_NET_BIND_SERVICE,
CAP_NET_RAW, CAP_CHOWN, CAP_SYS_ADMIN, …) that can be granted to a binary or a process
individually.
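Under the hood each capability is a bit number, and the kernel reports a process's capability sets as hex bitmasks in /proc/<pid>/status (the CapInh, CapPrm, CapEff, CapBnd and CapAmb lines). The following is a minimal Python sketch, assuming a Linux /proc and the bit numbers from <linux/capability.h> up to CAP_CHECKPOINT_RESTORE (bit 40), that turns such a mask back into names. decode_caps and current_effective_caps are hypothetical helpers for illustration, not a library API; libcap's capsh --decode=<hex> does the same job from the shell.

```python
import os

# Bit numbers from <linux/capability.h>. Bit 40 (CAP_CHECKPOINT_RESTORE) is the
# highest assumed here; newer kernels may define more, shown as CAP_<n> below.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",        # 0-3
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",                         # 4-7
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",                 # 8-10
    "CAP_NET_BROADCAST",                                                          # 11
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",             # 12-15
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",       # 16-19
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",            # 20-23
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",       # 24-27
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",          # 28-31
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",         # 32-35
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",             # 36-39
    "CAP_CHECKPOINT_RESTORE",                                                     # 40
]


def decode_caps(hex_mask):
    """Return the capability names whose bits are set in a hex mask string."""
    mask = int(hex_mask, 16)
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Fall back to a numeric label for bits this table does not know.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"CAP_{bit}")
    return names


def current_effective_caps(status_path="/proc/self/status"):
    """Decode the CapEff line of a /proc status file (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split(":", 1)[1].strip())
    return []


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    print(current_effective_caps())
```

The same decoding applies unchanged to the permitted, bounding and ambient masks; only the line prefix differs.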
The user-space tools are setcap, which attaches capabilities to a file, and getcap,
which lists them; man 7 capabilities enumerates the full set. On many distributions
ping, for example, is no longer setuid root: it carries cap_net_raw+ep and nothing
else, so even a successful exploit in ping gains raw-socket access but cannot read
other users' files. Container runtimes such as Docker and Podman use the same
mechanism: containers start with a reduced default set of capabilities, which you can
narrow or extend per container with --cap-drop and --cap-add.
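In a string like cap_net_raw+ep, the operator and letters describe which per-process sets the capability lands in: e is effective, i is inheritable, p is permitted; "+" raises those flags, "-" lowers them, and "=" resets the capability and then raises only the listed flags. The Python sketch below is a simplified parser for that notation, loosely following the textual syntax described in cap_from_text(3). It is an illustration under stated assumptions, not libcap's parser: parse_cap_text is a hypothetical name, and the "all" / empty-name shorthand (as in "=ep") is deliberately rejected rather than expanded.

```python
import re

# One clause: comma-separated capability names, then one or more actions such
# as "+ep", "=p" or "-e". Flags: e=effective, i=inheritable, p=permitted.
_CLAUSE = re.compile(r"([a-z0-9_,]*)((?:[=+-][eip]*)+)")
_ACTION = re.compile(r"([=+-])([eip]*)")


def parse_cap_text(text):
    """Parse a simplified capability text string into {name: flags}."""
    state = {}
    for clause in text.split():
        m = _CLAUSE.fullmatch(clause.lower())
        if m is None:
            raise ValueError(f"cannot parse clause {clause!r}")
        names = [n for n in m.group(1).split(",") if n]
        if not names or "all" in names:
            # Simplification: the "all" / empty-name shorthand is not expanded.
            raise ValueError(f"'all' shorthand not supported: {clause!r}")
        for op, flags in _ACTION.findall(m.group(2)):
            for name in names:
                current = state.setdefault(name, set())
                if op == "=":
                    current.clear()  # "=" resets, then raises the listed flags
                    current.update(flags)
                elif op == "+":
                    current.update(flags)
                else:
                    current.difference_update(flags)
    # Drop capabilities that ended up with no flags at all.
    return {name: "".join(sorted(fl)) for name, fl in state.items() if fl}


if __name__ == "__main__":
    print(parse_cap_text("cap_net_raw+ep"))  # prints {'cap_net_raw': 'ep'}
```

Pass only the capability text, not the file name: depending on the libcap version, getcap typically prints either `/usr/bin/ping cap_net_raw=ep` or `/usr/bin/ping = cap_net_raw+ep`, and both describe the same state.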
For a defender, capabilities are how you express least privilege at the kernel level.
When you write a systemd unit, prefer running the service as an unprivileged user with
AmbientCapabilities=CAP_NET_BIND_SERVICE and a CapabilityBoundingSet= that scopes it to
the smallest set of privileges it needs, rather than running it as root because "that
worked".
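Least privilege is easier to trust when the service can verify it. The Python sketch below assumes a unit along the lines described above (an unprivileged User=, AmbientCapabilities=CAP_NET_BIND_SERVICE and CapabilityBoundingSet=CAP_NET_BIND_SERVICE; the exact unit is an assumption) and checks at startup that its effective and bounding sets contain nothing beyond that one bit. parse_cap_masks and unexpected_caps are hypothetical helper names; the only real interface used is the Cap* lines of /proc/<pid>/status.

```python
import os

# Bit number of CAP_NET_BIND_SERVICE in <linux/capability.h>.
CAP_NET_BIND_SERVICE = 10


def parse_cap_masks(status_text):
    """Return {'CapEff': int, 'CapBnd': int, ...} from /proc/<pid>/status text."""
    masks = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("Cap"):
            masks[key] = int(value.strip(), 16)
    return masks


def unexpected_caps(mask, allowed_bits):
    """Return the bits set in mask that are not in allowed_bits (0 if none)."""
    allowed = 0
    for bit in allowed_bits:
        allowed |= 1 << bit
    return mask & ~allowed


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        masks = parse_cap_masks(f.read())
    for field in ("CapEff", "CapBnd"):
        extra = unexpected_caps(masks.get(field, 0), [CAP_NET_BIND_SERVICE])
        status = "ok" if not extra else f"unexpected bits {extra:#x}"
        print(f"{field}: {status}")
```

This sketch only reports; a real service would more likely log the unexpected bits and exit, so that a unit accidentally run as root fails loudly instead of silently keeping the full set.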