Frequently Asked Question

How strong is the container security boundary, really?

The honest answer is: weaker than a virtual machine and stronger than nothing. A container shares one kernel with the host and every other container on the box, so any kernel vulnerability that lets a process escape its namespaces is a host compromise. There has been a steady trickle of such bugs: in user namespaces, in overlayfs, in cgroup release handlers, in eBPF, and in io_uring. A VM, by contrast, has the hypervisor's hardware-enforced boundary in front of it; a kernel bug inside the guest still leaves the hypervisor between attacker and host. Compliance regimes that require isolating mutually distrusting tenants almost always demand VMs (or "secure containers" like Kata or Firecracker, which are VMs in disguise).

For cooperating workloads in a single trust domain (say, the dozen microservices of one application), containers are usually fine, especially if you reduce the attack surface with the basic hardening steps: run as a non-root USER, drop every capability you don't need (--cap-drop=ALL, then add back specific ones with --cap-add), apply a tight seccomp profile, use a read-only root filesystem (--read-only), and run rootless where possible. Together these close most realistic escape paths short of a kernel zero-day. Where the trust boundary really matters, run the containers inside VMs and you get both layers.
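
As a rough illustration of the hardening steps above, here is a minimal Python sketch that assembles a `docker run` command line with those flags. The helper name `hardened_run_args`, the image name, the UID/GID, and the seccomp profile path are all hypothetical placeholders; the flags themselves (`--user`, `--cap-drop`, `--cap-add`, `--read-only`, `--security-opt seccomp=...`) are standard Docker CLI options. Rootless mode does not appear because it is a property of how the daemon is installed and run, not a per-container flag.

```python
import shlex


def hardened_run_args(image, uid=10001, gid=10001, add_caps=(),
                      seccomp_profile=None, command=()):
    """Build a `docker run` argv with the basic hardening flags.

    Hypothetical helper for illustration only; adjust to your own setup.
    """
    args = [
        "docker", "run", "--rm",
        # Run as a non-root user (numeric IDs, so no /etc/passwd entry is needed).
        "--user", f"{uid}:{gid}",
        # Drop every capability first...
        "--cap-drop=ALL",
    ]
    # ...then add back only the ones the workload really needs.
    for cap in add_caps:
        args.append(f"--cap-add={cap}")
    # Mount the container's root filesystem read-only.
    args.append("--read-only")
    # Optional custom seccomp profile; without this flag Docker applies
    # its default profile.
    if seccomp_profile is not None:
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    argv = hardened_run_args(
        "example/api:1.0",               # assumed image name
        add_caps=["NET_BIND_SERVICE"],   # e.g. to bind a port below 1024
        seccomp_profile="./seccomp.json",  # assumed profile path
    )
    # Print the command rather than executing it.
    print(shlex.join(argv))
```

A workload that writes temporary files will still need a writable location under `--read-only`, typically a tmpfs or a volume mounted at the specific path it uses.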

Further reading and video