Frequently Asked Question

What is seccomp and how do filters reduce kernel attack surface?

The Linux kernel exposes several hundred system calls (around 350 on x86-64). Most processes use only a handful: a web server opens sockets, reads files, writes log lines, and exits. Every other syscall nevertheless remains reachable, and a bug in any one of them is a potential privilege escalation. Seccomp (secure computing mode) is a kernel feature that lets a process voluntarily shrink the set of syscalls it is allowed to make. Once a filter is installed it cannot be removed or loosened, and it is inherited by child processes. So even if the process is later compromised, the attacker is stuck inside the same restricted syscall set.
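The "cannot be loosened" property can be observed directly. The C sketch below is Linux-only and assumes an x86-64 or arm64 target. The function name uname_errno_after_allow_all is made up for illustration. It uses seccomp's BPF filter mode, described in the next paragraph, and works in a forked child. First it sets no_new_privs, which an unprivileged process needs before it may install a filter. It then installs a filter that makes uname(2) fail with EPERM and stacks a second filter that allows everything. The kernel runs every installed filter and keeps the highest-precedence result, and ERRNO outranks ALLOW, so uname should still fail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

/* Syscall numbers are per-architecture, so every filter checks arch first.
   (A real deny-list on x86-64 would also have to reject x32 syscall numbers.) */
#if defined(__x86_64__)
#define EXPECTED_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXPECTED_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86-64 and arm64"
#endif

/* Installs one classic-BPF seccomp filter on the calling thread. */
static int install_filter(struct sock_filter *insns, unsigned short len) {
    struct sock_fprog prog = { .len = len, .filter = insns };
    return prctl(PR_SET_SECCOMP, (unsigned long)SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical demo: in a child, deny uname(2) with EPERM, try to "undo" it
   with an allow-everything filter, and return the errno uname() ends up with
   (0 if it succeeded, -1 if the child could not be run or waited for). */
int uname_errno_after_allow_all(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct sock_filter deny_uname[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXPECTED_ARCH, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_uname, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_filter allow_all[] = {
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        /* Unprivileged processes must set no_new_privs before adding a filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) != 0) _exit(255);
        if (install_filter(deny_uname, sizeof deny_uname / sizeof deny_uname[0]) != 0) _exit(254);
        if (install_filter(allow_all, 1) != 0) _exit(253);
        struct utsname u;
        /* Both filters run; the ERRNO verdict takes precedence over ALLOW. */
        _exit(syscall(__NR_uname, &u) == 0 ? 0 : errno);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Calling uname_errno_after_allow_all() should return EPERM. The filters live only in the forked child, so the caller itself is not restricted.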

The original "strict" mode allowed only read, write, _exit, and sigreturn. Modern seccomp instead uses BPF filters, written in the same classic BPF bytecode used for socket packet filtering, to express richer policies: allow openat but only with O_RDONLY, deny mount entirely, or return EPERM rather than killing the process on a violation. A filter sees only the syscall number, the architecture, and the raw argument values. It can therefore test an integer flag such as O_RDONLY, but it cannot follow a pointer to check which path is being opened. Container runtimes ship default seccomp profiles that block a few dozen rarely needed syscalls (kexec_load, bpf, mount, clone with namespace flags), and systemd units can pull in named filter sets with SystemCallFilter=.
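The "openat only with O_RDONLY, EPERM otherwise" policy can be sketched in C as a classic BPF program passed to prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...). This is an illustration, not a production profile. It assumes a little-endian x86-64 or arm64 target, and the names readonly_openat and openat_errno_under_policy are hypothetical. A real policy would also have to cover open(2) and openat2(2), plus x32 syscall numbers on x86-64, all of which this filter lets through untouched.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define EXPECTED_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXPECTED_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86-64 and arm64"
#endif

#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "ARG_LO assumes a little-endian target"
#endif
/* Offset of the low 32 bits of 64-bit syscall argument n (little-endian). */
#define ARG_LO(n) (offsetof(struct seccomp_data, args[n]))

/* openat(2) is allowed only when its access mode is O_RDONLY; any other
   access mode fails with EPERM. Every other syscall is allowed. */
static struct sock_filter readonly_openat[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXPECTED_ARCH, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),          /* not openat */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG_LO(2)),          /* openat flags */
    BPF_STMT(BPF_ALU | BPF_AND | BPF_K, O_ACCMODE),         /* keep access mode */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, O_RDONLY, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
};

/* Forks a child, installs the policy there (the caller stays unfiltered),
   calls openat(AT_FDCWD, path, flags) and returns the errno (0 on success). */
int openat_errno_under_policy(const char *path, int flags) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct sock_fprog prog = {
            .len = sizeof readonly_openat / sizeof readonly_openat[0],
            .filter = readonly_openat,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) != 0) _exit(255);
        if (prctl(PR_SET_SECCOMP, (unsigned long)SECCOMP_MODE_FILTER, &prog) != 0) _exit(254);
        long fd = syscall(__NR_openat, AT_FDCWD, path, flags, 0);
        _exit(fd >= 0 ? 0 : errno);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

With this filter, opening /dev/null with O_RDONLY should report 0, while O_WRONLY or O_RDWR on the same path should report EPERM. The path itself is never inspected, which matches the pointer limitation noted above.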

For a defender, seccomp is one of the cheapest high-impact hardening measures available. On an Ubuntu machine running stock Docker, containers already run under Docker's default seccomp profile. Adding SystemCallFilter=~@mount @swap @reboot to a systemd unit costs almost nothing, provided the service never needs those syscalls. It also removes whole classes of kernel local-privilege-escalation (LPE) primitives from the attacker's reach.
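One quick way to confirm that a process is actually running under a filter is the Seccomp: line in /proc/self/status. This applies, for example, to a program inside a Docker container or a service with SystemCallFilter= set. proc(5) documents this line as 0 for disabled, 1 for strict mode and 2 for filter mode. The minimal C sketch below reads it; the function name current_seccomp_mode is hypothetical.

```c
#include <stdio.h>

/* Returns the Seccomp: field of /proc/self/status (0 = disabled,
   1 = strict, 2 = filter), or -1 if the file or field is unavailable. */
int current_seccomp_mode(void) {
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) return -1;
    char line[256];
    int mode = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* A "Seccomp_filters:" line does not match, since '_' is not ':'. */
        if (sscanf(line, "Seccomp: %d", &mode) == 1) break;
    }
    fclose(f);
    return mode;
}
```

Built into a program run inside a default Docker container, current_seccomp_mode() would be expected to return 2. The same binary run directly on the host typically returns 0, unless it was started by a unit that sets SystemCallFilter=.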

Further reading and video