Linux Capabilities In Containers & Kubernetes

By Bibin Wilson

In this blog, we will look a little deeper into Linux Capabilities to understand how they relate to containers and Kubernetes using practical examples.

By the end of the guide, you will understand:

  1. What Are Linux Capabilities?
  2. Linux Capabilities in Containers
  3. Adding Linux Capabilities in Kubernetes pods
  4. Analyzing Capabilities using systemd

In our container non-root blog, I mentioned the concept of Linux Capabilities.

It is an often overlooked security feature that plays a key role in container security and Kubernetes configurations.

What Are Linux Capabilities?

In traditional Linux, a process is either root (superuser) or non-root (restricted). A concept you all know.

Linux Capabilities were introduced in kernel 2.2. Before that, a process fell into one of two categories:

  • Privileged processes (UID = 0): full root access.
  • Non-privileged processes (UID ≠ 0): limited permissions.

The problem with this approach was that if a non-root program needed to perform a single privileged operation, it had to run with full root access. For example, binding to a port below 1024 requires root privileges.

What if there were a mechanism to grant a non-root process privileged access to only that one operation?

This is what Linux Capabilities solve. They split root privileges into separate units that can be granted individually.

For example,

  1. CAP_NET_BIND_SERVICE: To grant the permission to bind to privileged ports.
  2. CAP_NET_ADMIN: Allows modifying network interfaces, routing tables, and other network configurations.
  3. CAP_SYS_TIME: To modify the system clock.
  4. CAP_DAC_OVERRIDE: Bypasses discretionary access control (DAC), allowing a process to ignore file permission checks.
  5. CAP_CHOWN: Allows changing the ownership of files, bypassing normal user restrictions.
  6. CAP_NET_RAW: Allows sending and receiving raw packets (e.g., crafting custom network packets). Used in tools like ping and tcpdump.

With this, a non-root process can be granted only CAP_NET_BIND_SERVICE so it can bind to a privileged port, without receiving any other root privilege.

Just run the following command to list all the supported capabilities in Linux.

man capabilities

Recent Linux kernels define around 40 capabilities (41 as of kernel 5.9); the exact number depends on your kernel version.
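To check this on your own machine rather than counting entries in the man page, you can read the values the kernel exposes under /proc. The sketch below assumes a Linux system with /proc mounted; the variable names are only for illustration.

```shell
# Print the capability sets of the current shell as hex bitmasks.
# CapEff is the effective set, which is what the kernel checks.
grep '^Cap' /proc/self/status

# The kernel exposes its highest capability number. Numbering starts at 0,
# so the total count is that number plus one.
last_cap=$(cat /proc/sys/kernel/cap_last_cap)
cap_count=$((last_cap + 1))
echo "This kernel defines $cap_count capabilities"
```

Each Cap line is a bitmask in which bit N corresponds to capability number N from linux/capability.h.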

Now that we have an understanding of Linux capabilities, let's understand how containers and Kubernetes use them.

Linux Capabilities & Containers

By default, processes in a container run as root unless you configure a non-root user.

But this doesn't mean they have full root privileges on the host.

Docker and other container runtimes use Linux Capabilities to restrict container permissions for enhanced security. This makes the container environment more secure, even though the user ID (UID 0) remains the same inside the container and on the host.

Docker, for instance, drops many capabilities by default and keeps only the ones it considers required.

The containerd source code defines its own list of default allowed capabilities.

CRI-O has its own defaults as well. Refer to the CRI-O documentation for the list.

Let's look at an example using Docker.

Let's start a BusyBox container and try to create a dummy network interface inside it.

$ docker run --rm -it \
    --name test_no_cap busybox sh

/ # ip link add dummy0 type dummy
ip: RTNETLINK answers: Operation not permitted

As you can see, the container lacks CAP_NET_ADMIN, so it cannot modify network interfaces.
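One way to confirm which capabilities a container holds is to read the CapEff line from /proc/self/status inside it and test individual bits. The sketch below is only an illustration: has_cap is a hypothetical helper, and the mask is an assumed example value of the kind commonly reported inside a default Docker container, so the value on your system may differ. The bit numbers (12 for CAP_NET_ADMIN, 13 for CAP_NET_RAW) come from linux/capability.h.

```shell
# has_cap MASK BIT: succeed if bit number BIT is set in the hex mask MASK.
# Hypothetical helper for illustration; it is not part of Docker or libcap.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Assumed example value. Read the real one inside your container with:
#   grep CapEff /proc/self/status
mask=00000000a80425fb

if has_cap "$mask" 12; then
  echo "CAP_NET_ADMIN: present"
else
  echo "CAP_NET_ADMIN: missing"
fi

if has_cap "$mask" 13; then
  echo "CAP_NET_RAW: present"
else
  echo "CAP_NET_RAW: missing"
fi
```

With this example mask, CAP_NET_ADMIN is reported as missing and CAP_NET_RAW as present. If libcap's capsh tool is installed, capsh --decode=<mask> prints the capability names directly.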

The --cap-add flag grants additional Linux capabilities to a container.

Now, run the same container with the required capability by passing the --cap-add=NET_ADMIN flag.

$ docker run --rm -it --cap-add=NET_ADMIN \
    --name test_with_cap busybox sh

/ # ip link add dummy0 type dummy
/ # ip link show dummy0
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop qlen 1000
    link/ether 0a:0c:31:af:1e:0b brd ff:ff:ff:ff:ff:ff

Since the container has CAP_NET_ADMIN, it can create network interfaces.

Kubernetes and Linux Capabilities

In Kubernetes, you can add or drop Linux capabilities in a container's SecurityContext to reduce the attack surface.

Please refer to the SecurityContext blog to learn more about the Kubernetes SecurityContext.

Let's understand this with an example.

When you run a pod with a BusyBox image, by default, you will be able to use ping.

For example:

$ kubectl run ping-pod \
      --image=busybox --restart=Never \
      -it -- sh -c "ping 8.8.8.8"

64 bytes from 8.8.8.8: seq=1 ttl=61 time=13.500 ms
64 bytes from 8.8.8.8: seq=2 ttl=61 time=16.598 ms
64 bytes from 8.8.8.8: seq=3 ttl=61 time=16.262 ms

Now, let's say you don't want to allow the BusyBox pod to perform ping.

In this case, we drop the NET_RAW capability using the Security Context.

The NET_RAW capability allows a container to create and use raw network sockets. This is required for commands like ping and some network debugging tools.

By dropping NET_RAW, we prevent the container from sending raw packets.

For example,

apiVersion: v1
kind: Pod
metadata:
  name: busybox-ping
spec:
  containers:
  - name: busybox
    image: busybox:latest
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        drop:
          - NET_RAW

If you deploy this pod and try ping, you will get the following error.

$ kubectl exec -it busybox-ping -- ping 8.8.8.8

PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: permission denied (are you root?)
command terminated with exit code 1
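The same capabilities field also accepts an add list. A common least-privilege pattern is to drop everything and add back only what the workload needs. The sketch below writes such a manifest to a file; the pod name and file name are hypothetical, and the kubectl commands are left as comments because they assume a cluster you control. Note that Kubernetes capability names omit the CAP_ prefix.

```shell
# Write a Pod manifest that drops every capability, then adds back NET_RAW.
# The pod name and file name are hypothetical.
cat > busybox-netraw.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: busybox-netraw
spec:
  containers:
  - name: busybox
    image: busybox:latest
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        drop:
          - ALL
        add:
          - NET_RAW
EOF

# Then, against a cluster you control:
#   kubectl apply -f busybox-netraw.yaml
#   kubectl exec -it busybox-netraw -- ping -c 3 8.8.8.8
```

With this configuration the container should keep the ability to open raw sockets while losing the rest of the runtime's default capabilities.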

Analyze Capabilities With Systemd

On Linux systems, the systemd-analyze security command evaluates the security of systemd services by analyzing their sandboxing and security features.

It inspects how a service is configured in terms of capability restrictions (e.g., CAP_NET_ADMIN, CAP_DAC_OVERRIDE) and more.

Run without arguments, systemd-analyze security lists every service unit along with an exposure score.

You can further analyse a specific service using the unit name.

For example,

systemd-analyze security apache2.service
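If the report flags a service for holding too many capabilities, systemd lets you restrict them in the unit itself. The sketch below writes a drop-in file for a hypothetical myapp.service that runs as a hypothetical non-root user myapp and only needs to bind to a privileged port. The unit name, user name, and file name are assumptions for illustration; the directives are standard systemd options, but whether they suit your service depends on what it actually does.

```shell
# Write a drop-in that limits a hypothetical service to one capability.
# Unit name, user, and file name are placeholders for illustration.
cat > capabilities.conf <<'EOF'
[Service]
User=myapp
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=yes
EOF

# To try it on a host you control (requires root):
#   mkdir -p /etc/systemd/system/myapp.service.d
#   cp capabilities.conf /etc/systemd/system/myapp.service.d/
#   systemctl daemon-reload && systemctl restart myapp.service
#   systemd-analyze security myapp.service
```

Re-running the analysis after the change is a quick way to see whether the exposure score for that unit improved.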

Conclusion

Linux capabilities play an important role in enforcing the principle of least privilege by allowing fine-grained control over what processes can do.

For DevOps engineers, being proactive about these best practices is important.

Security should not be an afterthought.

I've seen teams rushing to fix clusters after a security audit when issues could have been prevented earlier.

By taking small but smart security steps wherever necessary, you can avoid last-minute surprises and keep your infrastructure safe from the start.

If you have any questions about this blog, drop them in the comments!