Container History and Linux Namespaces Part 2: Cgroups

by Hadean Platform

Recap

In the last article we looked at an (incomplete) history of containers and dug a little into Linux namespaces, which help you partition parts of your system and limit their capabilities. However, namespaces aren’t much use for limiting hardware resources.

This article will cover how container runtimes deal with this using cgroups.

Core Isolation Technologies (continued)

cgroups

Linux namespaces are great, but they don’t really touch classic resource usage like memory and CPU. cgroups (short for control groups) go some way towards filling this gap by providing a unified filesystem-based interface for grouping processes, with assorted ‘subsystems’ that alter the behaviour of the processes in a group.

Among many other things, cgroups allow:

  • Assignment of processes to one or more specific CPUs (e.g. the --cpuset-cpus argument to docker run)
  • Limiting of process memory and swap usage (e.g. the --memory argument to docker run)
  • Freezing and resuming processes (e.g. docker pause)

Note that cgroups are used for more than just containers! For example, systemd allows configuration of cgroups for services, slices and scopes.

cgroups: Hands On

You can dig into container cgroups by looking at the cgroup information for any process within that container (I use podman as my container runtime, which requires sudo when applying cgroups, but Docker should work equivalently):

$ sudo podman run -it --rm --cpuset-cpus 0,1 alpine:3.11 sh -c 'cat /proc/self/cgroup && while true; do echo -n x; sleep 10; done'

[…]

3:cpuset:/machine.slice/libpod-1e2bca888e33e6c960c3eb7d957ba5a24c7998dedf7d907d576311613d98cd0a.scope

2:freezer:/machine.slice/libpod-1e2bca888e33e6c960c3eb7d957ba5a24c7998dedf7d907d576311613d98cd0a.scope

[…]

xxxxxxxxxxx
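Each cgroup line in that output has the form hierarchy-ID:controller-list:path. As a rough sketch (cgroup_path_for is a hypothetical helper name, not part of podman or Docker), the path for a single controller could be pulled out like this:

```shell
#!/bin/sh
# cgroup_path_for CONTROLLER [FILE]
# Prints the cgroup path for CONTROLLER from a /proc/<pid>/cgroup style file.
# Each line of such a file is "hierarchy-ID:controller-list:path".
# cgroup_path_for is a hypothetical helper name used for illustration only.
cgroup_path_for() {
    controller=$1
    file=${2:-/proc/self/cgroup}
    # Split on ':'. The second field may list several controllers
    # separated by commas (e.g. "cpu,cpuacct").
    awk -F: -v want="$controller" '
        {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                if (names[i] == want) {
                    # Re-join fields 3..NF in case the path contains ":".
                    path = $3
                    for (j = 4; j <= NF; j++) path = path ":" $j
                    print path
                    exit
                }
            }
        }
    ' "$file"
}
```

On a cgroup v1 layout like the one used in this article, and assuming the usual /sys/fs/cgroup mount point, prefixing the printed path with /sys/fs/cgroup/<controller> gives the directory that holds that cgroup's attribute files.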

The shell commands in the container first print cgroup information, and then print an x every 10 seconds. In a separate terminal I can now take a look at my cgroup v1 filesystem (it may be in a different location for you due to cgroup nuances!), and make changes to it:

$ cat /sys/fs/cgroup/cpuset/machine.slice/libpod-1e2bca888e33e6c960c3eb7d957ba5a24c7998dedf7d907d576311613d98cd0a.scope/cpuset.cpus

0-1

$ sudo sh -c 'echo FROZEN > /sys/fs/cgroup/freezer/machine.slice/libpod-1e2bca888e33e6c960c3eb7d957ba5a24c7998dedf7d907d576311613d98cd0a.scope/freezer.state'

The first command reads the current value of the cpuset.cpus attribute, which reflects --cpuset-cpus 0,1 (the kernel reports the same two CPUs in range form, as 0-1). The second command applies the FROZEN state to the container cgroup – you’ll see the xs stop printing because the process is frozen! Note that we needed root permissions to change cgroup state.
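The value of cpuset.cpus is a comma-separated list of CPU numbers and a-b ranges, which is why 0,1 comes back as 0-1. A minimal sketch for expanding such a list into individual CPU numbers follows. The name expand_cpu_list is hypothetical, and the sketch handles only plain numbers and simple ranges, not any more elaborate list syntax the kernel may accept:

```shell
#!/bin/sh
# expand_cpu_list LIST
# Expands a cpuset-style list such as "0-1" or "0-2,5" into
# space-separated CPU numbers. Hypothetical helper for illustration;
# only plain numbers and simple "a-b" ranges are handled.
expand_cpu_list() {
    # One list entry per line, then split each entry on "-".
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # Skip empty entries (e.g. from an empty list).
        [ -n "$first" ] || continue
        # A single number "n" is treated as the range n-n.
        seq "$first" "${last:-$first}"
    done | paste -s -d ' ' -
}
```

For example, expand_cpu_list 0-1 prints 0 1, matching the two CPUs requested on the podman command line.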

This kind of direct alteration can confuse container runtimes, because they track container status internally and don’t know the cgroup was changed behind their back. In this case you’ll need to apply THAWED in the same way you applied the FROZEN state, by overwriting the cgroup state file, rather than using docker unpause (or podman unpause).
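Freezing and thawing by hand can be wrapped in a small function so that only the two valid states are ever written. This is a sketch under assumptions: set_freezer_state is a hypothetical name, it targets the cgroup v1 freezer used in this article, and it must run with root permissions against a real cgroup directory.

```shell
#!/bin/sh
# set_freezer_state CGROUP_DIR STATE
# Writes FROZEN or THAWED to CGROUP_DIR/freezer.state (cgroup v1 freezer).
# Hypothetical helper for illustration; needs root for real cgroup paths.
set_freezer_state() {
    dir=$1
    state=$2
    # Only these two values are meaningful to write.
    case "$state" in
        FROZEN|THAWED) ;;
        *) echo "state must be FROZEN or THAWED" >&2; return 2 ;;
    esac
    # Refuse to create the file: it should already exist in a freezer cgroup.
    if [ ! -e "$dir/freezer.state" ]; then
        echo "no freezer.state under $dir" >&2
        return 1
    fi
    printf '%s\n' "$state" > "$dir/freezer.state"
}
```

Calling it from a root shell with the container's freezer cgroup directory and THAWED as arguments would undo the manual freeze shown earlier.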

To learn more, a good place to start is the manual page: man cgroups! It gives an overview of the different versions of cgroups and points to locations in the kernel source repository where you can read more about the assorted cgroup subsystems and attributes.
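Because the paths in this article assume cgroup v1, it can help to check which version your own system mounts before following along. The sketch below assumes GNU coreutils stat and the usual /sys/fs/cgroup mount point; both function names are hypothetical. A filesystem type of cgroup2fs indicates the unified v2 hierarchy, while tmpfs is what a v1 (or hybrid) layout normally shows.

```shell
#!/bin/sh
# cgroup_version_for_fstype FSTYPE
# Maps the filesystem type found at the cgroup root to a cgroup version.
# Hypothetical helper for illustration.
cgroup_version_for_fstype() {
    case "$1" in
        cgroup2fs) echo v2 ;;       # unified hierarchy
        tmpfs)     echo v1 ;;       # per-controller mounts underneath
        *)         echo unknown ;;
    esac
}

# detect_cgroup_version [ROOT]
# Assumes GNU stat, where "-f -c %T" prints the filesystem type name.
detect_cgroup_version() {
    root=${1:-/sys/fs/cgroup}
    fstype=$(stat -f -c %T "$root" 2>/dev/null) || fstype=
    cgroup_version_for_fstype "$fstype"
}
```

If detect_cgroup_version prints v2, the per-controller directories used above (such as /sys/fs/cgroup/freezer) will not exist, and the equivalent attributes live under a single hierarchy instead.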

Conclusions

The key takeaway from all of this is that containers aren’t magical – they’re implemented with a set of features in the Linux kernel that you can play around with yourself!

Containers do provide an incredible convenience for packaging and deploying software and they’re built on a number of very powerful lower-level technologies. Knowing about these underpinnings can help you understand the fundamental capabilities of your system and give context on issues you may encounter.

For me, awareness of cgroups and namespaces has made it possible to give advice on container configurations, debug container issues on different Linux distributions and build a mental model of how a scalable container deployment is put together – hopefully you’ll find this knowledge equally useful!