What is Docker?

What is Docker? A silly question to ask in 2022, you might say.

I’ve always been a bit wary of containerization, ever since I first came across Docker professionally, in maybe 2013.

It always seemed to be a bit all over the place, with its Dockerfiles and images. And hidden layers upon hidden layers. And holed-up server sockets. And no way to access the stuff your program stores in files. But there’s definitely a legitimate core hidden under all of this. So let’s unpack: what’s a container?

At its core, a container fixes the OS in just one way: it removes global variables from processes.

Let me explain.

It’s a platitude, but we have known it for a long time: if you program, don’t use global variables.

Global variables make it hard to reason about our program.
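To make the platitude concrete, here is a minimal sketch in Python. All names in it (`tax_percent`, `total_with_global`, `total_explicit`, `unrelated_helper`) are made up for illustration. It only shows how the same call can give two different answers when a global is involved.

```python
# Hypothetical example: the same computation with and without a global.
tax_percent = 10  # global state: any code in this module can change it


def total_with_global(net):
    # The result depends on whatever tax_percent happens to be right now.
    return net + net * tax_percent // 100


def total_explicit(net, tax_percent):
    # Everything the result depends on is visible in the signature.
    return net + net * tax_percent // 100


def unrelated_helper():
    # Some far-away code quietly changes the global.
    global tax_percent
    tax_percent = 20


before = total_with_global(100)
unrelated_helper()
after = total_with_global(100)  # same call, different answer

print(before, after, total_explicit(100, 10))
```

The explicit version is boring in the best sense: to know what it returns, you only have to look at the call site.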

The same holds for other abstractions, such as our processes. Only here the global variables go by different names: files and network.

The files a program runs among are just global state. The mechanics differ: this state is read in instead of kept in memory. But it potentially becomes the program’s configuration or its input, depending on what the program does.

Files are the global state of a program running under an OS. They make the program harder to reason about. Why is my web server not coming up? Better check whether something’s off with the config files. We can control what other users do with file permissions, but that’s complex and often not perfect. Maybe someone did change something? We don’t know.
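A small Python sketch of the same effect, under made-up assumptions: the file name `server.conf`, the `port=` format, and the function `read_port` are all hypothetical stand-ins for a real server’s config handling. The file lives in a temporary directory so the example can run anywhere.

```python
import tempfile
from pathlib import Path

# Stand-in for a server's config file (hypothetical name and format).
config_path = Path(tempfile.mkdtemp()) / "server.conf"
config_path.write_text("port=8080\n")


def read_port():
    # No arguments: the answer depends entirely on what is on disk right now.
    for line in config_path.read_text().splitlines():
        key, _, value = line.partition("=")
        if key == "port":
            return int(value)
    return None


first = read_port()

# Some other process, user, or deploy script edits the same file.
config_path.write_text("port=9090\n")

second = read_port()  # same call, different answer

print(first, second)
```

Nothing in the program changed between the two calls. The file did, and the file is shared with everyone who can write to it.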

Inside the container, there are no files we didn’t put there. No unpredictable global state.

The situation is similar with the network. Sockets are distinguished by network interface and port, but are otherwise visible to all processes. Global state again. Is something already bound to port 5000? We don’t know.
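The only way to find out is to ask the OS at runtime. Here is a minimal sketch using Python’s standard `socket` module. The helper `port_is_free` is a made-up name, and one socket in the same script plays the role of “some other process” that got there first.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if host:port could be bound at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
        return True


# The "other process": take any free port (port 0 lets the OS choose).
owner = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
owner.bind(("127.0.0.1", 0))
owner.listen()
taken_port = owner.getsockname()[1]

free_while_held = port_is_free(taken_port)  # the owner still holds the port
owner.close()

print(taken_port, free_while_held)
```

Even a positive answer is only valid for the instant it was given. Another process can bind the port right after the check.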

The container does away with that, too.

Of course you still want to use files and network. They are just scoped to the container. We have images with the local file system state. We mount volumes to share some files explicitly. We have local ports and interfaces inside the container. And we have exposed and mapped ports, so that the container process has some far end to talk to. It wouldn’t make much sense otherwise. But we don’t leak global state by default.

If your container should share something, you explicitly mount or map it.
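On the command line, this explicitness shows up as the `-v` (volume) and `-p` (publish) options of `docker run`. The Python sketch below only builds the argument list and does not start anything. The helper `docker_run_args`, the image name `nginx`, and the paths and ports are example values chosen for illustration.

```python
def docker_run_args(image, volumes=None, ports=None):
    """Build an argument list for `docker run` (nothing is executed here)."""
    args = ["docker", "run", "--rm"]
    # Each shared directory is spelled out as host_path:container_path.
    for host_path, container_path in (volumes or {}).items():
        args += ["-v", f"{host_path}:{container_path}"]
    # Each reachable port is spelled out as host_port:container_port.
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args


# Default: no host files and no host ports are shared.
isolated = docker_run_args("nginx")

# Sharing is opt-in, one mount and one port mapping at a time.
shared = docker_run_args(
    "nginx",
    volumes={"/srv/site": "/usr/share/nginx/html"},
    ports={8080: 80},
)

print(" ".join(isolated))
print(" ".join(shared))
```

Everything the container shares with the host appears in that one command. What is not listed is not shared.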

Everything else is just there to make containers easier to handle, but that’s the essence: doing away with the global state that OS processes carry for historical reasons.