January 19, 2018

To DinD or not to DinD?

A colleague of mine pointed me to this interesting article about using Docker-in-Docker ("DinD") as a valid solution for the Continuous Integration use case.

Don't bind-mount docker.sock!

This article is pretty interesting as it explains very well the issue with exposing the underlying docker socket to a container. tl;dr: you just give up on security and isolation. Remember this single excerpt:

" they can create Docker containers that are privileged or bind mount host paths, potentially creating havoc on the host "

This article starts with a reference to Jérôme's blog post explaining why one should not use DinD for CI, so it is interesting to understand the reasoning that led to adopting a solution the original author explicitly disclaimed for this usage.

Let's now have a look at the follow-up article on DinD: A case for Docker-in-Docker on Kubernetes (Part 2)

Here again, the issue with exposing the underlying docker infrastructure is well described. Please read it; I'm exhausted trying to explain why '-v /var/run/docker.sock:/var/run/docker.sock' is an option you should never type.

Then the DinD solution applied to Kubernetes is demonstrated, and the point I want you to notice is this one in the pod's yaml definition:

securityContext: 
    privileged: true 

Privileged?

What does this option imply? It sounds like few people actually understand its impact. The name of the option alone should ring a bell.

Let's have a look at the reference documentation:

"The --privileged flag gives all capabilities to the container, and it also lifts all the limitations enforced by the device cgroup controller"

Such a container can then access every hardware resource exposed at the lowest level through the host's /dev pseudo-filesystem, which includes all your disks; that is the most obvious security issue. Are you comfortable with your build also having access to /dev/mem (physical memory)?

Allowing all capabilities also means your container runs with cap_sys_admin, which, if you skim a short overview of Linux capabilities, basically means... there's no restriction on what this process can do on the system. And since the full capability set also includes cap_mknod, the container can simply mknod any /dev/* entry it didn't already have access to...
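If you want to see the difference by yourself, the comparison below is a rough illustration (the raw disk being /dev/sda is an assumption about the host, adjust to yours):

  # default container: /dev only holds a handful of pseudo-devices,
  # and the device cgroup blocks access to anything else
  docker run --rm alpine ls /dev

  # privileged container: the host's devices show up, and nothing
  # prevents reading them (here, the first 512 bytes of the raw disk)
  docker run --rm --privileged alpine sh -c 'ls /dev; head -c 512 /dev/sda >/dev/null && echo "read the host disk"'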

--privileged is sort of a sudo++. It's just as if you could do this:


  ~ echo hello
Permission denied
  ~ sudo echo hello
Permission denied
  ~ echo --privileged hello
hello

So a DinD container runs as root, with no restriction on the system calls it can make, and with access to all devices. Sounds like a good place to run arbitrary processes and build pull requests, doesn't it?

Maybe you consider docker resource isolation as "we want to prevent the development team from shooting itself in the foot" and just want to ensure no build process will start an infinite fork loop or break the CI service with a memory leak. Only public cloud services need to prevent hackers from breaking the isolation and stealing secrets, right? If so, please take a few minutes to talk with your Ops team :P


So, is DinD such a bad idea?

Actually, one can use a privileged container and still enforce security, using a fine-grained AppArmor profile to ensure only adequate resources are exposed. You can also use docker's --device to restrict the devices your DinD container can actually use, and --cap-drop to restrict the allowed capabilities to the strict minimum. This is actually how play-with-docker is built, but as you can guess it wasn't created within a day, and it requires an advanced understanding of those security mechanisms.
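As a very rough sketch, and assuming you have written such a custom AppArmor profile (docker-dind below is a made-up name) and know exactly which capabilities and devices your builds need, the hardening flags look like this. This is only meant to show the flags involved, not a known-good DinD configuration:

  # illustrative only: profile name, capability list and device list are placeholders
  docker run -d --name dind \
    --security-opt apparmor=docker-dind \
    --cap-drop ALL \
    --cap-add SYS_ADMIN \
    --device /dev/fuse \
    docker:dind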

Is there any alternative?

My guess is that the Applatix solution is driven by the lack of a simple and viable alternative. Exposing the underlying docker infrastructure is just a no-go, as you then lose Kubernetes' management control over your side containers. Your nodes would quickly end up running thousands of orphaned containers. From this point of view, using DinD keeps all your containers under cluster management.

How do others solve this issue?

CircleCI, for example, does allow access to a docker infrastructure to build your own images. The documentation explains that a dedicated, remote docker machine will be allocated for your build. So they just create VMs (or something comparable) so your build can access a dedicated docker daemon with strong isolation. This is far from transparent for the end user, but at least it doesn't give up on security.

My recommendation is to have your build include the required logic to set up such a dedicated docker box. In terms of a Jenkinsfile pipeline, you could mimic CircleCI with a shared library offering a high-level setup_remote_docker() function to jobs within your company. This library would allocate a short-lived VM on your infrastructure to host the docker commands, and inject the DOCKER_HOST environment variable accordingly.
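As a sketch of what such a setup_remote_docker() step could do under the hood (docker-machine is just one way to get a short-lived docker host; the driver, machine name and image tag are placeholders):

  # allocate a dedicated, short-lived docker host for this build
  docker-machine create --driver amazonec2 "build-${BUILD_ID}"

  # point the build's docker CLI at that dedicated daemon (sets DOCKER_HOST & co)
  eval "$(docker-machine env "build-${BUILD_ID}")"
  docker build -t myorg/myapp:"${BUILD_ID}" .

  # tear the VM down once the build is over
  docker-machine rm -y "build-${BUILD_ID}"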

What's next?

Another solution I've been investigating is to create a docker API proxy, which does expose the underlying docker infrastructure but filters all API calls to reject anything you're not supposed to do:
  • only proxy supported API calls (whitelist)
  • parse the API payload and rebuild the payload sent to the underlying infrastructure. This ensures only supported options are passed to the docker daemon.
  • reject security-related options like bind mounts, privileged, cap-add, etc.
  • block access to containers/volumes/networks you didn't create
  • filter API responses so you only see legitimate resources (for example, docker ps will only show you your own containers)
This proxy also transparently adds constraints to API commands: it enforces that all containers you create inherit from the same cgroup hierarchy. So if your build is constrained to 2GB of memory, you can't get more by running side containers. It also adds labels, which can be used by infrastructure monitoring to track resource ownership.
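Since this proxy isn't public, here is only a hypothetical illustration of the idea against the plain Docker Engine API (the proxy address and its behaviour are made up; the payload fields are the standard ones):

  # a build talks to the proxy instead of the real daemon
  curl -s -X POST http://docker-proxy:2375/containers/create \
    -H 'Content-Type: application/json' \
    -d '{"Image": "alpine", "HostConfig": {"Privileged": true, "Binds": ["/:/host"]}}'
  # -> such a request would be rejected (privileged, bind mount), while a plain
  #    {"Image": "alpine"} payload would be rewritten (cgroup parent, ownership
  #    labels) and forwarded to the underlying daemon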

So, generally speaking, this proxy adds a "namespace" feature on top of the Docker API.

This is just a prototype so far, and sorry: it's not open-source...