Perfecting the Dockerfile

There are plenty of Docker best-practices articles out there; this is not quite one of them. Instead I will talk about maintainability, efficiency, and security, covering best practices where relevant.

I've been using Docker for a couple of years now; it seems like a lifetime ago that I'd jump into a new project and set up Vagrant.

Just like we had to perfect the Vagrantfile with its complicated networking and volume sharing, the Dockerfile presents its own challenges (though we finally have nice shared networks and volumes).

What does a basic Dockerfile look like?

FROM python:2.7

ENV APP_ROOT=/usr/src/app

ADD requirements.txt .
RUN pip install -r requirements.txt

ADD ./src $APP_ROOT/src

CMD ["python", "command.py", "start"]

Looks simple enough, what can go wrong?

  • Basics; FROM, WORKDIR, ENV, ARG, and LABEL
  • Requiring file resources; side-effects of ADD
  • Reducing image size
    • Caching (layers)
    • Caching (combined commands)
    • Caching (optimisation)
    • Ignoring files
  • Change the executable; using ENTRYPOINT and CMD
  • Mapping volumes or mounts
  • Security
    • Root access prevention
    • Lateral movement
  • Monitoring; using HEALTHCHECK

That's a lot to cover.

Basics

The FROM keyword takes a base image and extends it with layers for your purpose.

A base is usually an operating system, such as FROM ubuntu:17.10, but it is becoming more common to choose a combination of programming language and O/S, such as FROM python:2.7-stretch (Python on Debian stretch).

In the example Dockerfile we didn't have WORKDIR, a useful basic that does what it says on the box: WORKDIR /usr/src makes all following commands execute in that working directory. When you start a session into a running container you will also land in this directory.
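As a minimal sketch:

FROM python:2.7
WORKDIR /usr/src/app
# the file lands at /usr/src/app/requirements.txt
COPY requirements.txt .
RUN pip install -r requirements.txt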

Next we look at ENV which is a helpful tool when building a container but comes with side-effects.

FROM python:2.7
ENV APP_ROOT=/usr/src/app
ENV PG_MAJOR 9.3
ENV PG_VERSION 9.3.4

WORKDIR $APP_ROOT
RUN curl -SL http://example.com/postgres-$PG_VERSION.tar.xz | tar -xJC /usr/src/postgres && …
ENV PATH /usr/local/postgres-$PG_MAJOR/bin:$PATH

ADD requirements.txt .
RUN pip install -r requirements.txt

CMD ["python", "command.py", "start"]

Here we use ENV to set the WORKDIR and to define the software versions that get installed; the side-effect is in the ENV PATH declaration, as this changes an existing O/S environment variable. You can pass in new values at run time using docker run --env APP_ROOT=/usr/src, but ENV cannot be overridden at build time.

To avoid side-effects and still have the benefits of parameterising our build we can use ARG instead of ENV.

With ARG we can take a build argument from the executor: declare ARG PG_VERSION in the Dockerfile, build with --build-arg PG_VERSION=9.3.4, and $PG_VERSION then has the value 9.3.4. We can set a default in the Dockerfile, ARG PG_VERSION=9.3.4, so that the --build-arg is not mandatory.
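A minimal sketch (the echo is just a stand-in for a real install step):

FROM python:2.7
ARG PG_VERSION=9.3.4
RUN echo "installing postgres $PG_VERSION"

~$ docker build --build-arg PG_VERSION=9.3.5 .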

You can also combine ENV and ARG;

ARG PG_MAJOR=9.3
ENV PG_VERSION ${PG_MAJOR}.0

Now the value of $PG_VERSION is 9.3.0, but when we pass --build-arg PG_MAJOR=9.2 at build time, $PG_VERSION becomes 9.2.0 from that ARG change.

Lastly we look at LABEL, which simply adds metadata to the final image. Any key-value pair can be given without side-effects: LABEL version="1.0" will not affect anything else unless a LABEL called version already existed, in which case it is overwritten.
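Labels are easy to read back from a built image; a small sketch (reusing the add-test image name from later in this article):

FROM python:2.7
LABEL version="1.0" \
      description="Example app image"

~$ docker inspect --format '{{ json .Config.Labels }}' add-test
{"description":"Example app image","version":"1.0"}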

Requiring file resources

The no-side-effects way to require files is COPY; simply give it a source on the host and a destination in the container;

COPY ./requirements.txt /usr/src/app/requirements.txt
COPY ./src /usr/src/app

Both relative and absolute paths work here, and copies apply in sequence, so if there is also a requirements.txt file inside ./src it will overwrite the one we copied from the root first.

Be careful when using ADD, and avoid it unless you actually know why you are using it. ADD takes two arguments, a source (a local path or a remote URL) and a destination directory in the container.

ADD http://example.com/foobar.tar.gz /tmp/

For a remote URL, ADD will resolve and download the file into /tmp/ but will not extract it; only local tar archives are auto-extracted. The RUN equivalent would be;

RUN curl -SL http://example.com/foobar.tar.gz -o /tmp/foobar.tar.gz \
  && tar -xzf /tmp/foobar.tar.gz -C /tmp/ \
  && rm /tmp/foobar.tar.gz

The RUN form needs curl and tar to exist inside the image, whereas ADD is handled by Docker itself and so works whatever the base image contains; the trade-off is that the downloaded archive is baked into the ADD layer with no chance to clean it up, so the layer itself is quite large.

Reducing image size

Docker images are made up of layers, each contributing to the final image size, so if you are not careful a simple app can easily become larger than 1GB.

Caching (layers)

Reducing the number of layers is the first step towards faster builds and smaller images; let's take this example

FROM alpine:latest

RUN apk update
RUN apk add --update ca-certificates wget
RUN update-ca-certificates
RUN wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- | tar xzC /tmp/
RUN cp /tmp/docker/* /usr/bin/

Outputs 6 layers at 197MB

~$ docker image list
REPOSITORY          TAG                IMAGE ID            CREATED    SIZE
add-test            latest             106ef90bfa53        Just now   197MB

Now combine them into a single RUN command

FROM alpine:latest

RUN apk update \
    && apk add --update ca-certificates wget \
    && update-ca-certificates \
    && wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
    | tar xzC /tmp/ \
    && cp /tmp/docker/* /usr/bin/

Outputs 2 layers, still with a total of 197MB, but we speed up subsequent builds as there are fewer layers to compute and fetch from cache or build anew.
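You can verify the layer count and the size each layer contributes with docker history:

~$ docker history add-test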

Caching (combined commands)

In the last example we saw how combining commands improves build times by leveraging cached layers better; now we will look at how combining even more commands can reduce the image size;

FROM alpine:latest

RUN apk update \
    && apk add --update ca-certificates wget \
    && update-ca-certificates \
    && wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
    | tar xzC /tmp/ \
    && cp /tmp/docker/* /usr/bin/ \
    && rm -rf /tmp/ \
    && apk del ca-certificates wget \
    && rm -rf /var/cache/apk/*

Still outputs 2 layers, but now only 99.2MB

~$ docker image list
REPOSITORY          TAG                IMAGE ID            CREATED    SIZE
add-test            latest             7b6ba76d4fc3        Just now   99.2MB

Here I utilised the package manager to remove the build-only programs that aren't needed to run our app, as well as removing any temp files from the build.

Another useful feature of the Alpine package manager that many others lack is the ability to tell apk that certain packages are build-only, so they can all be cleaned up later with a single command argument

FROM alpine:latest

RUN apk update \
    && apk add --update --virtual build-deps ca-certificates wget \
    && update-ca-certificates \
    && wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
    | tar xzC /tmp/ \
    && cp /tmp/docker/* /usr/bin/ \
    && apk del build-deps \
    && rm -rf /var/cache/apk/*

The size and build time are unchanged, but it is now easier to manage our build-only packages as our app grows.

Caching (optimisation)

We've seen how fewer layers speed up our build times when layers are built from cache, but sometimes things change between builds, forcing layers to skip the cache.

In Docker, if any layer changes, all layers after it are rebuilt too, so order your layers not only by logical build steps but also with the least-changed steps first.

FROM alpine:latest

COPY ./src /usr/src

RUN apk update \
    && apk add --update --virtual build-deps ca-certificates wget \
    && update-ca-certificates \
    && wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
    | tar xzC /tmp/ \
    && cp /tmp/docker/* /usr/bin/ \
    && apk del build-deps \
    && rm -rf /var/cache/apk/*

Above, the app source files will change between builds, so having the COPY at the top forces Docker to rebuild every layer after it. Move the steps most likely to change to the bottom to utilise cached layers better, as in the reordered version below.
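FROM alpine:latest

RUN apk update \
    && apk add --update --virtual build-deps ca-certificates wget \
    && update-ca-certificates \
    && wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
    | tar xzC /tmp/ \
    && cp /tmp/docker/* /usr/bin/ \
    && apk del build-deps \
    && rm -rf /var/cache/apk/*

COPY ./src /usr/src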

Ignoring files

If you're familiar with how .gitignore is used to keep your git repo small, tidy, and free of large files such as unwanted archives and binaries in your tree - the same applies to the .dockerignore file.

Place .dockerignore in your project root (where you run the docker CLI) and during builds the files it lists will be excluded from the build context.

If you are developing on one O/S and deploying to another (even another Linux-based one), consider that binaries compiled on your host will differ from those the container needs, so ignore all your dependency directories and let them be built inside the container.
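As a sketch, a .dockerignore for the Python example earlier in this article might look like this (entries are illustrative):

.git
.vagrant/
__pycache__/
*.pyc
*.tar.gz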

Change the executable

Using ENTRYPOINT and CMD can be confusing.

You might need to be familiar with Linux processes and sessions to fully grasp this, so I'll do my best to keep this high level enough for Docker users unfamiliar with Linux.

Think of ENTRYPOINT as the base program you want the container to run, because a Docker container runs exactly one main program while it is alive. When you start an interactive shell (terminal session) you generally tell Docker to use /bin/sh or /bin/bash, depending on the O/S, as the base program, and during that interactive session you type out commands like ls, cd, and java; these are the CMD in your Dockerfile!

So running docker run -it alpine to start a session and then typing ls /usr (or the one-liner docker run -it alpine ls /usr) is roughly the same as;

FROM alpine
ENTRYPOINT ["/bin/sh", "-c"]
CMD ["ls /usr"]

Note the exec (JSON) form here: with the shell form ENTRYPOINT /bin/sh, the CMD would be ignored entirely. You can see how this might be useful to change to something like this (if you have a Node.js app);

FROM node
ENTRYPOINT ["/usr/local/bin/npm"]
CMD ["run", "start-server"]

Mapping volumes or mounts

You need to decide whether you want a mount or a volume.

Both have much the same arguments available on the CLI, but if you intend to run the container as a swarm service you can only use the --mount syntax, whereas if you intend to use a plugin or driver such as vieux/sshfs (a file system over SSH) you must use a volume, because only volumes support volume drivers.

In a Dockerfile you can declare the expected VOLUME, but by convention it's not possible to mount a host directory from a Dockerfile, because images should be portable.
You must map the volume when you run the container, e.g. docker run -v "$(pwd)"/src:/usr/src/app:ro or docker run --mount type=bind,source="$(pwd)"/src,target=/usr/src/app,readonly.
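Declaring an expected volume in the Dockerfile looks like this; if nothing is mounted at that path at run time, Docker creates an anonymous volume for it (the path is illustrative):

FROM python:2.7
VOLUME /usr/src/app/data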

Security

We talked about volumes and mounts; whichever you choose, always make them read-only where you can, for security reasons.
Never choose shared bind propagation on a mount point. Docker is secure by default here, meaning the most secure option, rprivate, is also the default, but it is best to be explicit about it: docker run --mount type=bind,source="$(pwd)"/src,target=/usr/src/app,readonly,bind-propagation=rprivate.

Some other points;

  • Do not docker exec commands with the --privileged option
  • Configure centralised and remote logging
  • Do not disable the default seccomp profile
  • Use either AppArmor or SELinux
  • Set a non-host user with useradd
  • Set --ulimit and --memory when running a Docker container (see the examples below)
  • Use Docker secrets instead of environment or configuration alternatives (see the examples below)
  • Strictly no SSH access into containers
  • Scan Dockerfiles with Docker Bench (or a proprietary alternative of your choice)
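To illustrate a couple of the points above, resource limits are plain docker run flags (the values and image names here are illustrative, not recommendations):

~$ docker run --ulimit nofile=1024:1024 --memory 256m my-app

and Docker secrets are created once and attached to swarm services, rather than passed around in environment variables:

~$ echo "s3cret" | docker secret create db_password -
~$ docker service create --name my-app --secret db_password my-app-image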

Root access prevention

Binaries with the SUID flag set give intruders a vector for assuming root access on the host. It's best to just remove them, as it's unlikely you'll be using them from your app.

RUN for i in `find / -perm /6000 -type f`; do rm -f $i; done

If you're unsure, and removing them breaks things, you can instead unset the flag on each file with

RUN for i in `find / -perm /6000 -type f`; do chmod a-s $i; done

But if you want finer control while keeping the binaries, and you know your way around Linux capability controls, you can use something like this

RUN chmod a-s /bin/ping && setcap cap_net_raw+ep /bin/ping

Which removes the SUID bit but grants ping the one capability it actually needs, the use of RAW and PACKET sockets.

It was mentioned above, but another control is having a restricted user that is not a host user; you can create one by running

RUN adduser --system --no-create-home --disabled-password --disabled-login --shell /bin/sh myappuser
USER myappuser

Each command after USER is executed as the new restricted user, so place it as early in the Dockerfile as you can, after the steps that genuinely need root (such as package installation).
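Putting that together with the Python example from earlier (a sketch; the ordering is the point):

FROM python:2.7
COPY requirements.txt .
# pip installs to system paths, so it must run before USER
RUN pip install -r requirements.txt
RUN adduser --system --no-create-home --disabled-password --disabled-login --shell /bin/sh myappuser
COPY ./src /usr/src/app
USER myappuser
CMD ["python", "command.py", "start"]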

Lateral movement

The daemon flags --icc and --iptables, together with --link on docker run, control which containers are allowed inter-container communication, the mechanism used for moving from one hacked system to another.

Docker prides itself on being security-first, but this is one case where convenience was chosen at the expense of security, because by default all containers may communicate with one another freely.

Monitoring

I recommend using HEALTHCHECK on every container without exception; Docker must be running a process for the container to stay active, so inherently there is always something to monitor.

HEALTHCHECK uses non-zero exit codes to detect a process's ill health, so a simple check for a web server would be;

HEALTHCHECK --interval=12s --timeout=12s --start-period=30s \
    CMD curl --silent --fail https://localhost:36000/ || exit 1

Of course, if you're on Windows, curl is either absent or aliased to PowerShell's Invoke-WebRequest, whose arguments differ, so it is best not to rely on host-O/S-specific programs: write a small check in Python, Go, Java, Node.js, whatever language you feel comfortable using, and just make sure it emits proper exit codes.
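From the host you can then read the status that HEALTHCHECK reports, starting, healthy, or unhealthy (the container name is hypothetical):

~$ docker inspect --format '{{ .State.Health.Status }}' my-container
healthy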

Thank you for reading and don't forget to share this if you found it interesting.