Perfecting the Dockerfile
There are plenty of Docker best practices articles out there; this is not exactly one of them. Instead I will talk about maintainability, efficiency, and security, covering best practices where relevant.
I've been using Docker for a couple of years now; it seems like a lifetime ago that I'd jump into a new project and set up Vagrant.
Just like we had to perfect the Vagrantfile
with its complicated network and volume sharing, so too does the Dockerfile
present certain challenges (but we finally have nice shared networks and volumes).
What does a basic Dockerfile look like?
FROM python:2.7
ENV APP_ROOT=/usr/src/app
ADD requirements.txt .
RUN pip install -r requirements.txt
ADD ./src $APP_ROOT/src
CMD ["python", "command.py", "start"]
Looks simple enough, what can go wrong?
- Basics; FROM, WORKDIR, ENV, ARG, and LABEL
- Requiring file resources; side-effects of ADD
- Reducing image size
- Caching (layers)
- Caching (combined commands)
- Caching (optimisation)
- Ignoring files
- Change the executable; using ENTRYPOINT and CMD
- Mapping volumes or mounts
- Security
- Root access prevention
- Lateral movement
- Monitoring; using HEALTHCHECK
That's a lot to cover.
Basics
The FROM keyword takes a base image and extends it with layers for your purpose.
A base is usually an operating system such as FROM ubuntu:17.10, but it is becoming more common to choose a combination of programming language and O/S, such as FROM python:2.7-debian.
In the example Dockerfile we didn't have WORKDIR, which is a useful basic and does what it says on the box: it sets the working directory in which the container's following commands are executed, e.g. WORKDIR /usr/src. When you start a session in a running container you will also be in this directory.
Next we look at ENV
which is a helpful tool when building a container but comes with side-effects.
FROM python:2.7
ENV APP_ROOT=/usr/src/app
ENV PG_MAJOR 9.3
ENV PG_VERSION 9.3.4
WORKDIR $APP_ROOT
RUN curl -SL http://example.com/postgres-$PG_VERSION.tar.xz | tar -xJC /usr/src/postgres && …
ENV PATH /usr/local/postgres-$PG_MAJOR/bin:$PATH
ADD requirements.txt .
RUN pip install -r requirements.txt
CMD ["python", "command.py", "start"]
Here we use ENV to set the WORKDIR and to define the software versions that get installed; the side-effect is in the ENV PATH declaration, as this changes an existing O/S environment variable. Note that ENV values can be overridden when the container is run, e.g. docker run --env APP_ROOT=/usr/src, but not at build time.
To avoid side-effects and still have the benefits of parameterising our build we can use ARG
instead of ENV
.
With ARG we can take a build argument from the executor: declare ARG PG_VERSION in the Dockerfile, pass --build-arg PG_VERSION=9.3.4 at build time, and then in the Dockerfile $PG_VERSION has the value 9.3.4. We can give it a default in the Dockerfile, ARG PG_VERSION=9.3.4, so that the --build-arg is not mandatory.
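Putting that together, a minimal sketch of a parameterised build (reusing the hypothetical postgres archive from the ENV example above):
FROM python:2.7
ARG PG_VERSION=9.3.4
RUN curl -SL http://example.com/postgres-$PG_VERSION.tar.xz | tar -xJC /usr/src/postgres
Built as docker build --build-arg PG_VERSION=9.3.5 . the same Dockerfile installs a different version without any edits.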
You can combine ENV
and ARG
also;
ARG PG_MAJOR=9.3
ENV PG_VERSION ${PG_MAJOR}.0
Now the value of $PG_VERSION is 9.3.0, but when we pass --build-arg PG_MAJOR=9.2 the resulting ENV $PG_VERSION is 9.2.0 from that ARG change.
Lastly we look at LABEL, which simply adds metadata to the final image. Any key-value pair can be given without side-effects: LABEL version="1.0" will not affect anything else unless a LABEL called version already existed.
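For example, you can attach several labels at once and read them back later with docker inspect (the image name my-app and the label values are illustrative):
LABEL version="1.0" \
      maintainer="team@example.com"
docker image inspect --format '{{ json .Config.Labels }}' my-app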
Requiring file resources
The no side-effects way to require files is using COPY
, simply give it source on host and destination in container values;
COPY ./requirements.txt /usr/src/app/requirements.txt
COPY ./src /usr/src/app
Both relative and absolute paths work here, and copies happen in sequence with later ones over-writing earlier ones, so if there was a requirements.txt file in ./src as well, it would over-write the one copied from the root ./requirements.txt.
Be careful when using ADD, and avoid it unless you actually know why you are using it. ADD takes 2 arguments: a source, which may be a local file, a local tar archive, or a remote URL, and a destination inside the image.
ADD http://example.com/foobar.tar.gz /tmp/
This will resolve and download the file to the destination /tmp/ (note that only local tar archives are auto-extracted; a remote archive is downloaded as-is). The rough RUN equivalent is
RUN curl -SL http://example.com/foobar.tar.gz -o foobar.tar.gz \
&& tar -xzf foobar.tar.gz -C /tmp/
but that relies on curl and tar existing in the image, whereas ADD is executed by the builder rather than inside the container so it works regardless of the base image; the downside is that the downloaded archive stays in the layer, making it quite large.
Reducing image size
Docker images are made up of layers, each contributing to the final image size, so if you are not careful a simple app can easily become larger than 1GB.
Caching (layers)
Reducing layers is a great way to achieve smaller image sizes; let's take this example
FROM alpine:latest
RUN apk update
RUN apk add --update ca-certificates wget
RUN update-ca-certificates
RUN wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- | tar xzC /tmp/
RUN cp /tmp/docker/* /usr/bin/
Outputs 6 layers at 197MB
~$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
add-test latest 106ef90bfa53 Just now 197MB
Now combine into a single RUN command
FROM alpine:latest
RUN apk update \
&& apk add --update ca-certificates wget \
&& update-ca-certificates \
&& wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
| tar xzC /tmp/ \
&& cp /tmp/docker/* /usr/bin/
Outputs 2 layers, still with a total of 197MB, but we now speed up subsequent builds as there are fewer layers to compute and fetch from cache or build anew.
Caching (combined commands)
In the last example we saw how combining commands improves build times by leveraging cached layers better; now we will look at how we can combine even more commands to reduce the image size;
FROM alpine:latest
RUN apk update \
&& apk add --update ca-certificates wget \
&& update-ca-certificates \
&& wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
| tar xzC /tmp/ \
&& cp /tmp/docker/* /usr/bin/ \
&& rm -rf /tmp/ \
&& apk del ca-certificates wget \
&& rm -rf /var/cache/apk/*
Outputs 2 layers still but now only 99.2MB
~$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
add-test latest 7b6ba76d4fc3 Just now 99.2MB
I utilised the package manager to clean up the build-only programs that aren't needed to run our app, as well as removing any temp files from the build.
Another useful feature of the Alpine package manager, which many others lack, is the ability to tell apk that certain packages are build-only packages so they can all be cleaned up later with a single command argument
FROM alpine:latest
RUN apk update \
&& apk add --update --virtual build-deps ca-certificates wget \
&& update-ca-certificates \
&& wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
| tar xzC /tmp/ \
&& cp /tmp/docker/* /usr/bin/ \
&& apk del build-deps \
&& rm -rf /var/cache/apk/*
The size and build time are unchanged, but it is now easier to manage our build-only programs as our app grows.
Caching (optimisation)
We've seen how fewer layers speed up our build times when layers are built from cache, but sometimes things change between builds, forcing layers to skip the cache.
In Docker, if any layer changes, all layers after it will be rebuilt too, so order your layers not only by logical build steps, but also with the least frequently changed steps first.
FROM alpine:latest
COPY ./src /usr/src
RUN apk update \
&& apk add --update --virtual build-deps ca-certificates wget \
&& update-ca-certificates \
&& wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
| tar xzC /tmp/ \
&& cp /tmp/docker/* /usr/bin/ \
&& apk del build-deps \
&& rm -rf /var/cache/apk/*
Above, the app source files will change between builds, so having the COPY at the top forces Docker to rebuild every layer after it. Move the steps most likely to change to the bottom to make better use of cached layers, as shown below.
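The same Dockerfile with the frequently-changing COPY moved last:
FROM alpine:latest
RUN apk update \
&& apk add --update --virtual build-deps ca-certificates wget \
&& update-ca-certificates \
&& wget -q --no-check-certificate https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz -O- \
| tar xzC /tmp/ \
&& cp /tmp/docker/* /usr/bin/ \
&& apk del build-deps \
&& rm -rf /var/cache/apk/*
COPY ./src /usr/src
Now a change to ./src only invalidates the final COPY layer, and the expensive RUN layer keeps coming from cache.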
Ignoring files
If you're familiar with how .gitignore
is used to keep your git repo small, tidy, and free of large files such as unwanted archives and binaries in your tree - the same applies to the .dockerignore
file.
Place .dockerignore
in your project root (where you run docker cli) and during builds the docker cli will ignore unwanted files.
If you are developing on one O/S and deploying to another (even another Linux-based one), it is important to consider that binaries compiled on your host will not match what the container needs, so ensure you ignore all your dependency directories so they are built inside the container.
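A minimal sketch of a .dockerignore for a typical project (adjust the entries to your own tree; node_modules and venv are just examples of dependency directories built on the host):
.git
*.tar.gz
node_modules/
venv/
__pycache__/
*.pyc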
Change the executable
Using ENTRYPOINT
and CMD
can be confusing.
You might need to be familiar with Linux processes and sessions to fully grasp this, so I'll do my best to keep it high level enough for Docker users unfamiliar with Linux.
Think of ENTRYPOINT as the base program you want the container to run, because a Docker container only runs one program while it is running. When you start an interactive shell (terminal session) you generally tell Docker to use /bin/sh or /bin/bash, depending on the O/S, as the base program, and during that interactive session you type out commands like ls, cd, or java; these are the CMD in your Dockerfile!
So if you were to run docker run -it alpine to start a session and then ls /usr (or just one line docker run -it alpine ls /usr), it is roughly the same as;
FROM alpine
ENTRYPOINT ["/bin/sh", "-c"]
CMD ["ls /usr"]
So you can see how this might be useful to change to something like (if you have a Nodejs app);
FROM node
ENTRYPOINT ["/usr/local/bin/npm"]
CMD ["run", "start-server"]
Mapping volumes or mounts
You need to decide whether you want a mount or a volume.
Both have much the same arguments available on the CLI, but if you're intending to run the Docker container as a service you can only use mounts, whereas if you intend to use a plugin or driver such as vieux/sshfs, which is a file system over SSH, you must use a volume to make use of volume drivers.
In a Dockerfile you can define what the expected VOLUME entries should be, but it's not possible to mount a host directory from a Dockerfile, because images should be portable.
You must map the volume when you run the container, e.g. docker run -v "$(pwd)"/src:/usr/src/app:ro
or docker run --mount type=bind,source="$(pwd)"/src,target=/usr/src/app,readonly
.
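A minimal sketch of both sides, declaring the volume in the Dockerfile and backing it with a driver on the host (the sshfs target user@remote:/data and the image name my-app are hypothetical):
VOLUME /usr/src/app
docker volume create --driver vieux/sshfs -o sshcmd=user@remote:/data app-data
docker run --mount type=volume,source=app-data,target=/usr/src/app,readonly my-app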
Security
We talked about volumes and mounts; regardless of whether you choose mounts or volumes, you should always make them read-only for security reasons.
Never choose shared bind propagation on a mount point. Docker is secure by default here, meaning the most secure option, rprivate, is also the default, but it is best to be explicit about it: docker run --mount type=bind,source="$(pwd)"/src,target=/usr/src/app,readonly,bind-propagation=rprivate.
Some other points;
- Do not run docker exec commands with the --privileged option
- Configure centralized and remote logging
- Do not disable default seccomp profile
- Use either apparmor or selinux
- Set a non-host user with useradd
- Set --ulimit and --memory when running a Docker container
- Use docker secrets instead of environment or configuration alternatives
- Strictly no SSH access into containers
- Scan Dockerfiles with Docker Bench (or chosen proprietary alternative)
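A hedged example pulling a few of these points together at run time (the image name my-app and the specific limits are illustrative only):
docker run \
  --read-only \
  --memory 512m \
  --ulimit nofile=1024:1024 \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --user myappuser \
  my-app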
Root access prevention
The SUID flag on binaries is a known weak point: it gives intruders a vector for assuming root access on the host. It's best to just remove these binaries as it's unlikely you'll be using them from your app.
RUN for i in `find / -perm +6000 -type f`; do rm -f $i; done
If you're unsure and removing them breaks stuff you can also unset the flag on each file with
RUN for i in `find / -perm +6000 -type f`; do chmod a-s $i; done
But if you want finer control while keeping the binaries, and know your way around Linux capability controls, you can use something like this
RUN chmod u-s /bin/ping && setcap cap_net_raw+ep /bin/ping
which removes the SUID bit and instead grants ping only the CAP_NET_RAW capability (the use of RAW and PACKET sockets).
It was mentioned above, but another control is having a restricted user that is not a host user, you can create one by running
RUN adduser --system --no-create-home --disabled-password --disabled-login --shell /bin/sh myappuser
USER myappuser
Each command after USER is executed as the new restricted user, so place it as early in the Dockerfile as your build allows; any steps that genuinely need root must come before it.
Lateral movement
The --icc, --link, and --iptables flags (the first and last are daemon flags, --link is a docker run flag) control which containers are allowed inter-container communication, which is the mechanism used for moving from one compromised container to another.
Docker prides itself on being security first, but this is one case where convenience was chosen at the expense of security, because by default all containers may communicate with one another freely.
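For example, a minimal sketch of restricting communication on the default bridge, either as daemon flags or as the equivalent keys in /etc/docker/daemon.json:
dockerd --icc=false --iptables=true
{ "icc": false, "iptables": true }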
Monitoring
I recommend using HEALTHCHECK on every container without exception; Docker must be running a process for the container to stay active, so inherently there is always something to monitor.
HEALTHCHECK uses non-zero exit codes to detect a process's health, so a simple check for a web server would be;
HEALTHCHECK --interval=12s --timeout=12s --start-period=30s \
CMD curl --silent --fail https://localhost:36000/ || exit 1
Of course if you're on Windows, curl is either not there or it is aliased to PowerShell's Invoke-WebRequest, whose arguments aren't the same. It is therefore best not to rely on any host O/S specific programs; write the check in Python, Go, Java, Nodejs, or whatever language you feel comfortable using - just make sure it emits proper exit codes.
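For instance, a minimal sketch using the Python interpreter already in the image (this assumes a Python 3 base image and plain HTTP on the port 36000 used above):
HEALTHCHECK --interval=12s --timeout=12s --start-period=30s \
CMD python -c "import sys, urllib.request; sys.exit(0 if urllib.request.urlopen('http://localhost:36000/').getcode() == 200 else 1)"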