Methods to Optimize Docker Image Size

The Problem

Docker is a powerful tool, but its images can become quite large, taking up disk space and, more problematically, wasting valuable bandwidth as containers get deployed at scale. This post therefore provides an overview of some methods for optimizing Docker image sizes. Using Redis as the running example, whose official latest image on Docker Hub weighs in at 98 MB, we explain each method and compare its pros and cons.

Note: The standard way to view a Docker image’s size is of course $ docker images. But while researching and writing this comparison, we found dive to be a nifty tool: it lets you not only see an image’s overall size, but also inspect the changes that occurred from one layer to the next.
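To compare several images side by side, a small shell helper can be handy. The following is only a sketch: the function name image_sizes is made up for this post, and it assumes the images are already present locally. It relies on docker image inspect, which reports an image's size in bytes.

```shell
# image_sizes: print the size of each named local image in decimal MB.
# (Hypothetical helper name; not part of Docker itself.)
image_sizes() {
    for img in "$@"; do
        # Skip images that are not present locally instead of aborting.
        bytes=$(docker image inspect --format '{{.Size}}' "$img" 2>/dev/null) || continue
        echo "$img $((bytes / 1000000)) MB"
    done
}
```

It could then be called as $ image_sizes redis:latest redis:alpine, while $ dive redis:latest opens the layer-by-layer view for a single image.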

Manual Management

The naive way would be to take the official Redis image, then manually remove components no longer needed by your running dockerized application. Not only would this be labor-intensive and error-prone, it is also completely ineffective. As most seasoned Docker users are probably aware, file manipulation within Docker images does not work the same way as it does in conventional file systems.

Each filesystem-changing instruction in your Dockerfile (RUN, COPY, ADD) creates a new layer for that image. And much like adding new commits in git, previous layers remain available, since they are stored as part of the final image.

So even after removing unneeded software, you would likely find no improvement in the image’s final size, if not an outright increase. Therefore, this method cannot be considered a real solution.
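The layering behavior is easy to see with a throwaway Dockerfile. This is a minimal sketch: the base image, the file name /blob and the 50 MB size are arbitrary choices for illustration, not something taken from the Redis image.

```dockerfile
# Hypothetical demonstration; base image and file name are arbitrary.
FROM debian:bookworm-slim

# Layer 1: create a 50 MB file.
RUN dd if=/dev/zero of=/blob bs=1M count=50

# Layer 2: delete it. This only records the deletion in a new layer;
# the 50 MB written by layer 1 is still stored in the image.
RUN rm /blob

# Doing both in a single RUN would leave nothing behind in any layer:
# RUN dd if=/dev/zero of=/blob bs=1M count=50 && rm /blob
```

After building it, $ docker history on the resulting image should list the layer created by the first RUN at roughly 50 MB, even though the file no longer exists in the final filesystem.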

Multi-Stage Build

A built-in solution provided by Docker itself is the multi-stage builds feature, available since Docker 17.05. Continuing with the git analogy, this method is loosely akin to squashing the commit history of a project: the intermediate layers, which record how the final image and its contents were created, are discarded, and only the components required by your containerized application are kept.

The high-level implementation procedure is this:

1. start off with some image as the base for building

2. run commands to build your program/application as normal

3. selectively copy only the desired artifacts to a separate base
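The three steps above can be sketched as follows. This is an illustrative example only: the source file hello.c and the choice of Debian images are assumptions made for this sketch, and any build toolchain could take their place.

```dockerfile
# Step 1: start from an image that can hold the build toolchain.
FROM debian:bookworm AS build
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libc6-dev

# Step 2: build the program as normal (hello.c is a hypothetical source file).
WORKDIR /src
COPY hello.c .
RUN gcc -O2 -o hello hello.c

# Step 3: copy only the finished artifact onto a separate, smaller base.
FROM debian:bookworm-slim
COPY --from=build /src/hello /usr/local/bin/hello
ENTRYPOINT ["/usr/local/bin/hello"]
```

The compiler, headers and package lists live only in the build stage, so none of them end up in the final image.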

This is a better method to be sure, but the true value of multi-stage builds lies in the mechanism they provide, as the next approach demonstrates.

Distroless

Having long relied heavily on containerization technologies, and Docker in particular, Google realized the drawbacks of bloated images early on and created its own answer to the problem: the distroless image. As its name suggests, unlike typical Linux base images, which come bundled with a myriad of software, dockerizing your application on distroless results in a final image containing only the application and its runtime dependencies. Standard software included in most Linux distributions, such as the package manager and even the shell, is excluded.

To actually take advantage of Google’s distroless image, the multi-stage build mechanism must be used, as the following Dockerfile shows:

# Stage 1: collect redis-server and its runtime dependencies under /opt
FROM redis:latest AS build

# Optional time zone name under /usr/share/zoneinfo; falls back to UTC below
ARG TIME_ZONE

# cp --parents recreates each file's full source path beneath /opt
RUN mkdir -p /opt/etc && \
    cp -a --parents /lib/x86_64-linux-gnu/libm.so.* /opt && \
    cp -a --parents /lib/x86_64-linux-gnu/libdl.so.* /opt && \
    cp -a --parents /lib/x86_64-linux-gnu/libpthread.so.* /opt && \
    cp -a --parents /lib/x86_64-linux-gnu/libc.so.* /opt && \
    cp -a --parents /usr/local/bin/redis-server /opt && \
    cp -a --parents /usr/local/bin/redis-sentinel /opt && \
    cp /usr/share/zoneinfo/${TIME_ZONE:-UTC} /opt/etc/localtime

# Stage 2: start from distroless and copy in only the collected files
FROM gcr.io/distroless/base

COPY --from=build /opt /

VOLUME /data
WORKDIR /data

ENTRYPOINT ["redis-server"]

Using redis:latest as the build stage, we collect the binaries of interest (the redis-server binary along with all of its shared object dependencies) under /opt. Then, using distroless as the base on which the final image is built, we copy the contents of /opt into it.

Now just run $ docker build -t redis:distroless . and voila! The resultant redis:distroless image clocks in at only 27.3 MB!

Note: the shared objects a given binary depends on can be found with the ldd utility on Linux, for example $ ldd $(which redis-server), or with otool -L if you are a macOS user.
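Since the build stage needs the paths of those shared objects, it can help to reduce the ldd output to bare file paths. The helper below is a rough sketch for Linux; the name list_deps is invented for this post, and it simply prints the first absolute path found on each line of ldd output.

```shell
# list_deps: print the absolute path of every shared object a binary needs.
# (Hypothetical helper; assumes a Linux system with ldd and awk available.)
list_deps() {
    ldd "$1" |
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }' |
        sort -u
}
```

Run inside the redis:latest image as $ list_deps "$(which redis-server)", it should yield the kind of paths that the cp --parents lines in the Dockerfile above copy into /opt.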

Using distroless is a perfectly valid method for reducing Docker image sizes, as it’s evidently very effective at doing so. However, one drawback is the lack of a shell in the final image, which makes debugging the dockerized application very difficult. At the same time, this also minimizes the attack surface of the application, making it more secure.

Alpine Linux

Taking this idea of not settling for the official image on Docker Hub even further, we can choose application images that have been built (or build them ourselves) on top of Alpine Linux, a distro that is especially suited for creating minimized Docker images.

Instead of glibc, Alpine Linux uses the much smaller musl C library, which also lends itself well to static linking. Programs linked statically against musl become self-contained binaries, eliminating the need to also include shared objects. Together these enable significant reductions in the final image size; the redis:alpine image is about 30 MB.

The disadvantage is that, generally speaking, musl is not as performant as glibc. On the plus side, unlike the distroless approach above, Alpine is a full-fledged Linux distribution and provides basic shell access, making it possible to debug the dockerized application.

Alpine versions of almost all popular software, such as Redis, Nginx, PostgreSQL, etc., can be found on Docker Hub, saving you the need to create them yourself using a Dockerfile, as you would have to with the distroless method.

Lastly, if your application uses any such popular software, you can simply copy the compiled binary from its Alpine Docker image into your final application image via the multi-stage build mechanism, provided any shared libraries the binary still needs are present there.
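As a rough sketch of that last point, the Dockerfile below pulls redis-cli out of the official redis:alpine image. The alpine:3 base for the application image is an assumption made for this example, and the final RUN line is there to make the build fail early if the copied binary cannot start on that base.

```dockerfile
# Final application image; alpine:3 is an assumed base for this sketch.
FROM alpine:3

# Copy only redis-cli out of the official Alpine-based Redis image.
COPY --from=redis:alpine /usr/local/bin/redis-cli /usr/local/bin/redis-cli

# Sanity check: fail the build if the binary cannot start here, e.g.
# because a shared library it needs is missing from this base.
RUN redis-cli --version
```

If the check fails, running ldd on the binary inside redis:alpine shows which libraries would have to be installed or copied as well.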

GNU Guix (& Nix)

Last but not least, we can use GNU Guix, a functional package management tool. Among its numerous powerful capabilities is the ability to create Docker images. Guix distinguishes between the runtime and build dependencies of its packages, so Docker images built by Guix contain only the programs explicitly specified by the packager plus their runtime dependencies, just like the distroless approach. But unlike distroless, which requires you to chase down a program’s runtime dependencies yourself (and of course, also write the Dockerfile), doing so with Guix is as simple as running a single command: $ guix pack -f docker redis

The Redis Docker image created by the above command is about 70 MB, which is also a significant reduction in size compared to the official image from Docker Hub. And although this is larger than the equivalent images created by the distroless and Alpine methods, using Guix does offer some other benefits not easily achieved by either. For example, if you would like your final image to also contain a shell for easy debugging, as in Alpine, then it’s as simple as adding one to the list of packages to be packed by Guix: $ guix pack -f docker redis bash. The same goes for including even more packages.
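guix pack prints the path of the tarball it produces, which can then be handed to docker load. The helper below is a small sketch of that round trip; the function name pack_and_load is made up here, and the -S /bin=bin option (a guix pack flag that adds a /bin symlink to the packed binaries) is an optional extra not used in the commands above.

```shell
# pack_and_load: build a Docker image tarball with Guix and load it into Docker.
# (Hypothetical helper; the arguments are the Guix packages to include.)
pack_and_load() {
    # guix pack writes the tarball to the store and prints its path.
    tarball=$(guix pack -f docker -S /bin=bin "$@") || return 1
    # docker load reads the image from standard input.
    docker load < "$tarball"
}
```

For example, $ pack_and_load redis bash would produce and load an image containing Redis plus a shell for debugging.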

The functional nature of Guix means that packages can be built with 100% reproducibility. So adding Guix at the head of the deployment pipeline makes the creation of the Docker images themselves reproducible, which is definitely something that cannot be said for images built from conventional Dockerfiles.

One disadvantage is that, being a GNU project, the official Guix repository (or channel, as it is more formally termed) only contains free software. Luckily this is not a real problem in practice, since plenty of third-party channels containing non-free software exist, and you can always create your own channel easily.

At this point you might be thinking: Guix sounds cool and all, but I don’t want to have to download and install another tool just to dockerize my application. Not to mention that Guix only works on Linux, and given how many developers are macOS users, setting up Guix is starting to sound like more trouble than it’s worth. But once again, this is a solved problem in the resourceful DevOps ecosystem: Docker images for Guix itself exist on Docker Hub, so using it is no more complicated than a simple $ docker run guix command.

Note: the Redis package in the official Guix channel currently sits at v4.0.10, so the size comparison with the previous methods is not entirely 1-to-1. I have submitted a patch to update it to the latest v5.0.7.

Finally, the Nix package manager, the more well-known and widely-used cousin of Guix, deserves a mention. Although I am personally partial to Guix, which in my opinion is technically superior to Nix in every way, every point stated above for Guix is equally valid and applicable to Nix. Furthermore, Nix even offers some practical advantages over Guix:

- Nix officially supported macOS before the breaking changes introduced in Catalina; this functionality will likely be regained in due time
- Nix, being older and not affiliated with the FSF/GNU, has an ecosystem that contains a larger collection of packages, both free and non-free
- for reasons related to the above, Nix also has more users, and therefore likely more documentation and other resources

Combinations

Those who are naturally creative thinkers probably have already realized the vast possibilities the above methods (and likely others not mentioned in this article) can yield when combined. Here are two examples:

- use an Alpine image as the build stage of a multi-stage distroless build. This combines the benefit of a single statically linked binary built on Alpine with the enhanced security that comes with using the distroless base image
- write your own Guix package definition that builds the desired program using musl, just like in Alpine, simultaneously reaping the benefits of Guix’s Dockerfile-less approach and Alpine’s highly reduced image sizes. In fact, someone has done exactly this with Nix, creating a Redis image that’s <2mb in size, more than 10 times smaller than the same Redis version based on Alpine
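The first combination might look roughly like the following. This is a sketch under stated assumptions: hello.c is a hypothetical source file standing in for your application, and gcr.io/distroless/static is used as the final base because a statically linked binary needs no C library at runtime.

```dockerfile
# Build stage: compile a statically linked binary against musl on Alpine.
FROM alpine:3 AS build
RUN apk add --no-cache build-base
WORKDIR /src
# hello.c is a hypothetical source file used for illustration.
COPY hello.c .
RUN gcc -static -O2 -o hello hello.c

# Final stage: a distroless base with no shell and no package manager.
FROM gcr.io/distroless/static
COPY --from=build /src/hello /hello
ENTRYPOINT ["/hello"]
```

Because the binary carries everything it needs, the final stage contains little more than the executable itself.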

Final Words

In closing, a number of great options exist for creating Docker images that are significantly smaller than the standard ones typically found on Docker Hub. Each method presented in this article has its own pros and cons, but if you are knowledgeable about the technical specifications of your project, then it should be straightforward to choose and implement one of them. Furthermore, if and when one of these approaches becomes insufficient, with a little bit of creative thinking they can easily be combined, or even innovated upon, to produce even greater effects.
