Building thin Docker images using multi-stage builds for your Java apps!

The release of Docker CE 17.05 (EE 17.06) introduced a new feature that helps create thin Docker images by dividing the image building process into multiple stages: in short, multi-stage builds. This feature allows artifacts produced in one stage to be reused by another stage. Its main advantage is that it helps create smaller images.

The old way

Let's take a look at the Dockerfile that I'm using to build an image of my flight tracker application:

FROM maven:3.5.2-jdk-9  
COPY src /usr/src/app/src  
COPY pom.xml /usr/src/app  
RUN mvn -f /usr/src/app/pom.xml clean package

EXPOSE 8080  
ENTRYPOINT ["java","-jar","/usr/src/app/target/flighttracker-1.0.0-SNAPSHOT.jar"]  

My app is a Spring Boot application and, as with any Java application, producing the image involves building the app and packaging the generated artifact. I'm using Maven as my build tool, which means it downloads the required dependencies from remote repositories and keeps them in the image. Depending on the number of dependencies in the pom.xml, the number of JARs in Maven's local repository can be significant, and together with the Maven and JDK tooling of the base image, none of which is needed at runtime, this bloats the image. The total size of this image is 980MB!

It's worth mentioning that Spring Boot generates fat JARs, which bundle the application together with its dependencies. This means all we need at runtime is Java, so we could simply use an openjdk image as the base for the production image.

One could also suggest solving this by splitting the Dockerfile into two files. The first file builds the artifact and copies it to a common location using volumes. The second file then picks up the generated artifact and uses the lean base image. The drawback of this approach is that multiple Dockerfiles need to be maintained separately.

Moving on to a multi-stage build

With a multi-stage build, the Dockerfile can contain multiple FROM lines, and each stage starts with a new FROM line and a fresh context. You can copy artifacts from one stage to another, and anything that is not copied over is discarded. This keeps the final image small, since it includes only the relevant artifacts.

The updated Dockerfile for my application looks like this:

FROM maven:3.5.2-jdk-9 AS build  
COPY src /usr/src/app/src  
COPY pom.xml /usr/src/app  
RUN mvn -f /usr/src/app/pom.xml clean package

FROM openjdk:9  
COPY --from=build /usr/src/app/target/flighttracker-1.0.0-SNAPSHOT.jar /usr/app/flighttracker-1.0.0-SNAPSHOT.jar  
EXPOSE 8080  
ENTRYPOINT ["java","-jar","/usr/app/flighttracker-1.0.0-SNAPSHOT.jar"]  

Notice that there are two FROM instructions, which makes this a two-stage build. The maven:3.5.2-jdk-9 image is the base image for the first stage, which is named build. This stage is used to build the fat JAR file for the application.
As for openjdk:9, it's the second and final base image for the build. The JAR file generated in the first stage is copied over to this stage using the COPY --from syntax. This has the great benefit of reducing the overall size of the runtime image, because we can choose the base image of the final stage to match the runtime needs. Additionally, the cruft from build time stays behind in the intermediate stage and is discarded. With this update, our production image is only 700MB.
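
The idea of matching the final base image to the runtime needs can be pushed a bit further. Since the fat JAR only needs a Java runtime, the final stage could start from a JRE-only image instead of the full JDK. The following is a minimal sketch, not a measured result: the openjdk:9-jre-slim tag is an assumption, so check that it is available in your registry and that the application actually runs on it before relying on it.

```dockerfile
# Stage 1: the same Maven build stage as in the Dockerfile above
FROM maven:3.5.2-jdk-9 AS build
COPY src /usr/src/app/src
COPY pom.xml /usr/src/app
RUN mvn -f /usr/src/app/pom.xml clean package

# Stage 2: JRE-only runtime image
# (the tag is an assumption; verify that it exists and suits your app)
FROM openjdk:9-jre-slim
COPY --from=build /usr/src/app/target/flighttracker-1.0.0-SNAPSHOT.jar /usr/app/flighttracker-1.0.0-SNAPSHOT.jar
EXPOSE 8080
ENTRYPOINT ["java","-jar","/usr/app/flighttracker-1.0.0-SNAPSHOT.jar"]
```

Only the second FROM line changes; the build stage and the COPY --from instruction stay exactly the same, which is what makes swapping runtime images cheap with multi-stage builds.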

Final word

There are certainly many other ways to craft your build cycle, but if you are using a Dockerfile to build your artifact, you should seriously consider multi-stage builds.


Credit:

Image taken from: https://www.slideshare.net/ozlerhakan/ignite-session-the-journey-of-multi-stage-builds-moby-project-and-linuxkit