Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
switch unnecessary add statements to copy and re-order layers for bui…
…ld efficiency
  • Loading branch information
jebentier committed Oct 27, 2017
commit 66dad333893810c3acf937042b56a03784757d9a
6 changes: 3 additions & 3 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,22 +1,22 @@
FROM jimschubert/8-jdk-alpine-mvn:1.0

COPY docker-entrypoint.sh /usr/local/bin/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COPY should be after apk add otherwise a change to the file would result in not using the cache install of OS packages.


ENV GEN_DIR /opt/swagger-codegen

RUN set -x && \
apk add --no-cache bash

RUN mkdir /opt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line could be changed to WORKDIR ${GEN_DIR} which will result in the directory being created.

https://docs.docker.com/engine/reference/builder/#workdir
snippet

If the WORKDIR doesn’t exist, it will be created even if it’s not used in any subsequent Dockerfile instruction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, WORKDIR is evaluated and created after COPY. Do you know if this has changed? I can play around with it when I'm not in mobile.

Also, in the past, COPY did not work the same as mkdir -p, so I'm not certain WORKDIR would be a great solution in the case that we add new subfolders.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WORKDIR https://docs.docker.com/engine/reference/builder/#workdir

The WORKDIR instruction sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it in the Dockerfile.

COPY https://docs.docker.com/engine/reference/builder/#copy

If <dest> doesn’t exist, it is created along with all missing directories in its path.

If any change is to be made to the existing Dockerfile, the following

FROM jimschubert/8-jdk-alpine-mvn:1.0

# install OS packages; should only change when new packages required
# or when the base image is updated.
RUN apk add --no-cache bash

# only changes if the value of GEN_DIR is changed
ENV GEN_DIR /opt/swagger-codegen
WORKDIR ${GEN_DIR}

# most frequent change as it includes the entire context
# copies all of the contents of ./modules/ to the directory created
# by the COPY command 'modules/'. This means if any new modules
# are added then they will get included. 
COPY ./modules/ ${GEN_DIR}/modules/

VOLUME ${MAVEN_HOME}/.m2/repository

# most costly to execute from a time perspective
RUN mvn -am -pl "modules/swagger-codegen-cli" package

# least likely to change but if any higher in the file it could result in the maven build running
COPY docker-entrypoint.sh /usr/local/bin/

ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["build"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenjones-cisco I'm a little confused by your proposal, so I'd like to clarify a bit.

Is your suggestion basically to move WORKDIR up and replace the mkdir with it?

Also, are you suggesting to include only ./modules? You'd also need at a minimum the pom.xml from the root for this to even build. I'd think this is fine to not include all the samples and build related files in the image.

Being that WORKDIR, ENV, and VOLUME are 0B layers, I don't think there's any benefit to having them in different areas of the Dockerfile. These are build-time variables and specifying them along with docker run doesn't really make a difference as it doesn't change the built image, only the running container.

I also have some questions about your comments.

only changes if the value of GEN_DIR is changed

I don't understand this. The layer of the image doesn't change if you pass -e "GEN_DIR=/src". Also, because WORKDIR is defined within the build context as /opt/swagger-codegen, this doesn't change when you specify an override for GEN_DIR variable. Example:

$ docker run -it -e "GEN_DIR=/src" -v $PWD:/src swagger-codegen /bin/bash
bash-4.3# pwd
/opt/swagger-codegen

least likely to change but if any higher in the file it could result in the maven build running

I don't understand what this means. The addition of the entry point is 899B. It only matters that it comes after the apk add. Everything else in the Dockerfile is a 0B layer, or will change on every build (this image is built on every commit). The reason this is in a different location is because copying to /usr/local/bin is a common practice, and because tweaking the entrypoint and rebuilding means we don't have to add the whole project and run the compilation again when we want to modify the entrypoint... is this what you're referring to? It absolutely would result in the build running through again if this was like:

COPY docker-entrypoint.sh /usr/local/bin/
# 0 or more instructions, then…
RUN mvn -am -pl "modules/swagger-codegen-cli" package

…and we were tweaking docker-entrypoint.sh. This is why I had it at the bottom of the file to begin with.

However, regardless of where this instruction is in the dockerfile, the maven build will always run when you run docker build -t imagename . on a new branch (this is where I'm confused by the statement).


tl;dr

I think your proposal would work if we add the pom.xml and clean up some of the comments as they seem to discuss the image build context as if it's the container run context.

Also, I've realized that running with the default CMD of build is not valid. We should probably change it to langs or help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah it makes sense that it sets the working directory for all following instructions, otherwise that wouldn't really be intuitive. I've always used absolute paths for copy, so never had to consider WORKDIR there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenjones-cisco played around with this a bit, samples and other unnecessary files are already excluded by the dockerignore. Also, in addition to pom.xml we also need the checkstyles config.

Here's a working example:

FROM jimschubert/8-jdk-alpine-mvn:1.0

RUN set -x && \
    apk add --no-cache bash

ENV GEN_DIR /opt/swagger-codegen
WORKDIR ${GEN_DIR}
VOLUME  ${MAVEN_HOME}/.m2/repository

COPY ./LICENSE ${GEN_DIR}
COPY ./google_checkstyle.xml ${GEN_DIR}
COPY ./modules/ ${GEN_DIR}/modules
COPY ./pom.xml ${GEN_DIR}

RUN mvn -am -pl "modules/swagger-codegen-cli" package

COPY docker-entrypoint.sh /usr/local/bin/

ENTRYPOINT ["docker-entrypoint.sh"]

CMD ["help"]

Yet, because almost everything else is already excluded, we receive very little gains from restructuring the file in this way (top is built with my changes above, bottom is built from master):

swagger-codegen-dockerfile       latest              301867e4fa00        40 seconds ago      497MB
swagger-codegen                  latest              f1dbcf25e9df        About an hour ago   500MB

So, in the end... I think the only worthwhile change to the Dockerfile would be to change the CMD to help or langs. Even changing ADD to COPY has very little benefit as ADD internally does COPY with additional functionality for tgz extraction and externally added files. I'd say since we have to change CMD, we can change ADD -> COPY and maybe add comments. But breaking apart the COPY instruction has very little gain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree, the WORKDIR change and ADD -> COPY are good changes.

The one way that breaking apart the COPY instructions is positive is related to layer caching. When you do COPY . ${GEN_DIR}, and the only thing that changed was docker-entrypoint.sh then because you are copying from . then it will invalidate that cache.

Otherwise from a size perspective, there will not be any change or real value especially if the .dockerignore is already well defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: breaking apart modules into layers

do you think this is worth it? It'll benefit is who pull the latest tag often. I think that's beneficial enough to eat the additional maintenance (although this would be minimal as a cli container). I've can make the changes in a separate PR, if you agree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!


ADD . ${GEN_DIR}
COPY . ${GEN_DIR}

VOLUME ${MAVEN_HOME}/.m2/repository

WORKDIR ${GEN_DIR}

RUN mvn -am -pl "modules/swagger-codegen-cli" package

COPY docker-entrypoint.sh /usr/local/bin/

ENTRYPOINT ["docker-entrypoint.sh"]

CMD ["build"]
2 changes: 1 addition & 1 deletion modules/swagger-codegen-cli/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
FROM java:8-jre-alpine

ADD target/swagger-codegen-cli.jar /opt/swagger-codegen-cli/swagger-codegen-cli.jar
COPY target/swagger-codegen-cli.jar /opt/swagger-codegen-cli/swagger-codegen-cli.jar

ENTRYPOINT ["java", "-jar", "/opt/swagger-codegen-cli/swagger-codegen-cli.jar"]

Expand Down