Exploring the Monorepo #5: Perfect Docker

안녕하세요! Let's start with a recap:
  • We have a pnpm-based monorepo containing 2 apps and 3 libraries.
  • All of those packages are Dockerized.
  • A GitHub Actions pipeline builds all the packages on each commit.

Today we'll focus on solving these problems:
  • Don't reinstall dependencies when only source code has changed; it wastes a lot of time.
  • Don't manually specify which individual packages to copy into the image; it's a maintenance mess.
  • The final image shouldn't contain dev-dependencies; it should be as clean and optimal as possible.

For the details of how these problems came about, see the previous blog posts. Now let's see how to solve them.

    Table of Contents


  • Converging on a plan
  • Custom Context Script

  • Dockerfile
  • Cache the pnpm store

  • Strip dev-dependencies
  • Updating the CI Script
  • Conclusion

    Converging on a plan

    It's critical to understand that Docker caches each line in the Dockerfile, and that the output of one line is the input of the next. So if a line generates new output all subsequent caches are invalidated. With that in mind, here's a common Docker anti-pattern that causes issue 1:

    COPY . .
    RUN pnpm install
    

    If anything changes in any file then pnpm install has to run from scratch, because the COPY . . would produce a different output. This should always be optimized so only the files necessary to install dependencies are copied in first, then dependencies are installed, and then the rest of the source-files are copied in. Something like this:

    COPY package.json .
    COPY pnpm-lock.yaml .
    COPY pnpm-workspace.yaml .
    COPY apps/web/package.json ./apps/web/
    COPY libs/types/package.json ./libs/types/
    RUN pnpm install
    COPY . .
    

    Now all steps up to and including pnpm install remain cached so long as none of those meta-files change, and so Docker will skip all those steps. This is a massive speedup.

    The downside is we're now manually specifying all those meta-files ☹️. And that leads to issue 2:

    Using the COPY <meta-file> construct scales poorly because we have to author each Dockerfile with explicit, detailed information about which dependencies to copy in. And by using the COPY . . construct we copy in all monorepo files, which needlessly bloats the image, because in this example we only need the source-files from apps/web and libs/types (web only depends on types).

    The key insight is that pnpm already understands how dependencies depend on each other, so we should be able to leverage that. We can't use pnpm directly from Dockerfile's COPY construct, but what if we use pnpm to generate a context that only contains the files needed for a specific package? Then the Dockerfile for that package could use COPY . . but it'd actually only copy in just the right files…
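    To make the key insight concrete, here's a minimal sketch (hypothetical, not the real pnpm-context.mjs) of how the workspace dependency graph can be recovered from package.json manifests alone; pnpm marks internal dependencies with the workspace: protocol:

```javascript
// Sketch: discover which workspace packages a given package depends on,
// by following "workspace:" dependencies in package.json files.
// The package names and dependency graph below are an assumed example.
const packages = {
  web:   { dependencies: { types: 'workspace:*', react: '^18.0.0' } },
  types: { dependencies: {} },
  utils: { dependencies: { types: 'workspace:*' } },
};

function workspaceClosure(name, seen = new Set()) {
  if (seen.has(name)) return seen;
  seen.add(name);
  const deps = packages[name]?.dependencies ?? {};
  for (const [dep, version] of Object.entries(deps)) {
    // Only follow internal workspace packages, not registry dependencies.
    if (version.startsWith('workspace:')) workspaceClosure(dep, seen);
  }
  return seen;
}

console.log([...workspaceClosure('web')]); // → [ 'web', 'types' ]
```

Everything in that closure must end up in the Docker context; everything else (like utils here) can be left out.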

    And, hang on, let's consider the meta-files too. The challenge is we can't easily isolate all the package.json files, so we resort to path-specific COPY commands. But what if we get really clever and create our custom context such that all the meta-files are placed in a /meta folder inside the context for easy copying, and we put the rest of the source-files in other folders?

    Let's see if that'll work!

    Custom Context Script

    We introduced the custom context technique where we simply piped tar into Docker:

    $ cd apps/web
    $ tar -cf - ../.. | docker build -f apps/web/Dockerfile -
    

    Now it's time to discard the naive tar command and come up with something more bespoke.

    I've made a script that takes a Dockerfile and finds just the right files needed for that package, and outputs it all as a tarball so it's a drop-in replacement for the tar command.

    ℹ️ BTW, the full script is available on GitHub1s.com if you'd like to have a look.



    Here's how it's used:

    $ pnpm --silent pnpm-context -- --list-files apps/web/Dockerfile
    Dockerfile
    deps/libs/types/.gitignore
    deps/libs/types/Dockerfile
    deps/libs/types/libs-types.iml
    deps/libs/types/package.json
    deps/libs/types/src/index.ts
    deps/libs/types/tsconfig.json
    meta/apps/web/package.json
    meta/libs/types/package.json
    meta/package.json
    meta/pnpm-lock.yaml
    meta/pnpm-workspace.yaml
    pkg/apps/web/.gitignore
    pkg/apps/web/apps-web.iml
    pkg/apps/web/package.json
    pkg/apps/web/src/client.tsx
    pkg/apps/web/src/index.ts
    pkg/apps/web/src/node.d.ts
    pkg/apps/web/src/pages/App.css
    pkg/apps/web/src/pages/App.tsx
    pkg/apps/web/src/pages/Home.css
    pkg/apps/web/src/pages/Home.spec.tsx
    pkg/apps/web/src/pages/Home.tsx
    pkg/apps/web/src/pages/react.svg
    pkg/apps/web/src/server.tsx
    pkg/apps/web/tsconfig.json
    pkg/apps/web/typings/index.d.ts
    


    Now that's a lean context! Notice how only the "libs/types" and "apps/web" files are present, and how the files are split into three folders: "deps", "meta", and "pkg". That's the mechanism we'll use to copy in only the meta-files in the Dockerfile, but we'll get to that in a moment.
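    The three-folder layout is simple to reason about: every file is re-homed under deps/, meta/, or pkg/ depending on what it is. Here's a hypothetical sketch of that classification rule (an assumed simplification, not what pnpm-context.mjs literally does):

```javascript
// Sketch: decide where a monorepo file lands inside the custom context.
// targetPkg is the package being built (e.g. 'apps/web'); depPkgs are its
// workspace dependencies (e.g. ['libs/types']). Assumed simplification.
const META_FILES = ['package.json', 'pnpm-lock.yaml', 'pnpm-workspace.yaml'];

function contextPaths(file, targetPkg, depPkgs) {
  const paths = [];
  const base = file.split('/').pop();
  // Manifests are duplicated into meta/ so the Dockerfile can copy them alone.
  if (META_FILES.includes(base)) paths.push(`meta/${file}`);
  if (file.startsWith(`${targetPkg}/`)) paths.push(`pkg/${file}`);
  else if (depPkgs.some((d) => file.startsWith(`${d}/`))) paths.push(`deps/${file}`);
  return paths; // empty array → the file is excluded from the context
}

console.log(contextPaths('libs/types/package.json', 'apps/web', ['libs/types']));
// → [ 'meta/libs/types/package.json', 'deps/libs/types/package.json' ]
console.log(contextPaths('apps/web/src/index.ts', 'apps/web', ['libs/types']));
// → [ 'pkg/apps/web/src/index.ts' ]
```

Note how a dependency's package.json shows up twice, exactly like meta/libs/types/package.json and deps/libs/types/package.json in the listing above.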

    Actually, this context is a bit too lean 😅: the root tsconfig.json file isn't included, because pnpm has no way of knowing whether it's used, but the packages depend on it. The bin/postinstall script is needed too. To fix this, we can specify additional inclusion patterns with the -p argument:

    $ pnpm --silent pnpm-context -- -p 'tsconfig.json' -p 'bin/' --list-files apps/web/Dockerfile
    ...
    pkg/bin/preinstall
    pkg/tsconfig.json
    


    ℹ️ BTW, the repository actually calls pnpm-context.mjs with a few more arguments, see the "docker:build" script in package.json on GitHub1s.com for all the details.



    The context is good now. Let's see how to pipe it into Docker to build an image:

    $ pnpm --silent pnpm-context -- -p 'tsconfig.json' -p 'bin/' \
      apps/web/Dockerfile | docker build --build-arg PACKAGE_PATH=apps/web - -t mono-web
    [+] Building 3.1s (19/19) FINISHED
    


    It works! But let's see how the Dockerfile actually operates on this new context.

    Dockerfile

    ℹ️ BTW, in this article I'll only show explanatory snippets/examples of the Dockerfile, but you can see the full Dockerfile on GitHub1s.com.



    Using the new custom context subfolders is quite simple. Here's an example of how the new Dockerfile is structured:

    ARG PACKAGE_PATH
    # ↑ Specified via Docker's `--build-arg` argument
    COPY ./meta .
    RUN pnpm install --filter "{${PACKAGE_PATH}}..." --frozen-lockfile
    # ↑ `...` selects the package and its dependencies
    
    COPY ./deps .
    RUN pnpm build --if-present --filter "{${PACKAGE_PATH}}^..."
    # ↑ `^...` ONLY selects the dependencies of the package, but not the package itself
    
    COPY ./pkg .
    RUN pnpm build --if-present --filter "{${PACKAGE_PATH}}"
    RUN pnpm test --if-present --filter "{${PACKAGE_PATH}}"
    
    # Everything's built and good to go 🎉
    


    With this structure, pnpm install only runs when the meta-files change, and the Dockerfile no longer contains manually-specified package-specific paths. We've crushed issues #1 and #2! 🎉
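    The `...` and `^...` filter selectors do the heavy lifting here. Their semantics can be sketched as a tiny graph walk (a hypothetical model with an assumed example graph, not pnpm's implementation):

```javascript
// Sketch: model pnpm's "{pkg}..." (package plus its dependencies) versus
// "{pkg}^..." (dependencies only) filter selectors over a dependency graph.
const graph = { web: ['types', 'utils'], utils: ['types'], types: [] }; // assumed example

function deps(pkg, acc = new Set()) {
  for (const d of graph[pkg]) {
    if (!acc.has(d)) {
      acc.add(d);
      deps(d, acc); // transitive dependencies count too
    }
  }
  return acc;
}

const selectWithDeps = (pkg) => new Set([pkg, ...deps(pkg)]); // "{pkg}..."
const selectDepsOnly = (pkg) => deps(pkg);                    // "{pkg}^..."

console.log([...selectWithDeps('web')]); // → [ 'web', 'types', 'utils' ]
console.log([...selectDepsOnly('web')]); // → [ 'types', 'utils' ]
```

So the install step pulls in the whole closure, while the first build step builds only the dependencies, leaving the package itself for after its own source-files are copied in.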

    Cache the pnpm store

    It's good that we preserve the pnpm install cache as much as we can, but when it does have to run it frustratingly re-downloads every single dependency from scratch. That's very wasteful in both time and bandwidth! On our own machines pnpm downloads into a persisted store so it never has to re-download a package, but that store never persists inside Docker because it evaporates as soon as a meta-file changes.

    But Docker has a mechanism for exactly this: It allows a RUN command to mount a folder which is persisted on the host machine, so when the command runs it has access to files from previous runs. The code for this ends up a bit complex-looking, but it's worth the performance boost so let's try it out:

    ARG PACKAGE_PATH
    COPY ./meta .
    RUN --mount=type=cache,id=pnpm-store,target=/root/.pnpm-store\
     # ↑ By caching the content-addressable store we stop
     # downloading the same dependencies again and again.
     # Unfortunately, doing this causes Docker to place 
     # the pnpm content-addressable store on a different
     # virtual drive, which prohibits pnpm from 
     # symlinking its content to its virtual store,
     # and that causes pnpm to fall back on copying the
     # files, and… that's totally fine! Except pnpm emits 
     # many warnings that it's not using symlinks, so 
     # we also must use `grep` to filter out those warnings.
     pnpm install --filter "{${PACKAGE_PATH}}..." \
         --frozen-lockfile\
     | grep --invert-match "cross-device link not permitted\|Falling back to copying packages from store"
    # ↑ Using `--invert-match` to discard annoying output
    

    It would be nice if we could tell pnpm to be quiet when it can't symlink, but we can survive this complexity.
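    In isolation, the grep filter behaves like this (with fake pnpm output standing in for the real install log):

```shell
# Simulated pnpm install output, piped through the same grep filter the
# Dockerfile uses; only the known symlink warnings are discarded.
printf '%s\n' \
  'Progress: resolved 10, downloaded 10, added 10' \
  ' WARN  Falling back to copying packages from store' \
  'Done in 2.1s' \
  | grep --invert-match "cross-device link not permitted\|Falling back to copying packages from store"
# Prints:
#   Progress: resolved 10, downloaded 10, added 10
#   Done in 2.1s
```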

    Strip dev-dependencies

    We've reached the last issue: We're bloating the final image with dev-dependencies because we don't clean up after building apps/web inside the image. It's a waste we shouldn't allow.

    The solution is to reset back to having no dependencies installed, and then only installing the production dependencies. This is pretty straightforward to do by using Docker stages:

    FROM node:16-alpine AS base
    # Install pnpm
    
    FROM base AS dev
    # Install all dependencies and build the package
    
    FROM base AS prod
    # Install just prod dependencies
    

    With this approach the "prod" stage isn't affected by whatever happens in the "dev" stage. Nice! But because dev builds the package, we need some way to transfer the built files from dev to prod. For that we can introduce an "assets" stage where we isolate just the files that should go into the prod stage. So we can do something like this:

    FROM node:16-alpine AS base
    RUN npm --global install pnpm
    WORKDIR /root/monorepo
    
    FROM base AS dev
    # Install all dependencies and build the package
    
    FROM dev AS assets
    RUN rm -rf node_modules && pnpm recursive exec -- rm -rf ./node_modules ./src
    # ↑ Reset back to no dependencies installed, and delete all
    # src folders because we don't need source-files. 
    # This way whatever files got built are left behind.
    
    FROM base AS prod
    RUN pnpm install --prod --filter "{${PACKAGE_PATH}}..."
    # ↑ Install just prod dependencies
    COPY --from=assets /root/monorepo .
    

    So here the "assets" stage isolates whatever code was generated in the dev stage, which the prod stage then copies into itself. Does it work?

    $ cd apps/web
    $ pnpm build
    $ docker run mono-web
    [razzle] > Started on port 3000
    

    🎉

    Updating the CI Script

    It's one thing to get all this working locally, but we also need to update our GitHub Actions CI script.

    ℹ️ BTW, you can see the full CI script on GitHub1s.com.



    The first problem is that the pnpm-context.mjs script doesn't run at all, because the dependencies it needs are never actually installed. To do that we have to run pnpm install for just the monorepo's root. A GitHub Action called pnpm/action-setup makes this easy: it installs pnpm and can run pnpm install, so we can tell it to install dependencies for the monorepo:

          - uses: pnpm/action-setup@v2
            with:
              run_install: |
                - args: [--frozen-lockfile, --filter "exploring-the-monorepo"]
    


    But we run into another interesting error: the Docker build fails because we use the mount feature (to cache the pnpm store), and to use it we have to enable "BuildKit" mode. BuildKit is an upcoming set of Docker features that isn't enabled by default yet, and the solution is fairly simple: set the DOCKER_BUILDKIT environment variable:

    $ DOCKER_BUILDKIT=1 docker build
    
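    In a GitHub Actions workflow the same switch can be flipped via the step's env map. A hypothetical excerpt (the step name and build command here are assumed, not the repository's exact script):

```yaml
- name: Build Docker image
  env:
    DOCKER_BUILDKIT: "1"
  run: >
    pnpm --silent pnpm-context -- apps/web/Dockerfile |
    docker build --build-arg PACKAGE_PATH=apps/web - -t mono-web
```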


    Conclusion

    The issues we set out to vanquish have been resolved 🎉. We now build images that play nice with Docker caching, the Dockerfiles are free from manually-specified dependency concerns, and the final images are very lean and optimal. Quite nice!

    I feel the pnpm investment is really paying off: it was already a nice CLI to use, and it's amazing that it also offers a pretty straightforward programmatic API we can use for our dependency-graph logic!

    This article's title promised "perfect", did we achieve that? Well, no, perfection is a high bar, but we've addressed all the practical concerns I've experienced so I'm happy to call it a day here. We wouldn't want to get too carried away after all 👀 (I think for some, this entire article-series is already deep into "carried away" territory).

    I'd love to hear if you have any questions or comments, or if there are any directions you'd like to see explored in future articles. So please leave a comment.
