GCP (or local machine) + Kaggle Docker + VSCode
This document describes how to set up the Kaggle Python docker image environment on Google Cloud Platform (GCP) or your local machine with Docker, and how to configure Visual Studio Code (VSCode) to connect to that environment.
The primary information source is Kaggle's docker-python repository . There is also a guide , but unfortunately it is a bit outdated, written in 2016.
Note: This method may take 20-30 minutes and over 18.5GB of disk space for data downloads.
Note: If you do not use VSCode, there is no need to read this document. See here .
All files in this document are available on my repository .
There are 2 options, GCP or local machine. If you are going to set up the environment on your local machine, skip to the
[Option 2] Setup the environment on your local machine
section.

[Option 1] Setup the environment on GCP
On GCP, "AI Platform Notebooks" is easier than "Compute Engine" (GCE) for setting up the Kaggle Python docker image .
Create an AI Platform Notebook
- Select your project, e.g. kaggle-shopee-1 (you must create a project beforehand)
- Click NEW INSTANCE, then Customize instance
- Instance name: e.g. kaggle-test-1
- Environment: Kaggle Python [BETA] (this option automatically prepares the Kaggle Python docker image at startup of the VM instance)
- GPU type: NVIDIA Tesla T4 (you must increase your GPU quota beforehand)
- Check Install NVIDIA GPU driver automatically for me
- Open the Networking section and choose Networks in this project
- Uncheck Allow proxy access when it's available (this avoids loading an unnecessary proxy Docker container)
- Click CREATE
Note: The first startup of the VM instance takes a while because it needs to docker pull the Kaggle Python docker image. If you choose GPU type: None , it takes only a few minutes. Check the console logs here .

Connect to the VM instance
Install Cloud SDK . If you are using macOS and Homebrew, brew install --cask google-cloud-sdk may be convenient. The gcloud command should then be available on your terminal.

% gcloud compute --project "kaggle-shopee-1" ssh --zone "us-west1-b" "kaggle-test-1" -- -L 8080:localhost:8080

Note: You must wait for the VM instance to start up. Check the console logs here .
Note: I recommend limiting the source IP ranges for the SSH and RDP ports. See here .
Open http://localhost:8080 in your web browser. You do not need to specify token=... at this point (the pre-installed container disables the token option).
If you do not use VSCode, that's all. You do not have to do anything below.
Stop pre-installed Docker container
If you use VSCode to connect to the GCP Notebook, you must tweak the Docker container. At the moment, VSCode can only access remote Jupyter servers with the token option enabled, but the pre-installed Docker container disables the token option via c.NotebookApp.token = '' . You must stop the pre-installed Docker container and run a new Docker container with the token option enabled instead.

% docker ps -a
% docker inspect -f "{{.Name}} {{.HostConfig.RestartPolicy.Name}}" $(docker ps -aq)
% docker update --restart no payload-container
% docker inspect -f "{{.Name}} {{.HostConfig.RestartPolicy.Name}}" $(docker ps -aq)
% docker stop payload-container
% docker ps -a
Install docker-compose
docker-compose is convenient for running containers, even for a single container. See details here .

% sudo curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
% sudo chmod +x /usr/local/bin/docker-compose
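The download URL above embeds the host OS and CPU architecture via uname, so the same command works across platforms. A quick check of what gets substituted on your machine:

```shell
# Show the values substituted into the docker-compose download URL.
# Output depends on your machine, e.g. "Linux" and "x86_64" on a typical GCP VM.
uname -s
uname -m
```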
Skip to the Run Docker container section.

[Option 2] Setup the environment on your local machine
If you set up the environment on your local machine, install and set up Docker .
After that, the docker and docker-compose commands should be available on your terminal.

% docker -v
Docker version 20.10.5, build 55c4c88
% docker-compose -v
docker-compose version 1.28.5, build c4eb3a1f
Run Docker container (both GCP and local machine)
I prepared a sample repository containing the Dockerfile , etc. If you do not care about the details, execute these commands and skip to the Open Notebook by web browser section.

% git clone https://github.com/susumuota/kaggleenv.git
% cd kaggleenv
% docker-compose build
% docker-compose up -d
% docker-compose logs
# Find and copy http://localhost:8080/?token=...
Otherwise, follow the instructions below.

Create Dockerfile
Create a directory (e.g. kaggleenv ) and go there. If you cloned the sample repository, just cd kaggleenv .
Create Dockerfile like the following. See details here . If you use CPU instead of GPU, edit the FROM lines.

# for CPU
# FROM gcr.io/kaggle-images/python:latest
# for GPU
FROM gcr.io/kaggle-gpu-images/python:latest
# apply patch to enable token and change notebook directory to /kaggle/working
# see jupyter_notebook_config.py.patch
COPY jupyter_notebook_config.py.patch /opt/jupyter/.jupyter/
RUN (cd /opt/jupyter/.jupyter/ && patch < jupyter_notebook_config.py.patch)
# add extra modules here
# RUN pip install -U pip
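As a variation, you can pin the base image to a specific tag and pin extra libraries at the same time, so that rebuilding gives the same environment. A sketch (the v99 tag appears later in this document; the package names and versions are purely illustrative, pick the libraries your competition needs):

```dockerfile
# Pin a specific tag instead of :latest for reproducible builds
FROM gcr.io/kaggle-gpu-images/python:v99

# apply patch to enable token and change notebook directory to /kaggle/working
COPY jupyter_notebook_config.py.patch /opt/jupyter/.jupyter/
RUN (cd /opt/jupyter/.jupyter/ && patch < jupyter_notebook_config.py.patch)

# Pin extra modules as well (versions here are illustrative)
RUN pip install "timm==0.4.5" "transformers==4.5.0"
```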
You can specify a tag (e.g. edit latest to v99 ) to keep using the same environment; otherwise the build fetches the latest image every time. You can find tags on the GCR page .

Create jupyter_notebook_config.py.patch
This Docker image runs Jupyter Lab with the startup script /run_jupyter.sh and the config /opt/jupyter/.jupyter/jupyter_notebook_config.py . The config needs to be tweaked to enable the token option and to change the notebook directory to /kaggle/working . Create jupyter_notebook_config.py.patch like the following.

--- jupyter_notebook_config.py.orig 2021-02-17 07:52:56.000000000 +0000
+++ jupyter_notebook_config.py 2021-04-05 06:19:23.640584228 +0000
@@ -4 +4 @@
-c.NotebookApp.token = ''
+# c.NotebookApp.token = ''
@@ -11 +11,2 @@
-c.NotebookApp.notebook_dir = '/home/jupyter'
+# c.NotebookApp.notebook_dir = '/home/jupyter'
+c.NotebookApp.notebook_dir = '/kaggle/working'
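If a future version of the image ships a different config file, the patch can be regenerated with diff -u. A minimal sketch (the file contents below are one-line stand-ins for the real config):

```shell
# Work on a copy of the original config, edit the copy, then diff the two.
printf "c.NotebookApp.token = ''\n" > jupyter_notebook_config.py.orig
printf "# c.NotebookApp.token = ''\n" > jupyter_notebook_config.py.new
# diff exits with status 1 when the files differ, so guard it with '|| true'
# to keep scripts running under 'set -e'
diff -u jupyter_notebook_config.py.orig jupyter_notebook_config.py.new \
  > jupyter_notebook_config.py.patch || true
cat jupyter_notebook_config.py.patch
```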
Note: This patch may not work with a future version of the Kaggle Python docker image . In that case, create a new patch with diff -u original new > patch . At least I confirmed this patch works with the v99 tag.

Create docker-compose.yml
Create docker-compose.yml like the following. See details here . This setting mounts the current directory on your local machine to /kaggle/working on the container. If you use CPU instead of GPU, comment out runtime: nvidia .

version: "3"
services:
jupyter:
build: .
volumes:
- $PWD:/kaggle/working
working_dir: /kaggle/working
ports:
- "8080:8080"
hostname: localhost
restart: always
# for GPU
runtime: nvidia
Create .dockerignore
Create .dockerignore like the following. See details here . This setting specifies subdirectories and files that should be ignored when building the Docker image. Since you will mount the current directory, you do not need to include these in the image. In particular, the input directory should be ignored because it may contain large files that make the build take a long time.

README.md
input
output
.git
.gitignore
.vscode
.ipynb_checkpoints
Run docker-compose build
Run docker-compose build to build the Docker image. See details here .
Note: This process may take 20-30 minutes and over 18.5GB of disk space for data downloads on your local machine.
% docker-compose build
Confirm the image with docker images .

% docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
kaggleenv_jupyter latest ............ 28 minutes ago 18.5GB
Run docker-compose up -d
Run docker-compose up -d to start the Docker container in the background. In addition, the container will automatically run at startup of the VM instance or local machine. See details here and here .

% docker-compose up -d
% docker ps -a
% docker inspect -f "{{.Name}} {{.HostConfig.RestartPolicy.Name}}" $(docker ps -aq)
Find the Notebook URL in the log and copy it.

% docker-compose logs
http://localhost:8080/?token=...
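The token URL can also be pulled out of the logs non-interactively; a small sketch, assuming the Jupyter log line format shown above (the token value here is made up):

```shell
# A sample Jupyter startup log line; real lines come from 'docker-compose logs'.
log='jupyter_1 | [I 06:00:00.000 NotebookApp] http://localhost:8080/?token=abc123def456'
# Extract just the URL so it can be pasted into a browser or VSCode.
url=$(echo "$log" | grep -o 'http://localhost:8080/?token=[0-9a-z]*')
echo "$url"   # http://localhost:8080/?token=abc123def456
```

In practice you would pipe docker-compose logs into the same grep.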
Open Notebook by web browser
- Open the URL you copied (e.g. http://localhost:8080/?token=... ) in your web browser.
- Create a Python 3 Notebook.
- Run !pwd , !ls and !pip list to confirm the Python environment.

Setup Kaggle API
Set up your Kaggle API credentials .
After that, the ~/.kaggle/kaggle.json file should be on your local machine.
- Copy ~/.kaggle/kaggle.json to the current directory on your local machine (so that it can be accessed from the container at /kaggle/working/kaggle.json ).
% cp -p ~/.kaggle/kaggle.json .
- Confirm /kaggle/working/kaggle.json on the container.
!ls -l /kaggle/working/kaggle.json
-rw------- 1 root root 65 Mar 22 07:59 /kaggle/working/kaggle.json
- Copy it to the ~/.kaggle directory on the container.
!cp -p /kaggle/working/kaggle.json ~/.kaggle/
- Remove kaggle.json from the current directory on your local machine.
% rm -i kaggle.json
- Run the kaggle command on the Notebook.
!kaggle competitions list
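The Kaggle CLI warns when kaggle.json is readable by other users, so it is worth tightening the permissions after copying. A minimal sketch (the touch is a stand-in for the real credentials file):

```shell
# Stand-in for the copied credentials file; the real one comes from the Kaggle site.
touch kaggle.json
# Restrict it to the owner only, as the Kaggle CLI expects.
chmod 600 kaggle.json
stat -c '%a' kaggle.json   # prints 600 on Linux
```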
Shutdown the AI Platform Notebook (GCP)
After you finish your work, stop the VM instance. You can either STOP or DELETE it.
If you DELETE the VM instance, you will not be charged anything (as far as I know). However, if you STOP the VM instance, you will be charged for resources (e.g. the persistent disk) until you DELETE it. You should DELETE it if you will not use it for a long time (though you must set up the environment again). See details here .

Run docker-compose down (local machine)
After you finish your work, run docker-compose down to stop the Docker container. See details here .

% docker-compose down
Setup VSCode to open remote Notebooks
If you are using Visual Studio Code (VSCode) , you can set it up to connect to the remote Notebook.
[Optional] Install the latest Notebook extension
There is a revamped version of the Notebook extension. See details here . I recommend installing it because the new version properly handles custom extensions (e.g. key bindings) inside code cells.
Connect to the remote Notebook
Connect to the remote Notebook. See details here .
- Open Command Palette...
- Select Jupyter: Specify local or remote Jupyter server for connections
- Select Existing: Specify the URI of an existing server
- Paste the URL you copied (e.g. http://localhost:8080/?token=... ); the token must be specified
- Click the Reload button
- Open Command Palette...
- Select Jupyter: Create New Blank Notebook
- Run !pwd , !ls and !pip list to confirm the Python environment
to confirm Python environment. Increase Docker memory (local machine)
Sometimes containers need more than the default 2GB of memory. You can increase the amount of memory from the Docker preferences.
- Open Preferences...
- Select Resources , then ADVANCED
- Move the Memory slider over 2.00 GB
- Click Apply & Restart
Maintain Docker containers, images and cache
Basically docker-compose up -d and docker-compose down work well, but sometimes you may need these commands to maintain Docker containers, images and cache.

% docker ps -a # confirm container ids to remove
% docker rm CONTAINER # remove container by id
% docker rm $(docker ps --filter status=exited -q) # remove all containers that have exited
% docker images # confirm image ids to remove
% docker rmi IMAGE # remove image by id
% docker system df # confirm how much disk used by cache
% docker builder prune
% docker volume prune
Links
https://amalog.hateblo.jp/entry/data-analysis-docker (Japanese)
Author
Susumu OTA
Reference
This article was originally published at https://zenn.dev/susumuota/articles/gcp-kaggle-docker-vscode . Feel free to share or copy the text, but please keep this URL as the reference. (Collection and Share based on the CC Protocol.)