Using a GPU instance on GCP from VS Code
A step by step guide
Anuj Arora
Install gcloud CLI
Follow the instructions at https://cloud.google.com/sdk/docs/install to install the Google Cloud CLI.
Set up a gcloud project
Once gcloud is installed, run gcloud projects create demoproject32, followed by gcloud config set project demoproject32, in the terminal to create a new project and make it the active configuration.
If everything goes fine, gcloud config configurations list should return the following:
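The two commands above, collected into a copy-pasteable sketch (the project ID demoproject32 is just an example; project IDs must be globally unique):

```shell
# Create a new project and make it the active configuration.
gcloud projects create demoproject32
gcloud config set project demoproject32

# Verify: the active configuration should list the new project.
gcloud config configurations list
```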
Setting up a GCP Instance
When doing this for the first time, log in to your Google Cloud account and follow the steps below.
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
From the second time onwards, there is no need to set up an instance from scratch. You can save a machine image of an existing instance and clone a new instance from it. The cloning procedure is as follows:
Step 1:
Step 2: After Steps 1 & 2 of the previous section, begin the procedure for creating a ‘New VM instance from machine image’.
This also eliminates the need for setting up Docker and NVIDIA dependencies (discussed below).
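For reference, the same save-and-clone workflow can also be driven from the CLI (the image and instance names and the zone are examples, not from this guide):

```shell
# Save a machine image of the existing instance...
gcloud compute machine-images create demo-image \
    --source-instance=instance-1 \
    --source-instance-zone=us-central1-a

# ...then clone a new instance from it.
gcloud compute instances create instance-2 \
    --source-machine-image=demo-image \
    --zone=us-central1-a
```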
Fixing IP Address
Ensure that the reserved external IP is correctly assigned to the VM Instance.
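This can also be done from the CLI: reserve a static address, then swap out the instance's ephemeral access config for one that uses it. A hedged sketch (the address name "demo-ip" is an assumption; the instance name, zone, and region follow the rest of this guide):

```shell
# Reserve a static external IP in the instance's region.
gcloud compute addresses create demo-ip --region=us-central1

# Replace the ephemeral external IP on the instance with the reserved one.
gcloud compute instances delete-access-config instance-1 \
    --zone=us-central1-a --access-config-name="External NAT"
gcloud compute instances add-access-config instance-1 \
    --zone=us-central1-a \
    --address=$(gcloud compute addresses describe demo-ip \
        --region=us-central1 --format='get(address)')
```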
Setting up SSH connection
SSH Key configuration
(.venv) ***** ~ % gcloud compute ssh anuj@instance-1 --ssh-key-file ~/********
No zone specified. Using zone [us-central1-a] for instance: [instance-1].
Warning: Permanently added 'compute.****************' (********) to the list of known hosts.
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-1025-gcp x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Thu Dec 15 07:02:26 UTC 2022
System load: 0.07 Processes: 159
Usage of /: 1.9% of 96.73GB Users logged in: 0
Memory usage: 1% IPv4 address for ens5: 10.128.0.12
Swap usage: 0%
0 updates can be applied immediately.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Config Setting
% nano ~/.ssh/config
...
Host instance-1
HostName 34.69.84.97
User anuj
These settings allow SSH-ing directly from VS Code (or a plain terminal) without the gcloud command.
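The same entry can be appended from the shell. The IdentityFile line is an assumption — gcloud generates its key at ~/.ssh/google_compute_engine by default; adjust it to whatever you passed via --ssh-key-file:

```shell
# Append the Host block so that plain `ssh instance-1` works.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host instance-1
    HostName 34.69.84.97
    User anuj
    IdentityFile ~/.ssh/google_compute_engine
EOF
```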
(.venv) **** ~ % ssh anuj@instance-1
The authenticity of host '34.69.84.97 (34.69.84.97)' can't be established.
***** key fingerprint is SHA256:*****/****/*************.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '34.69.84.97' (*******) to the list of known hosts.
Linux instance-1 5.10.0-19-cloud-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Dec 15 04:58:03 2022 from 103.181.57.41
SSH connection through VS Code
In a new VS Code window, connect to the configured host (instance-1).
Install CUDA
Follow the instructions for CUDA installation; it may take some time. After a successful installation, nvidia-smi should work:
anuj@instance-1:~$ nvidia-smi
Thu Dec 15 07:16:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 35C P0 38W / 300W | 0MiB / 16160MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
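For scripting, the same information is available in machine-readable form via nvidia-smi's standard query flags:

```shell
# Print GPU name, driver version, and total memory as plain CSV,
# one line per GPU, instead of the full table above.
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
```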
Install Docker & docker-compose
Follow the official installation instructions for Docker and docker-compose.
Allow running Docker without sudo
sudo usermod -aG docker $USER
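Note that group membership is only re-evaluated at login, so either log out and back in, or run a command under the new group explicitly (hello-world is just a smoke-test image):

```shell
# Run one command under the docker group without re-logging in,
# to verify that docker now works without sudo.
sg docker -c "docker run --rm hello-world"
```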
Setting up NVIDIA Container Toolkit
NVIDIA Container Toolkit installation guide.
After the installation, restart the instance in case sudo docker run --rm --gpus all nvidia/cuda:11.1.1-devel-ubuntu20.04 nvidia-smi throws an error. A successful installation should produce:
anuj@instance-1:~/demo$ sudo docker run --rm --gpus all nvidia/cuda:11.1.1-devel-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.1.1-devel-ubuntu20.04' locally
11.1.1-devel-ubuntu20.04: Pulling from nvidia/cuda
eaead16dc43b: Pull complete
bf6432aaa1f9: Pull complete
4d0885fcd6fe: Pull complete
753b0c7e02bc: Pull complete
9a32602188bd: Pull complete
4f0ddf33eba9: Pull complete
55974925e8e7: Pull complete
24b6db69a8ed: Pull complete
48e30c06025e: Pull complete
Digest: sha256:7bf31dd3390171b85508d2279c498b7db823b523ca7a0b580cbb9067d1f9767c
Status: Downloaded newer image for nvidia/cuda:11.1.1-devel-ubuntu20.04
Thu Dec 15 07:41:39 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |
| N/A 34C P0 24W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Building & Running a docker container
Build: sudo docker build -t demo-container -f Dockerfile .
FROM nvidia/cuda:11.1.1-devel-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub
#### System package (uses default Python 3 version in Ubuntu 20.04)
RUN apt-get update -y && \
apt-get install libgl1 -y \
git python3 python3-dev libpython3-dev python3-pip sudo wget nano tmux cmake g++ gcc curl \
unzip less htop iftop iotop \
libglib2.0-0 libsm6 libxext6 libxrender1 && \
update-alternatives --install /usr/bin/python python /usr/bin/python3 1 && \
update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1 && \
pip install --upgrade pip && \
pip install gpustat --no-cache-dir
#### OPENMPI
ENV OPENMPI_BASEVERSION=4.1
ENV OPENMPI_VERSION=${OPENMPI_BASEVERSION}.0
RUN mkdir -p /build && \
cd /build && \
wget -q -O - https://download.open-mpi.org/release/open-mpi/v${OPENMPI_BASEVERSION}/openmpi-${OPENMPI_VERSION}.tar.gz | tar xzf - && \
cd openmpi-${OPENMPI_VERSION} && \
./configure --prefix=/usr/local/openmpi-${OPENMPI_VERSION} && \
make -j"$(nproc)" install && \
ln -s /usr/local/openmpi-${OPENMPI_VERSION} /usr/local/mpi && \
# Sanity check:
test -f /usr/local/mpi/bin/mpic++ && \
cd ~ && \
rm -rf /build
# Needs to be in docker PATH if compiling other items & bashrc PATH (later)
ENV PATH=/usr/local/mpi/bin:${PATH} \
LD_LIBRARY_PATH=/usr/local/lib:/usr/local/mpi/lib:/usr/local/mpi/lib64:${LD_LIBRARY_PATH}
# Create a wrapper for OpenMPI to allow running as root by default
RUN mv /usr/local/mpi/bin/mpirun /usr/local/mpi/bin/mpirun.real && \
echo '#!/bin/bash' > /usr/local/mpi/bin/mpirun && \
echo 'mpirun.real --allow-run-as-root --prefix /usr/local/mpi "$@"' >> /usr/local/mpi/bin/mpirun && \
chmod a+x /usr/local/mpi/bin/mpirun
#### Python packages
RUN pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html --no-cache-dir && pip cache purge
COPY requirements.txt .
## Install APEX
RUN pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex.git@a651e2c24ecf97cbf367fd3f330df36760e1c597
RUN pip install -r requirements.txt && pip cache purge
WORKDIR /demo
Compose: sudo docker compose -f docker-compose.yml -p demo-container up -d
version: "3"
services:
demo-container:
image: demo-container:latest
volumes:
- ./:/demo  # mount at the image's WORKDIR (/demo)
build:
context: .
dockerfile: ./Dockerfile
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
tty: true
shm_size: '1gb'
ulimits:
memlock: -1
network_mode: "host"
Run: sudo docker exec -it demo-container-demo-container-1 bash
With this, you should be able to enter the container environment from the comfort of VS Code.
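As a final sanity check from inside the container, verify that PyTorch (installed in the Dockerfile above) can see the GPU:

```shell
# Prints the CUDA version PyTorch was built against and whether a GPU
# is visible; expect "11.1 True" on a correctly configured instance.
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```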