
Using GPU instance on GCP from VS Code

A step-by-step guide



Anuj Arora

12 days ago | 10 min read

Step-by-step tutorial

Install gcloud CLI

Follow the instructions at https://cloud.google.com/sdk/docs/install to install the Google Cloud CLI.

Set up a gcloud project

Once the gcloud CLI is installed, enter gcloud projects create demoproject32, followed by gcloud config set project demoproject32, on the terminal to set up a new project configuration.

If everything goes fine, gcloud config configurations list should show the new configuration as active, with demoproject32 as its project.
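The project setup above can be collected into one short shell session (demoproject32 is just an example ID; project IDs must be globally unique):

```shell
# Create a new GCP project (the ID must be globally unique)
gcloud projects create demoproject32

# Make it the active project for all subsequent gcloud commands
gcloud config set project demoproject32

# Verify: the active configuration should list demoproject32 as its project
gcloud config configurations list
```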

Setting up a GCP Instance

When doing this for the first time, log in to your Google Cloud account and follow the steps below.

Step 1:

From the Google Cloud console, open Compute Engine → VM instances.

Step 2:

Click "Create Instance".

Step 3:

Pick a name for the instance and the type of GPU (an NVIDIA V100 with an n1-standard-8 machine type here).

Step 4:

Pick a boot disk. For this example I picked Ubuntu 20.04 LTS with 100 GB disk space.

Step 5:

Create the instance

From the second time onwards, we do not need to set up an instance from scratch: save a machine image of the existing instance and clone a new instance from it. The cloning procedure is as follows:

Step 1:

Create machine image

Step 2: After Steps 1 and 2 of the previous section, begin the procedure for creating a 'New VM instance from machine image'.

This also eliminates the need for setting up Docker and NVIDIA dependencies (discussed below).

Fixing IP Address

Reserve a static external IP address for the created instance.

Ensure that the reserved external IP is correctly assigned to the VM Instance.
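The reservation can also be done with the gcloud CLI — a sketch, where demo-ip, the us-central1 region/zone, and instance-1 are assumptions matching the rest of this guide, and <RESERVED_IP> is a placeholder for the address printed by the describe command:

```shell
# Reserve a static external IP in the instance's region
gcloud compute addresses create demo-ip --region=us-central1

# Print the reserved address
gcloud compute addresses describe demo-ip --region=us-central1 --format='get(address)'

# Re-point the instance at it: drop the ephemeral access config,
# then add one that uses the reserved address
gcloud compute instances delete-access-config instance-1 \
    --zone=us-central1-a --access-config-name="External NAT"
gcloud compute instances add-access-config instance-1 \
    --zone=us-central1-a --address=<RESERVED_IP>
```

Alternatively, the same reservation can be made in the console under VPC network → IP addresses.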

Setting up SSH connection

SSH Key configuration
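If you do not yet have a key pair, one can be generated locally first; gcloud then propagates the public key to the instance on the first connection. A minimal sketch — the ~/.ssh/gcp_key path and the anuj username are examples:

```shell
# Generate an RSA key pair with an empty passphrase (path is an example)
ssh-keygen -t rsa -f ~/.ssh/gcp_key -C anuj -N ''

# First connection: gcloud uploads ~/.ssh/gcp_key.pub to the instance metadata
gcloud compute ssh anuj@instance-1 --ssh-key-file ~/.ssh/gcp_key
```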

(.venv) ***** ~ % gcloud compute ssh anuj@instance-1 --ssh-key-file ~/********

No zone specified. Using zone [us-central1-a] for instance: [instance-1].

Warning: Permanently added 'compute.****************' (********) to the list of known hosts.

Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-1025-gcp x86_64)

* Documentation: https://help.ubuntu.com

* Management: https://landscape.canonical.com

* Support: https://ubuntu.com/advantage

System information as of Thu Dec 15 07:02:26 UTC 2022

System load: 0.07 Processes: 159

Usage of /: 1.9% of 96.73GB Users logged in: 0

Memory usage: 1% IPv4 address for ens5: 10.128.0.12

Swap usage: 0%

0 updates can be applied immediately.

The programs included with the Ubuntu system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by

applicable law.

Config Setting

% nano ~/.ssh/config

...

Host instance-1
  HostName 34.69.84.97
  User anuj

These settings allow direct SSH from VS Code without the gcloud command.

(.venv) **** ~ % ssh anuj@instance-1

The authenticity of host '34.69.84.97 (34.69.84.97)' can't be established.

***** key fingerprint is SHA256:*****/****/*************.

This key is not known by any other names

Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

Warning: Permanently added '34.69.84.97' (*******) to the list of known hosts.

Linux instance-1 5.10.0-19-cloud-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64

The programs included with the Debian GNU/Linux system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent

permitted by applicable law.

Last login: Thu Dec 15 04:58:03 2022 from 103.181.57.41

SSH Connection through VS code

In a new VS Code window, use the Remote-SSH extension to connect to the configured host (instance-1).

Install CUDA

Follow the instructions for CUDA installation; it may take some time. After a successful installation, nvidia-smi should work:

anuj@instance-1:~$ nvidia-smi

Thu Dec 15 07:16:53 2022

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |

|-------------------------------+----------------------+----------------------+

| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |

| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |

| | | MIG M. |

|===============================+======================+======================|

| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |

| N/A 35C P0 38W / 300W | 0MiB / 16160MiB | 1% Default |

| | | N/A |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| Processes: |

| GPU GI CI PID Type Process name GPU Memory |

| ID ID Usage |

|=============================================================================|

| No running processes found |

+-----------------------------------------------------------------------------+

Install Docker & docker-compose

Follow the official installation instructions for Docker and docker-compose.

Allow running Docker without sudo

sudo usermod -aG docker $USER
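The group change only takes effect in new login sessions. A quick way to pick it up immediately and verify (hello-world is Docker's standard test image):

```shell
# Start a subshell with the docker group applied (or log out and back in)
newgrp docker

# Should now run without sudo
docker run --rm hello-world
```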

Setting up NVIDIA Container Toolkit

NVIDIA Container Toolkit installation guide.

After the installation, restart the instance if sudo docker run --rm --gpus all nvidia/cuda:11.1.1-devel-ubuntu20.04 nvidia-smi throws an error. A successful installation should produce:

anuj@instance-1:~/demo$ sudo docker run --rm --gpus all nvidia/cuda:11.1.1-devel-ubuntu20.04 nvidia-smi

Unable to find image 'nvidia/cuda:11.1.1-devel-ubuntu20.04' locally

11.1.1-devel-ubuntu20.04: Pulling from nvidia/cuda

eaead16dc43b: Pull complete

bf6432aaa1f9: Pull complete

4d0885fcd6fe: Pull complete

753b0c7e02bc: Pull complete

9a32602188bd: Pull complete

4f0ddf33eba9: Pull complete

55974925e8e7: Pull complete

24b6db69a8ed: Pull complete

48e30c06025e: Pull complete

Digest: sha256:7bf31dd3390171b85508d2279c498b7db823b523ca7a0b580cbb9067d1f9767c

Status: Downloaded newer image for nvidia/cuda:11.1.1-devel-ubuntu20.04

Thu Dec 15 07:41:39 2022

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |

|-------------------------------+----------------------+----------------------+

| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |

| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |

| | | MIG M. |

|===============================+======================+======================|

| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |

| N/A 34C P0 24W / 300W | 0MiB / 16160MiB | 0% Default |

| | | N/A |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| Processes: |

| GPU GI CI PID Type Process name GPU Memory |

| ID ID Usage |

|=============================================================================|

| No running processes found |

+-----------------------------------------------------------------------------+

Building & running a Docker container

Build: sudo docker build -t demo-container -f Dockerfile .

FROM nvidia/cuda:11.1.1-devel-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub

#### System package (uses default Python 3 version in Ubuntu 20.04)

RUN apt-get update -y && \

apt-get install libgl1 -y \

git python3 python3-dev libpython3-dev python3-pip sudo wget nano tmux cmake g++ gcc curl \

unzip less htop iftop iotop \

libglib2.0-0 libsm6 libxext6 libxrender1 && \

update-alternatives --install /usr/bin/python python /usr/bin/python3 1 && \

update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1 && \

pip install --upgrade pip && \

pip install gpustat --no-cache-dir

#### OPENMPI

ENV OPENMPI_BASEVERSION=4.1

ENV OPENMPI_VERSION=${OPENMPI_BASEVERSION}.0

RUN mkdir -p /build && \

cd /build && \

wget -q -O - https://download.open-mpi.org/release/open-mpi/v${OPENMPI_BASEVERSION}/openmpi-${OPENMPI_VERSION}.tar.gz | tar xzf - && \

cd openmpi-${OPENMPI_VERSION} && \

./configure --prefix=/usr/local/openmpi-${OPENMPI_VERSION} && \

make -j"$(nproc)" install && \

ln -s /usr/local/openmpi-${OPENMPI_VERSION} /usr/local/mpi && \

# Sanity check:

test -f /usr/local/mpi/bin/mpic++ && \

cd ~ && \

rm -rf /build

# Needs to be in docker PATH if compiling other items & bashrc PATH (later)

ENV PATH=/usr/local/mpi/bin:${PATH} \

LD_LIBRARY_PATH=/usr/local/lib:/usr/local/mpi/lib:/usr/local/mpi/lib64:${LD_LIBRARY_PATH}

# Create a wrapper for OpenMPI to allow running as root by default

RUN mv /usr/local/mpi/bin/mpirun /usr/local/mpi/bin/mpirun.real && \

echo '#!/bin/bash' > /usr/local/mpi/bin/mpirun && \

echo 'mpirun.real --allow-run-as-root --prefix /usr/local/mpi "$@"' >> /usr/local/mpi/bin/mpirun && \

chmod a+x /usr/local/mpi/bin/mpirun

#### Python packages

RUN pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html --no-cache-dir && pip cache purge

COPY requirements.txt .

## Install APEX

RUN pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex.git@a651e2c24ecf97cbf367fd3f330df36760e1c597

RUN pip install -r requirements.txt && pip cache purge

WORKDIR /demo

Compose: sudo docker compose -f docker-compose.yml -p demo-container up -d

version: "3"
services:
  demo-container:
    image: demo-container:latest
    volumes:
      - ./:/demo-container
    build:
      context: .
      dockerfile: ./Dockerfile
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    tty: true
    shm_size: '1gb'
    ulimits:
      memlock: -1
    network_mode: "host"

Run: sudo docker exec -it demo-container-demo-container-1 bash

With this you should be able to enter the container environment from the comfort of VS Code.
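As a final sanity check, this one-liner (run inside the container; it assumes the torch install from the Dockerfile above) confirms that PyTorch can see the V100:

```shell
# Inside the container: check that CUDA is visible to PyTorch
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```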

