
Using GPU instance on GCP from VS Code

A step-by-step guide



Anuj Arora

12 days ago | 10 min read

Step-by-step tutorial

Install gcloud CLI

Follow the instructions at https://cloud.google.com/sdk/docs/install to install the Google Cloud CLI.

Set up a gcloud project

Once the gcloud CLI is installed, enter gcloud projects create demoproject32, followed by gcloud config set project demoproject32, on the terminal to set up a new project configuration.

If everything goes fine, gcloud config configurations list should show the new configuration as active, with demoproject32 as its project.
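The project setup above can be collected into one short shell session (demoproject32 is just an example ID; project IDs must be globally unique):

```shell
# Create a new GCP project (the ID must be globally unique)
gcloud projects create demoproject32

# Make it the active project for all subsequent gcloud commands
gcloud config set project demoproject32

# Verify: the active configuration should list demoproject32 as its project
gcloud config configurations list
```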

Setting up a GCP Instance

When doing this for the first time, log in to your Google Cloud account and follow the steps below.

Step 1:

From the Google Cloud console, open Compute Engine → VM instances.

Step 2:

Click "Create Instance".

Step 3:

Pick a name for the instance and the type of GPU (an NVIDIA V100 with an n1-standard-8 machine type here).

Step 4:

Pick a boot disk. For this example I picked Ubuntu 20.04 LTS with 100 GB disk space.

Step 5:

Create the instance

From the second time onwards, we do not need to set up an instance from scratch: save a machine image of the existing instance and clone a new instance from it. The cloning procedure is as follows:

Step 1:

Create machine image

Step 2: After Steps 1 and 2 of the previous section, begin the procedure for creating a 'New VM instance from machine image'.

This also eliminates the need for setting up Docker and NVIDIA dependencies (discussed below).

Fixing IP Address

Reserve a static external IP address for the created instance.

Ensure that the reserved external IP is correctly assigned to the VM Instance.
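The reservation can also be done with the gcloud CLI — a sketch, where demo-ip, the us-central1 region/zone, and instance-1 are assumptions matching the rest of this guide, and <RESERVED_IP> is a placeholder for the address printed by the describe command:

```shell
# Reserve a static external IP in the instance's region
gcloud compute addresses create demo-ip --region=us-central1

# Print the reserved address
gcloud compute addresses describe demo-ip --region=us-central1 --format='get(address)'

# Re-point the instance at it: drop the ephemeral access config,
# then add one that uses the reserved address
gcloud compute instances delete-access-config instance-1 \
    --zone=us-central1-a --access-config-name="External NAT"
gcloud compute instances add-access-config instance-1 \
    --zone=us-central1-a --address=<RESERVED_IP>
```

Alternatively, the same reservation can be made in the console under VPC network → IP addresses.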

Setting up SSH connection

SSH Key configuration
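If you do not yet have a key pair, one can be generated locally first; gcloud then propagates the public key to the instance on the first connection. A minimal sketch — the ~/.ssh/gcp_key path and the anuj username are examples:

```shell
# Generate an RSA key pair with an empty passphrase (path is an example)
ssh-keygen -t rsa -f ~/.ssh/gcp_key -C anuj -N ''

# First connection: gcloud uploads ~/.ssh/gcp_key.pub to the instance metadata
gcloud compute ssh anuj@instance-1 --ssh-key-file ~/.ssh/gcp_key
```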

(.venv) ***** ~ % gcloud compute ssh anuj@instance-1 --ssh-key-file ~/********

No zone specified. Using zone [us-central1-a] for instance: [instance-1].

Warning: Permanently added 'compute.****************' (********) to the list of known hosts.

Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-1025-gcp x86_64)

* Documentation: https://help.ubuntu.com

* Management: https://landscape.canonical.com

* Support: https://ubuntu.com/advantage

System information as of Thu Dec 15 07:02:26 UTC 2022

System load: 0.07 Processes: 159

Usage of /: 1.9% of 96.73GB Users logged in: 0

Memory usage: 1% IPv4 address for ens5: 10.128.0.12

Swap usage: 0%

0 updates can be applied immediately.

The programs included with the Ubuntu system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by

applicable law.

Config Setting

% nano ~/.ssh/config

...

Host instance-1
  HostName 34.69.84.97
  User anuj

These settings allow direct SSH from VS Code without the gcloud command.

(.venv) **** ~ % ssh anuj@instance-1

The authenticity of host '34.69.84.97 (34.69.84.97)' can't be established.

***** key fingerprint is SHA256:*****/****/*************.

This key is not known by any other names

Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

Warning: Permanently added '34.69.84.97' (*******) to the list of known hosts.

Linux instance-1 5.10.0-19-cloud-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64

The programs included with the Debian GNU/Linux system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent

permitted by applicable law.

Last login: Thu Dec 15 04:58:03 2022 from 103.181.57.41

SSH Connection through VS code

In a new VS Code window, use the Remote-SSH extension to connect to the configured host (instance-1).

Install CUDA

Follow the instructions for CUDA installation; it may take some time. After a successful installation, nvidia-smi should work:

anuj@instance-1:~$ nvidia-smi

Thu Dec 15 07:16:53 2022

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |

|-------------------------------+----------------------+----------------------+

| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |

| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |

| | | MIG M. |

|===============================+======================+======================|

| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |

| N/A 35C P0 38W / 300W | 0MiB / 16160MiB | 1% Default |

| | | N/A |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| Processes: |

| GPU GI CI PID Type Process name GPU Memory |

| ID ID Usage |

|=============================================================================|

| No running processes found |

+-----------------------------------------------------------------------------+

Install Docker & docker-compose

Follow the official installation instructions for Docker and docker-compose.

Allow running Docker without sudo

sudo usermod -aG docker $USER
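The group change only takes effect in new login sessions. A quick way to pick it up immediately and verify (hello-world is Docker's standard test image):

```shell
# Start a subshell with the docker group applied (or log out and back in)
newgrp docker

# Should now run without sudo
docker run --rm hello-world
```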

Setting up NVIDIA Container Toolkit

NVIDIA Container Toolkit installation guide.

After the installation, restart the instance if sudo docker run --rm --gpus all nvidia/cuda:11.1.1-devel-ubuntu20.04 nvidia-smi throws an error. A successful installation should produce:

anuj@instance-1:~/demo$ sudo docker run --rm --gpus all nvidia/cuda:11.1.1-devel-ubuntu20.04 nvidia-smi

Unable to find image 'nvidia/cuda:11.1.1-devel-ubuntu20.04' locally

11.1.1-devel-ubuntu20.04: Pulling from nvidia/cuda

eaead16dc43b: Pull complete

bf6432aaa1f9: Pull complete

4d0885fcd6fe: Pull complete

753b0c7e02bc: Pull complete

9a32602188bd: Pull complete

4f0ddf33eba9: Pull complete

55974925e8e7: Pull complete

24b6db69a8ed: Pull complete

48e30c06025e: Pull complete

Digest: sha256:7bf31dd3390171b85508d2279c498b7db823b523ca7a0b580cbb9067d1f9767c

Status: Downloaded newer image for nvidia/cuda:11.1.1-devel-ubuntu20.04

Thu Dec 15 07:41:39 2022

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |

|-------------------------------+----------------------+----------------------+

| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |

| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |

| | | MIG M. |

|===============================+======================+======================|

| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |

| N/A 34C P0 24W / 300W | 0MiB / 16160MiB | 0% Default |

| | | N/A |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| Processes: |

| GPU GI CI PID Type Process name GPU Memory |

| ID ID Usage |

|=============================================================================|

| No running processes found |

+-----------------------------------------------------------------------------+

Building & running a Docker container

Build: sudo docker build -t demo-container -f Dockerfile .

FROM nvidia/cuda:11.1.1-devel-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub

#### System package (uses default Python 3 version in Ubuntu 20.04)

RUN apt-get update -y && \

apt-get install libgl1 -y \

git python3 python3-dev libpython3-dev python3-pip sudo wget nano tmux cmake g++ gcc curl \

unzip less htop iftop iotop \

libglib2.0-0 libsm6 libxext6 libxrender1 && \

update-alternatives --install /usr/bin/python python /usr/bin/python3 1 && \

update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1 && \

pip install --upgrade pip && \

pip install gpustat --no-cache-dir

#### OPENMPI

ENV OPENMPI_BASEVERSION=4.1

ENV OPENMPI_VERSION=${OPENMPI_BASEVERSION}.0

RUN mkdir -p /build && \

cd /build && \

wget -q -O - https://download.open-mpi.org/release/open-mpi/v${OPENMPI_BASEVERSION}/openmpi-${OPENMPI_VERSION}.tar.gz | tar xzf - && \

cd openmpi-${OPENMPI_VERSION} && \

./configure --prefix=/usr/local/openmpi-${OPENMPI_VERSION} && \

make -j"$(nproc)" install && \

ln -s /usr/local/openmpi-${OPENMPI_VERSION} /usr/local/mpi && \

# Sanity check:

test -f /usr/local/mpi/bin/mpic++ && \

cd ~ && \

rm -rf /build

# Needs to be in docker PATH if compiling other items & bashrc PATH (later)

ENV PATH=/usr/local/mpi/bin:${PATH} \

LD_LIBRARY_PATH=/usr/local/lib:/usr/local/mpi/lib:/usr/local/mpi/lib64:${LD_LIBRARY_PATH}

# Create a wrapper for OpenMPI to allow running as root by default

RUN mv /usr/local/mpi/bin/mpirun /usr/local/mpi/bin/mpirun.real && \

echo '#!/bin/bash' > /usr/local/mpi/bin/mpirun && \

echo 'mpirun.real --allow-run-as-root --prefix /usr/local/mpi "$@"' >> /usr/local/mpi/bin/mpirun && \

chmod a+x /usr/local/mpi/bin/mpirun

#### Python packages

RUN pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html --no-cache-dir && pip cache purge

COPY requirements.txt .

## Install APEX

RUN pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex.git@a651e2c24ecf97cbf367fd3f330df36760e1c597

RUN pip install -r requirements.txt && pip cache purge

WORKDIR /demo

Compose: sudo docker compose -f docker-compose.yml -p demo-container up -d

version: "3"
services:
  demo-container:
    image: demo-container:latest
    volumes:
      - ./:/demo-container
    build:
      context: .
      dockerfile: ./Dockerfile
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    tty: true
    shm_size: '1gb'
    ulimits:
      memlock: -1
    network_mode: "host"

Run: sudo docker exec -it demo-container-demo-container-1 bash

With this you should be able to enter the container environment from the comfort of VS Code.
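As a final sanity check, this one-liner (run inside the container; it assumes the torch install from the Dockerfile above) confirms that PyTorch can see the V100:

```shell
# Inside the container: check that CUDA is visible to PyTorch
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```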

