Try installing CNTK on a Linux machine; if you get it working, consider yourself lucky. I tried it on three machines and was unsuccessful each time. A colleague suggested trying a CNTK Docker image instead. Since I also needed a GPU, I decided to run it on GCP. I chose GCP because it gives you $300 worth of free credits, valid for 12 months.

I want to give a rundown of the steps I followed. I also created a few scripts to do almost everything from the terminal.

Google Cloud Platform

There are two ways of working with GCP - the GCP Dashboard in your browser and the gcloud CLI. I'll cover the CLI route, since after creating a VM you'll be working in your terminal anyway, so you might as well do everything there. That said, it's still helpful to get familiar with the Dashboard.

Before following along, create a project using the GCP Dashboard in your browser. This project will contain your VM instances. Go through the Creating and Managing Projects page to create your first project.
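Alternatively, once the Cloud SDK (installed below) is set up, you can create the project straight from the terminal. The project ID used here is just a placeholder; substitute your own.

# Hypothetical project ID - replace with a globally unique ID of your own
gcloud projects create gcp-project --name="cntk-experiments"

# Confirm that the project exists
gcloud projects list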

# Create environment variable for correct distribution
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"

# Add the Cloud SDK distribution URI as a package source
echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | sudo tee \
    -a /etc/apt/sources.list.d/google-cloud-sdk.list

# Import the Google Cloud Platform public key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

# Update the package list and install the Cloud SDK
sudo apt-get update && sudo apt-get install google-cloud-sdk
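
If the install completes cleanly, the gcloud command should now be available:

# Verify the SDK installation
gcloud version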

In the first instruction above (setting CLOUD_SDK_REPO), you might face an error if your distro is not a standard Debian/Ubuntu machine. For example, I work on Elementary OS, which is based on Ubuntu, but lsb_release -c -s returned the Elementary OS release codename, not the underlying Ubuntu one. So I looked up the Ubuntu release my OS is based on and set the CLOUD_SDK_REPO variable manually using that name.
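If you're on an Ubuntu derivative, the upstream codename is often recorded in /etc/os-release as UBUNTU_CODENAME, which can save you the lookup; the bionic value below is only an example of setting the variable by hand.

# Many Ubuntu derivatives record the upstream codename here
grep UBUNTU_CODENAME /etc/os-release

# Or set the variable manually, e.g. for a distro based on Ubuntu 18.04 "bionic"
export CLOUD_SDK_REPO="cloud-sdk-bionic"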

To set up everything on your machine, run the following command.

gcloud init

Follow the instructions in the terminal. It'll ask you to log in with the Google account associated with GCP, followed by defaults such as the default project, region, etc. Select any region you want; if you don't have a preference, pick us-west1-b, which offers all the available GPU options. Some regions don't have all of them.

This quick-start page in the GCP documentation shows what happens once you run the command, and points you to the relevant sections if you have other things to take care of, such as proxies.
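After gcloud init completes, you can double-check what it recorded and adjust the defaults later if needed (the zone below is just an example):

# Show the active account, project and default compute region/zone
gcloud config list

# Change the default zone later if you picked the wrong one
gcloud config set compute/zone us-west1-b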

Before you can create and start a new VM instance, you need to verify your payment method. During verification, GCP makes a small test transaction (which they say is reverted after a few days; I haven't received mine back yet, though). The verification just ensures that you are legit. This GCP documentation page will help if you have any other questions.

Now we will create a VM using gcloud from our terminal.

echo "Current Google Cloud Projects with the linked Google account-"
gcloud projects list

export PROJECT_NAME="gcp-project"
echo "Selected project - "$PROJECT_NAME
gcloud config set project $PROJECT_NAME

# export INSTANCE_NAME="my-fastai-instance"
export INSTANCE_NAME="cntk-docker-vm"

export ZONE="us-west2-b" # budget: "us-west1-b"

# budget: 'type=nvidia-tesla-k80'
export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4"

# run "gcloud compute images list" to get a list of images and their families
# or "pytorch-1-0-cpu-experimental" for non-GPU instances
export IMAGE_FAMILY="pytorch-1-0-cu92-experimental"

export HDD="500GB"

gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-p4,count=1" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=$HDD \
        --metadata="install-nvidia-driver=True" \
        # --preemptible

gcloud projects list shows all the projects linked to your account; set PROJECT_NAME to the one you want to use. If you have a preferred zone, set ZONE accordingly, otherwise leave it as is. INSTANCE_TYPE controls the machine type (drop to a smaller one if budget is low), and the --accelerator flag in the create command picks the GPU type; there are higher-end options as well. HDD sets the size of the boot disk. And that's it - all the variables are set and used to create a VM for you. You can put the above code segment into a separate script and automate this step.

There's also the --preemptible option (commented out on the last line of the command), which is important to know about. A preemptible instance is stopped after 24 hours of continuous running, and it can also be preempted (stopped) with 30 seconds' notice at any time when demand is high. This option is good for beginners, as preemptible instances are cheaper and it prevents extra charges if you forget to stop the instance after using it. The Preemptible VM Instances page gives further details.

After the VM is created successfully, you can check its status at https://console.cloud.google.com/compute/. A green tick indicates that the instance was created and is running. You might get an error about reaching your GPU limit or quota; to solve this, request a GPU quota increase from the Quotas menu under IAM & Admin. This SE question will help you out - https://serverfault.com/q/887256. Make sure billing verification is done before requesting the quota increase, otherwise it'll be rejected.
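You can also check the instance's status and the region's GPU quota from the terminal instead of the console. The region below is just the one used in this post:

# Shows all instances in the project along with their STATUS
gcloud compute instances list

# List the region's quotas and pick out the GPU-related entries
gcloud compute regions describe us-west2 | grep -B 1 -A 1 -i gpu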

Once you have a green tick against your VM in the dashboard, you'll be able to SSH into it.

export PROJECT_NAME="gcp-project"
echo "Selected project - "$PROJECT_NAME
gcloud config set project $PROJECT_NAME

export INSTANCE_NAME="cntk-docker-vm"
export ZONE="us-west2-b" # budget: "us-west1-b"

echo "Logging in..."
gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME

The first time you run it, it'll generate your SSH key and prompt you for a passphrase. On subsequent runs, it'll just ask for that passphrase to log you in.
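Once you're logged in, it's worth confirming that the NVIDIA driver (requested via the install-nvidia-driver metadata above) is in place before moving on to Docker:

# Inside the VM: the attached GPU should show up here
nvidia-smi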

To start or stop the VM from the CLI, use the following commands.

export PROJECT_NAME="microsoft-ai-2018"
echo "Selected project - "$PROJECT_NAME
gcloud config set project microsoft-ai-2018

export INSTANCE_NAME="cntk-docker-trial"
export ZONE="us-west2-b" # budget: "us-west1-b"

echo "Starting..."
gcloud compute instances start --zone=$ZONE $INSTANCE_NAME

echo "Stopping..."
gcloud compute instances stop --zone=$ZONE $INSTANCE_NAME

CNTK Docker

Docker is like virtual machine software: it runs containers. Containers are application images that bundle all the libraries, tools, and configuration needed to run the application. This saves the developer time by setting up the development environment and its dependencies automatically.
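If this is your first time with Docker, as it was for me, a handful of standard Docker CLI commands cover most of what you'll need on the VM:

docker --version        # confirm Docker is installed on the VM
docker images           # list the images you have pulled
docker ps -a            # list running and stopped containers
docker rm CONTAINER_ID  # remove a stopped container you no longer need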

At first, I tried the CNTK Docker image made by Microsoft - the NVIDIA Docker image with Python 3.5. Sadly, it gave me an error similar to the one I got when installing CNTK on my own laptop. What's the point of providing a Docker image if you still hit the same errors while installing and running it that you would have faced installing everything yourself?

After wasting a few hours (I assumed I was making some mistake working with Docker, since it was my first time using it), I deleted my VM instance, created a new one, and followed the same steps, only to hit the same issue again. Then my colleague sent me the link to this deep learning Docker image - Deepo. I installed the GPU version and it worked on the first try. My thanks to the Deepo maintainer(s) for providing these Docker images.

Instruction to pull the Docker image:

docker pull ufoym/deepo

The commands to launch the GPU image after pulling it are in the following code segment. This block can also be put into a script so the container launches in one go.

# Test that the GPU is accessible from inside a Deepo container
nvidia-docker run --rm ufoym/deepo nvidia-smi

# Run the docker image, sharing data between the VM and the container.
# This mounts the VM's ~/working_dir directory at /working_dir inside the container.
nvidia-docker run -it -v ~/working_dir:/working_dir ufoym/deepo bash

The above instructions and more (Jupyter, the CPU version of the image, customization) are listed in the readme of the Deepo repo.
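As a final sanity check inside the container, you can confirm that CNTK imports and picks up the GPU. This assumes the ufoym/deepo image you pulled bundles CNTK, as its readme states:

# Inside the container: print the CNTK version and the device it will use by default
python -c "import cntk; print(cntk.__version__); print(cntk.device.use_default_device())"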