
AI - Installing TensorFlow GPU on Ubuntu with apt


Intro

This tutorial was tested with the following setup:

  • Ubuntu 18.04.5 LTS
  • GTX 1070
  • TensorFlow 2.4.1

Software requirements

Pre-installation Actions

GCC

$ gcc --version
Command 'gcc' not found, but can be installed with:
sudo apt install gcc
$ sudo apt install gcc
$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

NVIDIA package repositories

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
$ sudo apt-get update

NVIDIA machine learning

$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

$ sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
$ sudo apt-get update

NVIDIA GPU driver

$ sudo apt-get install --no-install-recommends nvidia-driver-460

Note: Use driver version 460 here. The official TensorFlow instructions specify 450, but in my testing that version failed.

Reboot and check that GPUs are visible using the following command.

$ nvidia-smi
Mon Apr  5 16:17:17 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 00000000:01:00.0  On |                  N/A |
|  0%   48C    P8     9W / 180W |    351MiB /  8111MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       997      G   /usr/lib/xorg/Xorg                 18MiB |
|    0   N/A  N/A      1145      G   /usr/bin/gnome-shell               53MiB |
|    0   N/A  N/A      1353      G   /usr/lib/xorg/Xorg                108MiB |
|    0   N/A  N/A      1495      G   /usr/bin/gnome-shell               83MiB |
|    0   N/A  N/A      1862      G   ...AAAAAAAAA= --shared-files       82MiB |
+-----------------------------------------------------------------------------+
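
The same check can be scripted instead of read by eye. The following is a minimal sketch, not part of the original setup: it assumes nvidia-smi is on the PATH, asks it for the driver version through its --query-gpu option, and compares the major version with 460, the version that worked in this tutorial. The function names are hypothetical.

```python
import shutil
import subprocess


def parse_driver_major(version_text):
    """Return the major driver version from text such as '460.32.03'."""
    first_line = version_text.strip().splitlines()[0]
    return int(first_line.split(".")[0])


def driver_is_new_enough(version_text, minimum_major=460):
    """True if the reported driver major version is at least minimum_major."""
    return parse_driver_major(version_text) >= minimum_major


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return result.stdout


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not found or failed; check the driver and reboot")
    elif driver_is_new_enough(version):
        print("driver OK:", version.strip().splitlines()[0])
    else:
        print("driver too old:", version.strip().splitlines()[0])
```

With more than one GPU, nvidia-smi prints one line per device. The sketch reads only the first line, which is enough because all devices share one driver.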

CUDA Toolkit and cuDNN

$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
$ sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
$ sudo apt-get update

# Install development and runtime libraries (~4GB)
$ sudo apt-get install --no-install-recommends \
    cuda-11-0 \
    libcudnn8=8.0.4.30-1+cuda11.0  \
    libcudnn8-dev=8.0.4.30-1+cuda11.0

TensorRT

$ sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-dev=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0
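
Every cuDNN and TensorRT package pinned above carries a +cuda11.0 suffix in its version string, so a quick consistency check is possible after installation. The sketch below is an illustration under assumptions: it relies on dpkg-query being available (standard on Ubuntu), and the helper names and package list are my own, taken from the commands in this tutorial.

```python
import shutil
import subprocess

# Packages pinned in this tutorial; each pinned version ends in "+cuda11.0".
PINNED_PACKAGES = [
    "libcudnn8",
    "libcudnn8-dev",
    "libnvinfer7",
    "libnvinfer-dev",
    "libnvinfer-plugin7",
]


def cuda_suffix(package_version):
    """Return the CUDA version encoded after '+cuda', or None if absent."""
    marker = "+cuda"
    if marker not in package_version:
        return None
    return package_version.split(marker, 1)[1].strip()


def installed_version(package):
    """Return the installed version of a Debian package, or None."""
    if shutil.which("dpkg-query") is None:
        return None
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        capture_output=True, text=True, check=False,
    )
    version = result.stdout.strip()
    return version if result.returncode == 0 and version else None


if __name__ == "__main__":
    for package in PINNED_PACKAGES:
        version = installed_version(package)
        if version is None:
            print(package, "is not installed")
        elif cuda_suffix(version) != "11.0":
            print(package, version, "does not target CUDA 11.0")
        else:
            print(package, version, "OK")
```

A mismatch usually means apt picked a newer build from the repository because the version pin was left out of the install command.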

Miniconda

Download the Python 3.8 Miniconda installation script from https://docs.conda.io/en/latest/miniconda.html.

Make the installation script executable

$ chmod +x Miniconda3-latest-Linux-x86_64.sh

Run the installation script

$ ./Miniconda3-latest-Linux-x86_64.sh

Restart the terminal so that the conda command takes effect.

Virtual environment

Create and activate a virtual environment.

$ conda create -n tensorflow python=3.8.5
$ conda activate tensorflow
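
Before installing anything with pip, it can help to confirm that the shell really is inside the new environment. The sketch below is one possible check, assuming conda's usual behavior of exporting CONDA_DEFAULT_ENV on activation. The function names are hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def python_matches(expected=(3, 8), version_info=None):
    """True if the interpreter's major.minor version equals expected."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # Expect "tensorflow" and an interpreter path inside that environment.
    print("conda environment:", active_conda_env())
    print("interpreter:", sys.executable)
    print("Python 3.8:", python_matches())
```

If the environment name is not tensorflow, pip would install into a different Python and the later verification step would not find the package.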

Install TensorFlow

$ pip install tensorflow==2.4.1

Verify the install

$ python -c "import tensorflow as tf;print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"
2021-04-05 16:20:00.426536: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-04-05 16:20:01.170305: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-05 16:20:01.170830: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-05 16:20:01.198917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-05 16:20:01.199497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7845GHz coreCount: 15 deviceMemorySize: 7.92GiB deviceMemoryBandwidth: 238.66GiB/s
2021-04-05 16:20:01.199519: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-04-05 16:20:01.201250: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-04-05 16:20:01.201278: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-04-05 16:20:01.201995: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-05 16:20:01.202159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-05 16:20:01.203993: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-05 16:20:01.204412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-04-05 16:20:01.204499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-04-05 16:20:01.204566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-05 16:20:01.204897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-05 16:20:01.205168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Num GPUs Available:  1
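
Listing the device shows that TensorFlow can see the GPU, but not that it can run work on it. The sketch below goes one step further under a few assumptions: it uses tf.test.is_built_with_cuda and tf.config.list_physical_devices, runs a small matrix multiplication pinned to /GPU:0, and reports where the result was placed. The function name gpu_report is hypothetical, and the function returns None when TensorFlow cannot be imported, which typically means the wrong environment is active.

```python
def gpu_report():
    """Summarize TensorFlow's view of the GPU, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    gpus = tf.config.list_physical_devices("GPU")
    report = {
        "tensorflow_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpu_names": [gpu.name for gpu in gpus],
    }
    if gpus:
        # Run a small matrix multiplication on the first GPU to confirm that
        # kernels execute there, not only that the device is listed.
        with tf.device("/GPU:0"):
            a = tf.random.uniform((256, 256))
            product = tf.matmul(a, a)
        report["matmul_device"] = product.device
    return report


if __name__ == "__main__":
    report = gpu_report()
    if report is None:
        print("TensorFlow is not importable; is the conda environment active?")
    else:
        for key, value in report.items():
            print(key, "=", value)
```

On a working setup, the matmul_device entry should end in GPU:0.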

Install JupyterLab and matplotlib

$ pip install jupyterlab matplotlib

Run TensorFlow in JupyterLab

$ jupyter lab

JupyterLab opens automatically in a browser.

Download the CNN notebook from https://www.tensorflow.org/tutorials/images/cnn and import it into JupyterLab.

Select Restart Kernel and Run All Cells.

While training is running, check the GPU processes. The list includes ...nvs/tensorflow/bin/python, which shows that the GPU is being used to train the model.

$ nvidia-smi
Mon Apr  5 16:36:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 00000000:01:00.0  On |                  N/A |
| 23%   54C    P2    72W / 180W |   7896MiB /  8111MiB |     55%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       997      G   /usr/lib/xorg/Xorg                 18MiB |
|    0   N/A  N/A      1145      G   /usr/bin/gnome-shell               73MiB |
|    0   N/A  N/A      1353      G   /usr/lib/xorg/Xorg                136MiB |
|    0   N/A  N/A      1495      G   /usr/bin/gnome-shell               53MiB |
|    0   N/A  N/A      1862      G   ...AAAAAAAAA= --shared-files       99MiB |
|    0   N/A  N/A      3181      C   ...nvs/tensorflow/bin/python     7507MiB |
+-----------------------------------------------------------------------------+
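
The process check can also be done programmatically. The sketch below is an assumption-based example rather than part of the original workflow: it uses nvidia-smi's --query-compute-apps option with CSV output, which lists only compute (type C) processes, and looks for one whose executable name contains python. The helper names are hypothetical.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' lines into dictionaries."""
    apps = []
    for line in csv_text.strip().splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue
        pid = fields[0].strip()
        memory_mib = fields[-1].strip()
        # Rejoin the middle fields in case the process name contains a comma.
        name = ",".join(fields[1:-1]).strip()
        if not pid.isdigit() or not memory_mib.isdigit():
            continue  # skip lines nvidia-smi could not fill in
        apps.append({"pid": int(pid), "name": name, "memory_mib": int(memory_mib)})
    return apps


def python_gpu_processes(csv_text):
    """Return compute processes whose executable name contains 'python'."""
    return [
        app for app in parse_compute_apps(csv_text)
        if "python" in app["name"].rsplit("/", 1)[-1]
    ]


def query_compute_apps():
    """Return nvidia-smi's compute process list as CSV text ('' if unavailable)."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    for app in python_gpu_processes(query_compute_apps()):
        print(app["pid"], app["name"], app["memory_mib"], "MiB")
```

An empty result while training is running suggests that TensorFlow fell back to the CPU.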

Install VSCode

Download and install VSCode from the official website.

Open VSCode and install the Python extension.

Open the notebook ~/tensorflow-notebook/01-hello/hello.ipynb and select the virtual environment created earlier as the Python interpreter.

Run TensorFlow in VSCode

Summary

The deep learning environment is now set up. You can develop from the command line, JupyterLab, or VSCode, whichever you prefer.

Written by CatchZeng