https://davidstutz.de/upgrading-cuda-and-installing-cudnn-for-caffe-and-tensorflow/
Check Ubuntu Version:
lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
If NVIDIA Driver is not installed:
cd ~/Downloads
sudo sh ./NVIDIA-Linux-x86_64-410.104.run
Answers to the installer prompts:
register the kernel module sources with dkms - no
install 32-bit compatibility libraries - no
run nvidia-xconfig to update the X config - yes
Check CUDA version:
nvcc --version
Check cuDNN version:
whereis cudnn.h
cudnn: /usr/include/cudnn.h
cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
which means the version is 7.4.2.
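The same check can be wrapped in a small function so it is easy to repeat after each install. This is only a sketch: cudnn_header_version is a made-up helper name, and it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain numbers, like the output above.

```shell
# Hypothetical helper (not part of any NVIDIA tool): print the cuDNN version
# as major.minor.patch, read from the #define lines of a cudnn.h-style header.
cudnn_header_version() {
    header="$1"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            # Fail if any of the three defines was not found.
            if (major == "" || minor == "" || patch == "") exit 1
            printf "%s.%s.%s\n", major, minor, patch
        }
    ' "$header"
}
```

Usage: cudnn_header_version /usr/include/cudnn.h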
Uninstalling:
Find the CUDA dir using:
which nvcc
Mine was:
rb@rbhost:~$ which nvcc
/usr/local/cuda-10.0/bin/nvcc
To uninstall the CUDA Toolkit, run the uninstall script in `/usr/local/cuda-10.0/bin`.
Open explorer. Ctrl+L to enter location: /usr/local/cuda-10.0/bin
cd /usr/local/cuda-10.0/bin
sudo ./uninstall_cuda_10.0.pl
Uninstall cuDNN
apt-cache search libcudnn
or
apt-cache search cudnn
rb@rbhost:~$ apt-cache search cudnn
libcudnn7 - cuDNN runtime libraries
libcudnn7-dev - cuDNN development libraries and headers
libcudnn7-doc - cuDNN documents and samples
sudo apt-get remove packagename
or
sudo dpkg --remove packagename
I did (order is important, dependent packages first):
sudo dpkg --remove libcudnn7-doc
sudo dpkg --remove libcudnn7-dev
sudo dpkg --remove libcudnn7
Also I had added these to bashrc:
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
$ nano ~/.bashrc
<remove those lines>
Press Ctrl+O and then Enter to save, and Ctrl+X to close nano. Then reload .bashrc in the terminal:
$ source ~/.bashrc
Before:
rb@rbhost:~$ nvcc
bash: /usr/local/cuda-10.0/bin/nvcc: No such file or directory
After:
rb@rbhost:~$ nvcc
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
Installing CUDA 8.0
There doesn't seem to be a CUDA 8.0 download for Ubuntu 18.04; it is available only for 14.04 and 16.04.
There is an --override option (to skip the compiler verification) if needed.
Let's try to install CUDA using the 16.04 version; if that doesn't work, we'll change the Ubuntu version.
Ok following this: https://askubuntu.com/questions/959835/how-to-remove-cuda-9-0-and-install-cuda-8-0-instead
$ dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 sudo dpkg --purge
This showed an error, probably because I had no installed packages matching cuda-.
To check, I tried: dpkg -l | grep cuda- | awk '{print $2}'
No output.
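With GNU xargs, an empty input still runs the command once with no arguments, which would explain the dpkg error when nothing matches. A minimal sketch of a guard, assuming GNU xargs (the -r / --no-run-if-empty flag is a GNU extension); run_for_each is a hypothetical helper name:

```shell
# Hypothetical helper: read names on stdin and run the given command once per
# name. With -r, nothing is run at all when stdin is empty.
run_for_each() {
    xargs -r -n1 "$@"
}
```

Usage: dpkg -l | grep cuda- | awk '{print $2}' | run_for_each sudo dpkg --purge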
Downloaded: cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from https://developer.nvidia.com/cuda-80-ga2-download-archive
$ cd ~/Downloads
I changed the command from the answer,
$ dpkg --install cuda-repo-ubuntu*-8.0-local*.deb
to
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
<you can apply the patch here, see later in the post>
$ sudo apt-get update
$ sudo apt-get install cuda-8-0
(use cuda-8-0 not cuda)
Error:
The following packages have unmet dependencies:
cuda-8-0 : Depends: cuda-runtime-8-0 (>= 8.0.61) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
(Probably because the installed NVIDIA driver is newer than the one cuda-8-0 expects?)
So do:
sudo apt-get install aptitude
sudo aptitude install cuda-8-0
This will probably install nvidia-340 as well.
CUDA was installed, but nvcc wasn't found because the CUDA directory is not on the PATH.
Check it via the full path:
$ /usr/local/cuda-8.0/bin/nvcc --version
So add these to bashrc:
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
$ nano ~/.bashrc
<add those lines>
Press Ctrl+O and then Enter to save, and Ctrl+X to close nano. Then reload .bashrc in the terminal:
$ source ~/.bashrc
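The ${VAR:+:${VAR}} part of those export lines appends ":old value" only when the variable is already non-empty, so no stray trailing colon is left behind (an empty entry in PATH or LD_LIBRARY_PATH is treated as the current directory). A small sketch to see the expansion in isolation; prepend_dir is a hypothetical helper, not something the exports need:

```shell
# Hypothetical helper mirroring the export lines: print "dir" followed by
# ":current" only when "current" is non-empty.
prepend_dir() {
    dir="$1"
    current="$2"
    printf '%s\n' "${dir}${current:+:${current}}"
}
```

Usage: prepend_dir /usr/local/cuda-8.0/lib64 "$LD_LIBRARY_PATH"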
After this:
rb@rbhost:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Install cuda patch:
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda-8-0
But cuda-8-0 was already installed before the patch, so it needs to be upgraded instead:
$ sudo apt-get upgrade cuda-8-0
rb@rbhost:~/Downloads$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Install cuDNN: We need v5
Download cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0
Like the previous post, we can either download the *.deb files for
- cuDNN Runtime Library for Ubuntu16.04 (Deb)
- cuDNN Developer Library for Ubuntu16.04 (Deb)
- cuDNN Code Samples and User Guide for Ubuntu16.04 (Deb)
or download:
cuDNN v5.1 Library for Linux
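Extract it, go to the extracted folder and copy the header and the libraries into the CUDA directory:
sudo cp cudnn.h /usr/local/cuda-8.0/include
sudo cp libcudnn* /usr/local/cuda-8.0/lib64
With the archive I downloaded, that was: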
$ cd ~/Downloads
$ tar xvf cudnn-8.0-linux-x64-v5.1.tgz
$ cd cuda
$ sudo cp include/cudnn.h /usr/local/cuda-8.0/include
$ sudo cp lib64/libcudnn* /usr/local/cuda-8.0/lib64
Check cuDNN version:
$ cat /usr/local/cuda-8.0/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
which means the version is 5.1.10.
nvidia-smi didn't work: Failed to initialize NVML: Driver/library version mismatch
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 410.104 Tue Feb 5 22:58:30 CST 2019
GCC version: gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
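"Driver/library version mismatch" means the loaded kernel module and the user-space NVIDIA libraries come from different driver versions; my guess is that installing cuda-8-0 pulled in libraries from another driver package. A sketch for pulling the kernel module version out of the file above so it can be compared with whatever else is installed. nvidia_kernel_module_version is a hypothetical helper, and it assumes the "NVRM version:" line looks like the one shown here:

```shell
# Hypothetical helper: print the kernel module version (e.g. 410.104) from a
# file formatted like /proc/driver/nvidia/version.
nvidia_kernel_module_version() {
    sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' "$1"
}
```

Usage: nvidia_kernel_module_version /proc/driver/nvidia/version
The result can then be compared with the module on disk, for example with modinfo -F version nvidia.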
Rebooting didn't solve the issue - so I tried to uninstall nvidia drivers and install again:
Source: https://stackoverflow.com/questions/43022843/
$ sudo ./NVIDIA-Linux-x86_64-410.104.run --uninstall
There is no nvidia driver currently installed.
So I did:
$ sudo apt-get purge 'nvidia*'
Open /etc/modprobe.d/blacklist.conf and add `blacklist nouveau` at the end. Now reboot.
sudo service lightdm stop
Failed to stop lightdm.service: Unit lightdm.service not loaded.
(Ubuntu 18.04 uses gdm3 by default, not lightdm, hence the error.)
sudo ./NVIDIA-Linux-x86_64-410.104.run
Answers to the installer prompts:
register the kernel module sources with dkms - no
install 32-bit compatibility libraries - no
run nvidia-xconfig to update the X config - yes
sudo service lightdm start
Failed to start lightdm.service: Unit lightdm.service not found.
TensorFlow showed an error, so I updated bashrc:
#export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Now the error is:
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Probably need to reinstall TensorFlow.
apt-cache search cudnn
pip uninstall tf-nightly-gpu
It seems TensorFlow 1.3 doesn't support CUDA 8.0 together with cuDNN 5.1 (it expects cuDNN 6),
and under Python 3.7 this failed: pip install tensorflow-gpu==1.2.0
So I removed the tensorflow environment, recreated it with Python 3.5 and installed inside it:
$ conda remove -n tensorflow --all
$ conda create --name tensorflow python=3.5
$ pip install tensorflow-gpu==1.2.0
Worked!
Compatible CUDA and cuDNN for tensorflow: https://stackoverflow.com/questions/50622525/
http://davidstutz.de/installing-cuda-and-caffe-on-ubuntu-14-04/
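To keep the version pairing in one place, here is a small lookup sketch. The entries are the combinations as I understand them from TensorFlow's tested build configurations, so treat them as assumptions and check the table linked above before relying on them. tf_gpu_requirements is a hypothetical helper name.

```shell
# Hypothetical helper: print the CUDA/cuDNN pair a tensorflow-gpu release was
# built against. Only a few releases are listed; the values are assumptions
# taken from the tested build configurations, not read from the package.
tf_gpu_requirements() {
    case "$1" in
        1.2.*)       echo "CUDA 8.0, cuDNN 5.1" ;;
        1.3.*|1.4.*) echo "CUDA 8.0, cuDNN 6" ;;
        1.13.*)      echo "CUDA 10.0, cuDNN 7.4" ;;
        *)           echo "unknown"; return 1 ;;
    esac
}
```

Usage: tf_gpu_requirements 1.2.0
This is why tensorflow-gpu==1.2.0 fits the CUDA 8.0 and cuDNN 5.1 setup from this post.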
Some useful sources: