Installing TensorFlow on the NVIDIA Jetson Nano Developer Kit


Installation procedure

NVIDIA provides TensorFlow pip wheel packages for Jetson, so TensorFlow is easy to install on the Jetson Nano as well. NVIDIA's "TensorFlow For Jetson Platform" page explains the installation procedure, and I basically followed it, but there are a few points to watch out for.

Installing HDF5

$ sudo apt-get install libhdf5-serial-dev hdf5-tools

Installing pip

$ sudo apt-get install python3-pip

NVIDIA's page tells you to upgrade pip with pip itself right after installing it, but as of now (March 30, 2019) I think it is safer to skip that step. In my environment, pip would no longer start after running:
$ pip3 install -U pip
I recovered by following this page:
【Ubuntu】What to do when "ImportError: cannot import name main" appears after running pip install --upgrade pip
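A possible workaround, assuming the failure comes from the distribution's /usr/bin/pip3 wrapper script (it does `from pip import main`, an import that broke when pip 10 moved `main` into `pip._internal`): run pip as a module through the interpreter instead of through the wrapper. This is not necessarily what the page above does; it is a sketch, and the helper name `pip_version` is hypothetical.

```python
import subprocess
import sys


def pip_version():
    """Run pip via the current interpreter (python3 -m pip), bypassing the
    /usr/bin/pip3 wrapper script. Returns (exit_code, output)."""
    # stdout/stderr=PIPE and universal_newlines keep this compatible with
    # Python 3.6 (3.6.7 is used in this article), which lacks capture_output/text.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    code, output = pip_version()
    print(output)
```

Installs can be run the same way, as `python3 -m pip install ...`, without going through the broken wrapper.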

Installing other packages

$ sudo apt-get install zlib1g-dev zip libjpeg8-dev libhdf5-dev 
$ sudo pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor
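Before installing the TensorFlow wheel, it can help to confirm that the packages above are importable from python3. The sketch below uses `importlib.util.find_spec`, which exists on Python 3.6. The mapping from pip package names to import names (for example `grpcio` to `grpc`, `absl-py` to `absl`, `py-cpuinfo` to `cpuinfo`) is written out by hand and is worth double-checking; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# pip package name -> top-level module it installs (hand-written mapping)
PIP_TO_MODULE = {
    "numpy": "numpy",
    "grpcio": "grpc",
    "absl-py": "absl",
    "py-cpuinfo": "cpuinfo",
    "psutil": "psutil",
    "portpicker": "portpicker",
    "six": "six",
    "mock": "mock",
    "requests": "requests",
    "gast": "gast",
    "h5py": "h5py",
    "astor": "astor",
    "termcolor": "termcolor",
}


def missing_modules(module_names):
    """Return the module names that cannot be found on the import path."""
    # find_spec() locates a top-level module without executing it
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PIP_TO_MODULE.values())
    print("missing:", missing if missing else "none")
```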

Installing TensorFlow

$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu

Verifying the installation

$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print(tf.__version__)
1.13.1
>>> quit()
$

This completes the TensorFlow installation.

Running the TensorFlow tutorial code

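The Jetson series is arguably an excellent platform for inference with deep neural networks, but it is somewhat underpowered for training them. Training small networks is fine, though, and I expect it to be faster than on a PC without a GPU.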
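For a sense of what "small" means here, the parameters of the fully connected tutorial model used in this section (a 28×28 image flattened to 784 values, a 512-unit Dense layer, and a 10-unit Dense output; Flatten and Dropout have no weights) can be counted by hand. Each Dense layer has inputs × units weights plus one bias per unit. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(28 * 28, 512)  # Flatten(28, 28) -> Dense(512)
output = dense_params(512, 10)       # Dense(512) -> Dense(10)
total = hidden + output

print(hidden, output, total)  # 401920 5130 407050
```

About 0.4 million parameters is tiny by current standards, which is why training it on the Jetson Nano is practical. Calling `model.summary()` in Keras prints the same per-layer counts.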
Let's run the following code, shown in [1].

mnist.py

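import tensorflow as tf
mnist = tf.keras.datasets.mnist

# Load MNIST (28x28 grayscale digits) and scale pixel values from 0-255 to 0.0-1.0
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Small fully connected classifier: flatten, one hidden layer, softmax over 10 digits
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
# Labels are integers 0-9, so sparse categorical cross-entropy needs no one-hot encoding
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs, then report loss and accuracy on the test set
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)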

Save this code as mnist.py and run it as follows.

$ python3 mnist.py

The lines mentioning NVIDIA Tegra X1 in the log show that the GPU is recognized.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
2019-03-30 17:46:10.020449: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-03-30 17:46:10.021429: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x3f14f760 executing computations on platform Host. Devices:
2019-03-30 17:46:10.021499: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-30 17:46:10.167789: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-03-30 17:46:10.168088: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x3de91970 executing computations on platform CUDA. Devices:
2019-03-30 17:46:10.168145: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-03-30 17:46:10.168519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.86GiB freeMemory: 532.48MiB
2019-03-30 17:46:10.168575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-30 17:46:15.464153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-30 17:46:15.475841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-03-30 17:46:15.475890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-03-30 17:46:15.476119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 75 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2019-03-30 17:46:16.539100: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
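Instead of reading the log by eye, GPU placement can be checked by scanning TensorFlow's stderr for the `Created TensorFlow device` line shown above. The regular expression below is written against that one line from TensorFlow 1.13.1; the log format is not a stable API and may differ in other versions, and the helper name `find_gpu_devices` is hypothetical.

```python
import re
import sys

# Matches TensorFlow 1.13's device-creation log line, e.g.
# "Created TensorFlow device (<device> with 75 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, ..."
DEVICE_RE = re.compile(
    r"Created TensorFlow device \((?P<device>\S+) with (?P<mb>\d+) MB memory\)"
    r" -> physical GPU \(device: (?P<index>\d+), name: (?P<name>[^,]+),"
)


def find_gpu_devices(log_text):
    """Return (device, memory_mb, gpu_name) for every GPU device TensorFlow created."""
    return [
        (m.group("device"), int(m.group("mb")), m.group("name"))
        for m in DEVICE_RE.finditer(log_text)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 mnist.py 2> tf.log, then pass tf.log to this script
    with open(sys.argv[1]) as f:
        devices = find_gpu_devices(f.read())
    print(devices if devices else "no GPU device found in log")
```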

After 5 epochs of training, the result was loss: 0.0640 - acc: 0.9819.
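Conceptually, the `loss` and `acc` values Keras prints for this model are the mean sparse categorical cross-entropy (the average of -log of the probability assigned to the true class) and the fraction of samples whose highest-probability class equals the integer label. During `fit`, Keras also averages over the batches seen so far in the epoch and clips probabilities for numerical stability, which this sketch omits. The pure-Python version below needs no TensorFlow; the function names and sample numbers are made up for illustration.

```python
import math


def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(p[true class]) over samples; labels are integer class ids."""
    return sum(-math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of samples whose argmax class matches the integer label."""
    hits = sum(1 for p, y in zip(probs, labels)
               if max(range(len(p)), key=p.__getitem__) == y)
    return hits / len(labels)


# Two made-up softmax outputs over 3 classes, with true labels 0 and 2
probs = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
labels = [0, 2]

print(round(sparse_categorical_crossentropy(probs, labels), 4))  # 0.7803
print(accuracy(probs, labels))  # 0.5
```

Because the loss only indexes the true class, integer labels such as `y_train` can be used directly, which is why the tutorial picks `sparse_categorical_crossentropy`.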

Using TensorBoard

TensorBoard can also be used to visualize the model; this requires a small change to the code.
However, watching training live in TensorBoard was too heavy for the Jetson Nano, so I looked at the logs in TensorBoard after training had finished.

mnist.py

import tensorflow as tf
mnist = tf.keras.datasets.mnist

# Load MNIST and scale pixel values to 0.0-1.0
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Write TensorBoard logs to ./logs/; histogram_freq=1 records weight histograms every epoch
log_filepath = "./logs/"
tb_cb = tf.keras.callbacks.TensorBoard(log_dir=log_filepath, histogram_freq=1)

# Histograms require validation data, so the test set is passed as validation_data
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test), callbacks=[tb_cb])
model.evaluate(x_test, y_test)
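Once training has finished, the logs can be opened with `tensorboard --logdir=./logs/` (TensorBoard is normally pulled in as a dependency of the tensorflow-gpu wheel). Before starting it, a quick check that the callback actually wrote event files can save you from a confusing empty dashboard. The sketch below walks the log directory for files whose names start with `events.out.tfevents`, the prefix TensorFlow uses for summary files; also walking subdirectories is an assumption meant to cover versions that write per-run subfolders. The helper name `find_event_files` is hypothetical.

```python
import os
import sys


def find_event_files(log_dir):
    """Return sorted paths of TensorFlow summary (event) files under log_dir."""
    found = []
    # Walk subdirectories too, in case the callback wrote per-run subfolders
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            if name.startswith("events.out.tfevents"):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "./logs/"
    files = find_event_files(log_dir)
    print("\n".join(files) if files else "no event files found in " + log_dir)
```

A missing directory simply yields an empty list, because `os.walk` produces nothing for a path that does not exist.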