Life Log: tensorflow

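Tuesday, October 25, 2022

tensorflow predict memory leak

Memory usage keeps growing until the system crashes.

$ top

VIRT and RES keep growing.

$ jtop

Mem also grows over time.

Check how much memory the current process is using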

import psutil

proc = psutil.Process()

print(proc.memory_info().rss / (1024*1024*1024))  # resident set size in GiB

print(proc.memory_info().vms / (1024*1024*1024))  # virtual memory size in GiB

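Profile the memory usage of the code line by line

from memory_profiler import profile

# Write the report to a log file
@profile(precision=4, stream=open('memory_profiler.log', 'w+'))
def function():
    ...

# Or print the report directly to stdout
@profile
def function():
    ...

But this did not reveal the cause.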

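A common claim online is that the numpy-to-tensor conversion causes the leak, so convert explicitly first

state = tf.convert_to_tensor(state)

model.predict(state)

states = tf.convert_to_tensor(states)

model.fit(states)

But that did not help.

Garbage collection

import gc

gc.collect()

That did not help either.

The last resort, which worked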

import tensorflow as tf

tf.keras.backend.clear_session()
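
In a long-running loop it may be enough to run the cleanup every so many predictions instead of on every call. The sketch below only illustrates that idea: `run_with_periodic_cleanup` and its parameters are hypothetical names, not part of TensorFlow, and the helper simply calls whatever cleanup callables it is given.

```python
def run_with_periodic_cleanup(step, inputs, cleanups, every=100):
    """Call step(x) for each input and run all cleanups after every `every` calls.

    `step`, `cleanups` and `every` are illustrative names (assumptions),
    not a TensorFlow API.
    """
    results = []
    for count, x in enumerate(inputs, start=1):
        results.append(step(x))
        if count % every == 0:
            for cleanup in cleanups:
                cleanup()
    return results


# Intended use with the fix from this post (not executed here):
#   import gc
#   import tensorflow as tf
#   run_with_periodic_cleanup(
#       model.predict, batches,
#       cleanups=[tf.keras.backend.clear_session, gc.collect],
#       every=100,
#   )
```

It is worth checking that predictions still come out right after the first cleanup, since the effect of clear_session() on an already loaded model can differ between TensorFlow versions.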

Thursday, October 20, 2022

gym and tensorflow conflict

env.render()

raises this error

from pyglet.gl import *

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/__init__.py", line 243, in <module>

import pyglet.window

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/window/__init__.py", line 1897, in <module>

gl._create_shadow_window()

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/__init__.py", line 220, in _create_shadow_window

_shadow_window = Window(width=1, height=1, visible=False)

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/window/xlib/__init__.py", line 173, in __init__

super(XlibWindow, self).__init__(*args, **kwargs)

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/window/__init__.py", line 595, in __init__

config = screen.get_best_config(template_config)

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/canvas/base.py", line 192, in get_best_config

configs = self.get_matching_configs(template)

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/canvas/xlib.py", line 220, in get_matching_configs

configs = template.match(canvas)

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/xlib.py", line 58, in match

have_13 = info.have_version(1, 3)

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/glx_info.py", line 86, in have_version

client_version = self.get_client_version().split()[0]

IndexError: list index out of range

The fix is to import tensorflow only after env.render() has been called

import gym

env = gym.make("CartPole-v0")

env.render()

import tensorflow as tf

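tensorflow on Xavier fails with the error cannot allocate memory in static TLS block

This error is actually caused by the gym/tensorflow conflict above

Once that conflict is resolved, the following error no longer appears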

Traceback (most recent call last):

File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 62, in <module>

from tensorflow.python._pywrap_tensorflow_internal import *

ImportError: /home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/tensorflow/python/../../tensorflow_cpu_aws.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block

$ vi .bashrc

export LD_PRELOAD=/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/tensorflow/python/../../tensorflow_cpu_aws.libs/libgomp-d22c30c5.so.1.0.0

Wednesday, March 2, 2022

TensorFlow to TensorRT

Reference: Developer Guide

Reference: Sample Support Guide

Reference: TensorRT Github samples

Reference: TensorFlow to ONNX

$ pip install onnxruntime

$ pip install -U tf2onnx

python -m tf2onnx.convert \

--saved-model tensorflow-model-path \

--output output.onnx
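
When the conversion has to run from a script instead of the shell, the same command can be assembled in Python. This is a minimal sketch: `build_tf2onnx_command` and `convert` are hypothetical helper names, the paths are the placeholders from the command above, and the optional `--opset` flag is included only as an example of an extra converter option.

```python
import subprocess
import sys


def build_tf2onnx_command(saved_model_dir, output_path, opset=None):
    """Return the argument list for `python -m tf2onnx.convert`."""
    cmd = [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--output", output_path,
    ]
    if opset is not None:
        cmd += ["--opset", str(opset)]
    return cmd


def convert(saved_model_dir, output_path, opset=None):
    """Run the converter; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_tf2onnx_command(saved_model_dir, output_path, opset), check=True)


# Show the command that would be run, without running it.
print(" ".join(build_tf2onnx_command("tensorflow-model-path", "output.onnx")[1:]))
```

Using `sys.executable` keeps the conversion inside the same virtualenv that tf2onnx was installed into.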

Wednesday, February 9, 2022

Yolo tiny v4 to tensorflow and tflite

Reference: tensorflow-yolov4-tflite

Only tensorflow==2.3.0rc0 works

Do not use any other version, and do not use the GPU

Modify core/config.py as needed

__C.YOLO.CLASSES

__C.YOLO.ANCHORS_TINY

For the tensorflow SavedModel format, loaded with tf.saved_model.load()

$ python save_model.py --weights /your_path_to/weights/yolov4-tiny-vehicle-r_final.weights \

--output ./checkpoints/yolov4-tiny-416 \

--input_size 416 --model yolov4 --tiny

$ python convert_tflite.py --weights ./checkpoints/yolov4-tiny-416-tflite \

--output ./checkpoints/yolov4-tiny-416.tflite

For the tflite format, loaded with tf.lite.Interpreter()

$ python save_model.py --weights /your_path_to/weights/yolov4-tiny-vehicle-r_final.weights \

--output ./checkpoints/yolov4-tiny-416-tflite \

--input_size 416 --model yolov4 --tiny --framework tflite

$ python convert_tflite.py --weights ./checkpoints/yolov4-tiny-416-tflite \

--output ./checkpoints/yolov4-tiny-416-fp16.tflite \

--quantize_mode float16

Thursday, January 21, 2021

Install Tensorflow 1.15 on Ubuntu 18.04

Normally pip3 install tensorflow-gpu==1.15 would be enough,
but it turned out not to work with CUDA 10.2; CUDA 10.0 together with cuDNN v7.6.5 is needed

Install CUDA

https://developer.nvidia.com/cuda-downloads

Select Archive of Previous CUDA Releases

Select CUDA Toolkit 10.0, Linux, x86_64, Ubuntu, 18.04, deb(local)

Click Download

sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb

sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub

sudo apt-get update

List the installable versions

apt-cache policy cuda

apt-cache madison cuda

Install the correct version

sudo apt-get install cuda=10.0.130-1

Install cuDNN

https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html

https://developer.nvidia.com/cudnn

Select cuDNN v7.6.5 for CUDA 10.0

Select cuDNN Library for Linux (x86)

tar -xzvf cudnn-10.0-linux-x64-v7.6.5.32.tgz

sudo cp cuda/include/cudnn*.h /usr/local/cuda-10.0/include

sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64

sudo chmod a+r /usr/local/cuda-10.0/include/cudnn*.h /usr/local/cuda-10.0/lib64/libcudnn*

sudo apt install python3-testresources

sudo apt install python-dev python-pip

mkdir envs; cd envs

python3 -m venv --system-site-packages tensorflow-1.15

source tensorflow-1.15/bin/activate

pip install --upgrade pip

pip install tensorflow-gpu==1.15

Friday, November 27, 2020

jetson nano install tensorflow

$ sudo apt-get update

$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran

$ sudo apt-get install python3-pip

$ sudo pip3 install -U pip testresources setuptools==49.6.0

$ sudo apt-get install virtualenv

$ mkdir envs

$ cd envs

$ sudo pip3 install -U numpy==1.16.1 future==0.18.2 mock==3.0.5 h5py==2.10.0 keras_preprocessing==1.1.1 keras_applications==1.0.8 gast==0.2.2 futures protobuf pybind11

$ python3 -m virtualenv -p python3 tensorflow-2.3.1

$ source tensorflow-2.3.1/bin/activate

$ pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==2.3.1+nv20.11

$ python3 -m virtualenv -p python3 tensorflow-1.15.4

$ source tensorflow-1.15.4/bin/activate

$ pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==1.15.4+nv20.11

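Friday, September 25, 2020

Understanding accuracy, precision, and recall

Predicted class versus actual class

TP (True Positive): actually positive, predicted positive

FN (False Negative): actually positive, predicted negative

FP (False Positive): actually negative, predicted positive

TN (True Negative): actually negative, predicted negative

Accuracy = (TP+TN) / (TP+TN+FP+FN)

Accuracy: of all cases, the fraction that were predicted correctly

Precision = (TP) / (TP+FP)

Precision: of the cases predicted positive, the fraction that are actually positive

Recall = (TP) / (TP+FN)

Recall: of the cases that are actually positive, the fraction that were predicted positive

High precision, low recall: most of what is caught is real, but real positives get missed

Low precision, high recall: most real positives are caught, but quite a few false ones come along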
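
The three formulas above can be checked with a few lines of Python. This is a small sketch with made-up counts; `classification_metrics` is a hypothetical helper name, and returning 0.0 when a denominator is zero is a choice made here, not something the definitions require.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from the four confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example: 10 predicted positives, 8 of them real, 20 real positives overall.
metrics = classification_metrics(tp=8, fp=2, fn=12, tn=78)
print(metrics)
```

With these counts precision is 8/10 and recall is 8/20, which is the "high precision, low recall" case: what is caught is mostly real, but more than half of the real positives are missed.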

Thursday, August 27, 2020

Studying How to configure your NVIDIA Jetson Nano for Computer Vision and Deep Learning

Reference: How to configure your NVIDIA Jetson Nano for Computer Vision and Deep Learning

Update the system-level packages

$ sudo apt-get update

$ sudo apt-get upgrade

Install the system-level dependencies

$ sudo apt-get install git

$ sudo apt-get install cmake

$ sudo apt-get install libatlas-base-dev

$ sudo apt-get install gfortran

$ sudo apt-get install libhdf5-serial-dev

$ sudo apt-get install hdf5-tools

$ sudo apt-get install python3-dev

$ sudo apt-get install locate

$ sudo apt-get install libfreetype6-dev

$ sudo apt-get install python3-setuptools

$ sudo apt-get install protobuf-compiler

$ sudo apt-get install libprotobuf-dev

$ sudo apt-get install openssl

$ sudo apt-get install libssl-dev

$ sudo apt-get install libcurl4-openssl-dev

$ sudo apt-get install cython3

$ sudo apt-get install libxml2-dev

$ sudo apt-get install libxslt1-dev

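Wednesday, July 8, 2020

https://github.com/openai/gym/blob/master/gym/utils/play.py
import gym
from gym.utils.play import play, PlayPlot

env = gym.make("Enduro-v0")

# Callback returns the values to plot after each step; here only the reward
def cb(obs_t, obs_tp1, action, rew, done, info):
    return [rew,]

plotter = PlayPlot(cb, horizon_timesteps=(30*5), plot_names=["reward"])
play(env, callback=plotter.callback, zoom=4)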

https://github.com/openai/gym/blob/master/gym/core.py
https://github.com/openai/gym/tree/master/gym/wrappers
https://github.com/openai/gym/blob/master/gym/wrappers/atari_preprocessing.py
https://github.com/openai/gym/blob/master/gym/envs/__init__.py
register(
id='{}-v0'.format(name),
entry_point='gym.envs.atari:AtariEnv',
kwargs={'game': game, 'obs_type': obs_type, 'repeat_action_probability': 0.25},
max_episode_steps=10000,
nondeterministic=nondeterministic,
)
https://github.com/openai/gym/blob/master/gym/envs/atari/atari_env.py
pip install gym[atari]
self.ale = atari_py.ALEInterface()
reward += self.ale.act(action)

https://github.com/openai/atari-py/tree/master/atari_py
https://github.com/openai/atari-py/blob/master/atari_py/__init__.py
https://github.com/openai/atari-py/blob/master/atari_py/ale_python_interface.py
ale_lib = cdll.LoadLibrary(os.path.join(os.path.dirname(__file__),
'ale_interface/libale_c.so'))
def act(self, action):
return ale_lib.act(self.obj, int(action))
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/ale_interface.cpp
reward_t reward = environment->act(action, PLAYER_B_NOOP);
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/ale_interface.hpp
std::unique_ptr<StellaEnvironment> environment;
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/environment/stella_environment.cpp
reward_t StellaEnvironment::act(Action player_a_action, Action player_b_action)
sum_rewards += oneStepAct(m_player_a_action, m_player_b_action);
reward_t StellaEnvironment::oneStepAct(Action player_a_action, Action player_b_action)
return m_settings->getReward();
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/environment/stella_environment.hpp
RomSettings *m_settings;
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/games/RomSettings.hpp
virtual reward_t getReward() const = 0;
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/games/supported/Enduro.cpp
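
The links above trace how a reward travels from the game rules up to Python. As a rough mental model only, the same delegation can be written as a toy Python sketch. The class names mirror the C++ ones, but the bodies, the `frame_skip` parameter and the fixed reward value are assumptions made for illustration, not the real ALE code.

```python
class RomSettings:
    """Stand-in for the abstract C++ RomSettings (getReward is pure virtual there)."""

    def get_reward(self):
        raise NotImplementedError


class EnduroSettings(RomSettings):
    """Hypothetical game rule: the same fixed reward on every emulator step."""

    def __init__(self, reward_per_step):
        self._reward_per_step = reward_per_step

    def get_reward(self):
        return self._reward_per_step


class StellaEnvironment:
    def __init__(self, settings, frame_skip=1):
        self.settings = settings
        self.frame_skip = frame_skip  # assumed knob, for illustration only

    def one_step_act(self, action):
        # The reward of a single step comes from the game-specific settings.
        return self.settings.get_reward()

    def act(self, action):
        # Sum the rewards of the individual steps, like sum_rewards += oneStepAct(...).
        return sum(self.one_step_act(action) for _ in range(self.frame_skip))


class ALEInterface:
    def __init__(self, environment):
        self.environment = environment

    def act(self, action):
        # The Python-facing act() just delegates to the environment.
        return self.environment.act(action)


ale = ALEInterface(StellaEnvironment(EnduroSettings(reward_per_step=1), frame_skip=4))
print(ale.act(0))
```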

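Friday, October 18, 2019

tensorflow and CUDA/cuDNN versions

Check the version compatibility table

Check the current CUDA version
cat /usr/local/cuda/version.txt

Check the current cuDNN version
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h

Check the toolkit version

which nvcc

nvcc --version

Check the driver version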
cat /proc/driver/nvidia/version

nvidia-smi
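
The grep on cudnn.h can also be done from Python, which is handy when a script should check the cuDNN version before importing tensorflow. A minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros; `parse_cudnn_version` is a hypothetical helper name and the sample text is made up.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from header text, or None if a macro is missing."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+CUDNN_%s\s+(\d+)" % name, header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Made-up header excerpt in the same shape as the grep output.
sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 5\n"
print(parse_cudnn_version(sample))
```

On a real machine the text would come from open("/usr/local/cuda/include/cudnn.h").read(). Newer cuDNN releases (8 and later) keep these macros in cudnn_version.h instead, so the path may need adjusting.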

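Wednesday, August 14, 2019

Loading frozen_model.pb is too slow when using TensorRT

The whole point of using TensorRT is faster inference,
yet loading the model turned out to be extremely slow

Searching online turned up "extremely long model loading time problem"
The main cause is that protobuf is running its Python implementation
Switching to the cpp implementation is what improves the speed

The article uses protobuf 3.6.1
but my protobuf is 3.8.0
so I changed the relevant parameters to 3.8.0
and installed protobuf into the Python virtualenv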

sudo /usr/local/cuda-10.0/bin/nvprof --log-file=profile_freeze.log /mnt/XavierSSD/envs/OpenAiGym/bin/python inference.py
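
Before and after rebuilding protobuf it helps to confirm which implementation Python is actually using. The sketch below is an assumption about how to check it: `protobuf_backend` is a hypothetical helper name, and it relies on the `api_implementation` module that ships inside the protobuf Python package.

```python
def protobuf_backend():
    """Return the active protobuf implementation name, or None if protobuf is missing."""
    try:
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    # Typically "python" for the slow pure-Python backend and "cpp" for the C++ one.
    return api_implementation.Type()


print(protobuf_backend())
```

If this prints python inside the virtualenv used for inference, the slow loading path is still in effect. The environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION can request a specific backend, but only one that is actually installed.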

Tuesday, June 25, 2019

Nvidia Jetson AGX Xavier Build tensorflow 1.13

Reference: Building Tensorflow 1.13 on Jetson Xavier

Install bazel
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ wget https://github.com/bazelbuild/bazel/releases/download/0.19.2/bazel-0.19.2-dist.zip
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ mkdir bazel
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ cd bazel

nvidia@jetson-0423418048807:~/XavierSSD/Downloads/bazel$ unzip ../bazel-0.19.2-dist.zip

nvidia@jetson-0423418048807:~/XavierSSD/Downloads/bazel$ env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh

nvidia@jetson-0423418048807:~/XavierSSD/Downloads/bazel$ cd ..

nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ mv bazel ~/XavierSSD

nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ cd ../bazel/

nvidia@jetson-0423418048807:~/XavierSSD/bazel$ vi ~/.bashrc

Add the following line to the bottom of the file, and also run it once in the current shell

export PATH=~/XavierSSD/bazel/output${PATH:+:${PATH}}

Download tensorflow

nvidia@jetson-0423418048807:~/XavierSSD/bazel$ cd ..

nvidia@jetson-0423418048807:~/XavierSSD$ git clone https://github.com/tensorflow/tensorflow.git

nvidia@jetson-0423418048807:~/XavierSSD$ cd tensorflow/

Configure git and check out the r1.13 branch

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git config --global user.email "name@yahoo.com.tw"

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git config --global user.name "name"

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git checkout r1.13

Modifications for Nvidia Jetson AGX Xavier
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi tensorflow/lite/kernels/internal/BUILD

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git add tensorflow/lite/kernels/internal/BUILD

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git commit -m "Update 1"

[r1.13 982e077b2a] Update 1

1 file changed, 3 deletions(-)

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git log

commit 982e077b2a4e2123f7a299dbaf95d97383303d17 (HEAD -> r1.13)

Author: name <name@yahoo.com.tw>

Date: Mon Jun 24 14:49:30 2019 +0800

Update 1

commit 93dd14dce2e8751bcaab0a0eb363d55eb0cc5813 (origin/r1.13)

Author: Mihai Maruseac <mihaimaruseac@google.com>

Date: Tue May 21 10:08:18 2019 -0700

Update png_archive version to 1.6.37

PiperOrigin-RevId: 249272809

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git diff 93dd1 982e0

diff --git a/tensorflow/lite/kernels/internal/BUILD b/tensorflow/lite/kernels/internal/BUILD

index 4be3226938..7226f96fdf 100644

--- a/tensorflow/lite/kernels/internal/BUILD

+++ b/tensorflow/lite/kernels/internal/BUILD

@@ -22,15 +22,12 @@ HARD_FP_FLAGS_IF_APPLICABLE = select({

NEON_FLAGS_IF_APPLICABLE = select({

":arm": [

"-O3",

- "-mfpu=neon",

":armeabi-v7a": [

"-O3",

- "-mfpu=neon",

":armv7a": [

"-O3",

- "-mfpu=neon",

"//conditions:default": [

"-O3",

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi third_party/aws/BUILD.bazel

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git add third_party/aws/BUILD.bazel

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git commit -m "Update 2"

[r1.13 a3d6ea2fce] Update 2

1 file changed, 1 insertion(+), 1 deletion(-)

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git log
commit a3d6ea2fce8fff7bcf74ee52cd77074416d24bf2 (HEAD -> r1.13)
Author: mark <ingrenn@yahoo.com.tw>
Date: Mon Jun 24 14:56:09 2019 +0800

Update 2

commit 982e077b2a4e2123f7a299dbaf95d97383303d17
Author: mark <ingrenn@yahoo.com.tw>
Date: Mon Jun 24 14:49:30 2019 +0800

Update 1

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git diff 982e0 a3d6ea
diff --git a/third_party/aws/BUILD.bazel b/third_party/aws/BUILD.bazel
index 5426f79e46..e08f8fc108 100644
--- a/third_party/aws/BUILD.bazel
+++ b/third_party/aws/BUILD.bazel
@@ -24,7 +24,7 @@ cc_library(
"@org_tensorflow//tensorflow:raspberry_pi_armeabi": glob([
"aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
]),
- "//conditions:default": [],
+ "//conditions:default": glob(["aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",]),
}) + glob([
"aws-cpp-sdk-core/include/**/*.h",
"aws-cpp-sdk-core/source/*.cpp",
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi third_party/gpus/crosstool/BUILD.tpl
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git add third_party/gpus/crosstool/BUILD.tpl
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git commit -m "Update 3"
[r1.13 65ad3b64e5] Update 3
1 file changed, 1 insertion(+)
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git log
commit 65ad3b64e5f16b3496628bee800fabf825a7c1ce (HEAD -> r1.13)
Author: mark <ingrenn@yahoo.com.tw>
Date: Mon Jun 24 15:04:22 2019 +0800

Update 3

commit a3d6ea2fce8fff7bcf74ee52cd77074416d24bf2
Author: mark <ingrenn@yahoo.com.tw>
Date: Mon Jun 24 14:56:09 2019 +0800

Update 2

commit 982e077b2a4e2123f7a299dbaf95d97383303d17
Author: mark <ingrenn@yahoo.com.tw>
Date: Mon Jun 24 14:49:30 2019 +0800

Update 1

commit 93dd14dce2e8751bcaab0a0eb363d55eb0cc5813 (origin/r1.13)
Author: Mihai Maruseac <mihaimaruseac@google.com>
Date: Tue May 21 10:08:18 2019 -0700

Update png_archive version to 1.6.37
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git diff a3d6ea 65ad3
diff --git a/third_party/gpus/crosstool/BUILD.tpl b/third_party/gpus/crosstool/BUILD.tpl
index db76306ffb..184cd35b87 100644
--- a/third_party/gpus/crosstool/BUILD.tpl
+++ b/third_party/gpus/crosstool/BUILD.tpl
@@ -24,6 +24,7 @@ cc_toolchain_suite(
"x64_windows|msvc-cl": ":cc-compiler-windows",
"x64_windows": ":cc-compiler-windows",
"arm": ":cc-compiler-local",
+ "aarch64": ":cc-compiler-local",
"k8": ":cc-compiler-local",
"piii": ":cc-compiler-local",
"ppc": ":cc-compiler-local",
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
Modifications complete

Install the required versions of g++ and gcc
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ sudo apt-get install g++-5
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ sudo apt-get install gcc-5

Configure the build
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ ./configure
Extracting Bazel installation...
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3

Found possible Python library paths:
/usr/lib/python3.6/dist-packages
/usr/lib/python3/dist-packages
/usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3.6/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:

Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.3

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/lib/aarch64-linux-gnu

Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/aarch64-linux-gnu]:

Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 7.2

Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc-5

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apacha Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

Build the tensorflow pip package
This step runs for a long time and may even fail with errors, such as numpy not being found
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel build --config=opt --config=nonccl //tensorflow/tools/pip_package:build_pip_package --incompatible_remove_native_http_archive=false --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
Generate tensorflow-1.13.1-cp36-cp36m-linux_aarch64.whl
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ sudo bazel-bin/tensorflow/tools/pip_package/build_pip_package ../

Remove the old tensorflow and install the new one
nvidia@jetson-0423418048807:~$ source XavierSSD/envs/tensorflow/bin/activate
(tensorflow) nvidia@jetson-0423418048807:~$ cd XavierSSD/tensorflow/
(tensorflow) nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ pip3 uninstall tensorflow-gpu
(tensorflow) nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ pip3 install ../tensorflow-1.13.1-cp36-cp36m-linux_aarch64.whl
(tensorflow) nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ deactivate
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

Build the tensorflow C++ shared library libtensorflow_cc.so
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel build --config=opt --config=nonccl //tensorflow:libtensorflow_cc.so --incompatible_remove_native_http_archive=false --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ ls -al bazel-bin/tensorflow/libtensorflow_cc.so
-r-xr-xr-x 1 nvidia nvidia 303026864 Jun 25 12:22 bazel-bin/tensorflow/libtensorflow_cc.so
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ mkdir tensorflow/cc/example

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi tensorflow/cc/example/example.cc


// tensorflow/cc/example/example.cc

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  // Matrix A = [3 2; -1 0]
  auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} });
  // Vector b = [3 5]
  auto b = Const(root, { {3.f, 5.f} });
  // v = Ab^T
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));
  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch v
  TF_CHECK_OK(session.Run({v}, &outputs));
  // Expect outputs[0] == [19; -3]
  LOG(INFO) << outputs[0].matrix<float>();
  return 0;
}

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi tensorflow/cc/example/BUILD


load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

tf_cc_binary(
    name = "example",
    srcs = ["example.cc"],
    deps = [
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
)

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

Compile the example example.cc

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel build -c opt //tensorflow/cc/example:example

The build took a long time; now run the example as a test
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel-bin/tensorflow/cc/example/example
2019-06-25 16:10:03.922559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-25 16:10:03.922959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 8.57GiB
2019-06-25 16:10:03.923054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-25 16:10:03.924346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-25 16:10:03.924409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-25 16:10:03.924452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-25 16:10:03.925152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8340 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2019-06-25 16:10:07.582973: I tensorflow/cc/example/example.cc:22] 19
-3
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

nvidia@jetson-0423418048807:~/XavierSSD$ git clone https://github.com/bitbionic/keras-to-tensorflow.git
Cloning into 'keras-to-tensorflow'...
remote: Enumerating objects: 3719, done.
remote: Total 3719 (delta 0), reused 0 (delta 0), pack-reused 3719
Receiving objects: 100% (3719/3719), 227.81 MiB | 786.00 KiB/s, done.
Resolving deltas: 100% (7/7), done.
Checking out files: 100% (3688/3688), done.
nvidia@jetson-0423418048807:~/XavierSSD$
nvidia@jetson-0423418048807:~/XavierSSD$ cd keras-to-tensorflow
nvidia@jetson-0423418048807:~/XavierSSD/keras-to-tensorflow$ vi main.cpp
Two kinds of compile errors occur, so make these changes
change data.ToString(); to std::string(data);
change tensorflow::StringPiece(file_name).ends_with(".png")
to
tensorflow::str_util::EndsWith(file_name, ".png")

nvidia@jetson-0423418048807:~/XavierSSD/keras-to-tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/keras-to-tensorflow$ g++-5 -std=gnu++11 -c ./main.cpp -D_GLIBCXX_USE_CXX11_ABI=0 -I../tensorflow -I../tensorflow/bazel-tensorflow/external/eigen_archive -I../tensorflow/bazel-tensorflow/external/protobuf_archive/src -I../tensorflow/bazel-tensorflow/external/com_google_absl -I../tensorflow/bazel-genfiles

Friday, December 7, 2018

EAST and Tesseract performance test

EAST: An Efficient and Accurate Scene Text Detector

Reference: OpenCV OCR and text recognition with Tesseract

OpenCV enables hardware acceleration with
net = cv2.dnn.readNet(args["east"])
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL)

But OpenCL is not NVIDIA's CUDA; here it targets the Intel GPU

See eval.py in A tensorflow implementation of EAST text detector

It uses the tensorflow GPU (CUDA) build

Performance is indeed better

At 640x480, from 400 ms down to 340 ms

But Tesseract is not accelerated; only EAST can be
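
Per-frame figures like the 400 ms and 340 ms above are easiest to compare when both backends are timed the same way. A minimal sketch, assuming the detector call is wrapped in a zero-argument function; `mean_latency_ms` and `relative_speedup` are hypothetical helper names, and the warm-up count is an arbitrary choice.

```python
import time


def mean_latency_ms(fn, runs=10, warmup=2):
    """Average wall-clock time of fn() in milliseconds, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


def relative_speedup(before_ms, after_ms):
    """Fraction of the original latency that was saved."""
    return (before_ms - after_ms) / before_ms


# With the numbers from this post: (400 - 340) / 400
print(f"{relative_speedup(400, 340):.0%}")
```

The warm-up calls matter for GPU backends, where the first inference usually includes one-time initialization.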

build tensorflow 1.10

Reference: build tensorflow 1.11 from source in visual studio

(base) D:\TensorFlowB>conda env list
(base) D:\TensorFlowB>conda env remove -n tensorflow-1.10
(base) D:\TensorFlowB>conda create -n tensorflow-1.10 pip python=3.6
(base) D:\TensorFlowB>activate tensorflow-1.10
(tensorflow-1.10) D:\TensorFlowB>pip install six numpy wheel protobuf absl-py
(tensorflow-1.10) D:\TensorFlowB>pip install keras_applications==1.0.5 --no-deps
(tensorflow-1.10) D:\TensorFlowB>pip install keras_preprocessing==1.0.3 --no-deps

D:\TensorFlowB>git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.10

D:\TensorFlowB>cd tensorflow-1.10

D:\TensorFlowB\tensorflow-1.10>git checkout r1.10

D:\TensorFlowB\tensorflow-1.10>git pull origin master

D:\TensorFlowB\tensorflow-1.10>bazel clean

D:\TensorFlowB\tensorflow-1.10>python ./configure.py

Modify D:/TensorFlowB/tensorflow-1.10/tensorflow/contrib/cmake/CMakeLists.txt


if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    CHECK_CXX_COMPILER_FLAG("/arch:AVX2" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
      add_definitions(-D__AVX2__)
    endif()
  endif()
endif()

D:/TensorFlowB/tensorflow-1.10/tensorflow/contrib/cmake
Select Visual Studio 14 2015 Win64
For Optional toolset to use, enter "host=x64"
tensorflow_BUILD_SHARED_LIB: checked
tensorflow_ENABLE_GPU: checked

Problems and solutions
D:\TensorFlowB\tensorflow-1.10\tensorflow\stream_executor\dnn.pb.h
This file was generated by a newer version of protoc
which is incompatible with your Protocol Buffer headers.
Please update your headers.

In D:\TensorFlowB\tensorflow-1.10\tensorflow\contrib\cmake\external\protobuf.cmake, change

set(PROTOBUF_TAG v3.6.0)

to

set(PROTOBUF_TAG v3.6.1)

D:\TensorFlowB\tensorflow-1.10\tensorflow\workspace.bzl


  tf_http_archive(
      name = "protobuf_archive",
      urls = [
          "https://mirror.bazel.build/github.com/google/protobuf/archive/v3.6.1.tar.gz",
          "https://github.com/google/protobuf/archive/v3.6.1.tar.gz",
      ],
      sha256 = "3d4e589d81b2006ca603c1ab712c9715a76227293032d05b26fca603f90b3f5b",
      strip_prefix = "protobuf-3.6.1",
  )
  tf_http_archive(
      name = "eigen_archive",
      urls = [
          "https://mirror.bazel.build/bitbucket.org/eigen/eigen/get/fd6845384b86.tar.gz",
          "https://bitbucket.org/eigen/eigen/get/fd6845384b86.tar.gz",
      ],
      sha256 = "d956415d784fa4e42b6a2a45c32556d6aec9d0a3d8ef48baee2522ab762556a9",
      strip_prefix = "eigen-eigen-fd6845384b86",
      build_file = clean_dep("//third_party:eigen.BUILD"),
      patch_file = clean_dep("//third_party:eigen_half.patch"),
  )

download eigen_half.patch to D:\TensorFlowB\tensorflow-1.10\third_party\eigen_half.patch

Building the project tf_python_build_pip_package produces
EXEC : error : [WinError 5] Access is denied.:
'build\\bdist.win-amd64\\wheel\\tensorflow_gpu-1.10.1.data\\purelib\\tensorflow\\include\\tensorflow\\stream_executor\\dnn.pb.h'
Change the file attributes of dnn.pb.h
Open the VS2015 x64 Native Tools Command Prompt as Administrator
D:\TensorFlowB\build-1.10>D:\Anaconda3\envs\tensorflow-1.10\python.exe D:/TensorFlowB/build-1.10/tf_python/setup.py bdist_wheel --project_name tensorflow_gpu
Open a separate Anaconda prompt
(base) D:\TensorFlowB\build-1.10\tf_python>activate tensorflow-1.10
(tensorflow-1.10) D:\TensorFlowB\build-1.10\tf_python>pip install dist\tensorflow_gpu-1.10.1-cp36-cp36m-win_amd64.whl

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Failed; no solution found
c:\users\mark\appdata\local\temp\nvcc_inter_files_tmp_dir\depthwise_conv_op_gpu.cu.compute_70.cudafe1.stub.c(3):
fatal error C1083: Cannot open include file: 'depthwise_conv_op_gpu.cu.fatbin.c': No such file or directory

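Wednesday, November 28, 2018

build tensorflow 1.11 from source in visual studio

Key points first
On Windows the GPU release is now tensorflow_gpu-1.12.0, which is built with Bazel
But the library that Bazel currently produces cannot be used from Visual Studio
So fall back to tensorflow_gpu-1.11 and build the library with CMake

Also, only the Release configuration can be built. RelWithDebInfo was meant to stand in for Debug,
but only Release builds successfully

Once the library is built, programs compile and run, but the results are wrong

Open Anaconda Prompt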
(base) D:\>conda create -n tensorflow-1.11 pip python=3.6
(base) D:\>activate tensorflow-1.11
(tensorflow-1.11) D:\>pip install six numpy wheel
(tensorflow-1.11) D:\>pip install keras_applications==1.0.5 --no-deps
(tensorflow-1.11) D:\>pip install keras_preprocessing==1.0.3 --no-deps

http://www.msys2.org/
Download msys2-x86_64-20180531.exe

Open msys2/MinGW 64-bit

$ pacman -Syu

$ pacman -Su

$ pacman -S git patch unzip

Install Bazel
https://github.com/bazelbuild/bazel/releases
Download bazel-0.18.1-windows-x86_64.exe
rename bazel-0.18.1-windows-x86_64.exe bazel.exe
move bazel.exe D:\msys64\usr\bin
add PATH D:\msys64\usr\bin

Install JDK 8
Download jdk-8u191-windows-x64.exe
add JAVA_HOME C:\Program Files\Java\jdk1.8.0_191

copy cudnn-9.0-windows10-x64-v7\cuda\* to

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0

Download swigwin-3.0.12.zip

Extract to D:\TensorFlowB\swigwin-3.0.12

Open the VS2015 x64 Native Tools Command Prompt

D:\TensorFlowB>git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.11

D:\TensorFlowB>cd tensorflow-1.11

D:\TensorFlowB\tensorflow-1.11>git checkout r1.11

D:\TensorFlowB\tensorflow-1.11>python ./configure.py

Please specify the location of python. [Default is D:\Anaconda3\python.exe]:

Please input the desired Python library path to use. Default is [D:\Anaconda3\lib\site-packages]

Do you wish to build TensorFlow with nGraph support? [y/N]:

Do you wish to build TensorFlow with CUDA support? [y/N]: y

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]:

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0]:

Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]:

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]: /arch:AVX2

Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]:

Edit D:/TensorFlowB/tensorflow-1.11/tensorflow/contrib/cmake/CMakeLists.txt to add AVX2 support


if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    CHECK_CXX_COMPILER_FLAG("/arch:AVX2" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
      add_definitions(-D__AVX2__)
    endif()
  endif()
endif()

Referring to "Add abseil_cpp cmake dependence", edit
D:/TensorFlowB/tensorflow-1.11/tensorflow/contrib/cmake/CMakeLists.txt
and add tensorflow/contrib/cmake/external/abseil_cpp.cmake

Also add tensorflow/contrib/cmake/modules/FindAbseilCpp.cmake
to avoid the error that absl/strings/string_view.h cannot be found

add_definitions(-DGOOGLE_CUDA=1 -DTF_EXTRA_CUDA_CAPABILITIES=3.5,3.7,5.2,6.0,6.1,7.0)

Edit D:\TensorFlowB\tensorflow-1.11\tensorflow\contrib\cmake\external\eigen.cmake

option(eigen_PATCH_FILE "Patch file to apply to eigen" OFF)
set(eigen_PATCH_FILE "D:/TensorFlowB/eigen_half.patch")
Edit D:\TensorFlowB\tensorflow-1.11\bazel-tensorflow\tensorflow\workspace.bzl


  tf_http_archive(
      name = "eigen_archive",
      build_file = clean_dep("//third_party:eigen.BUILD"),
      patch_file = clean_dep("//third_party:eigen_half.patch"),
  )

Download https://github.com/amsokol/tensorflow-windows-build-tutorial/blob/master/eigen_half.patch
and place it at D:/TensorFlowB/eigen_half.patch
The patch will later modify
D:\TensorFlowB\build-1.11\eigen\src\eigen\Eigen\src\Core\arch\CUDA\Half.h
D:\TensorFlowB\build-1.11\external\eigen_archive\Eigen\src\Core\arch\CUDA\Half.h
to avoid the following error


Error  more than one instance of overloaded function "__hadd" matches the argument list: tf_core_gpu_kernels d:\tensorflowb\build-1.11\external\eigen_archive\eigen\src\Core\arch\CUDA\Half.h 212

CMake
source: D:/TensorFlowB/tensorflow-1.11/tensorflow/contrib/cmake

build: D:/TensorFlowB/build-1.11

Configure

Visual Studio 14 2015 Win64

Optional toolset: host=x64

SWIG_EXECUTABLE=D:/TensorFlowB/swigwin-3.0.12/swig.exe

tensorflow_BUILD_SHARED_LIB=v

tensorflow_ENABLE_GPU=v
eigen_PATCH_FILE=v

Use message() to debug the CMake scripts

Open Visual Studio 2015 as Administrator

Open D:\TensorFlowB\build-1.11\tensorflow.sln

Switch to the Release configuration

Open the property pages of the following projects

_beam_search_ops, _gru_ops, _lstm_ops, _nearest_neighbor_ops, _periodic_resample_op

Property Pages/Configuration Properties/Linker/Input/Additional Dependencies

Change \pywrap_tensorflow_internal.lib to Release\pywrap_tensorflow_internal.lib

Next error: cuda_kernel_helper.h cannot find cuda_fp16.h


Severity Code Description Project File Line Suppression State
Error C1083 Cannot open include file: 'cuda/include/cuda_fp16.h': No such file or directory _beam_search_ops D:\TensorFlowB\tensorflow-1.11\tensorflow\core\util\cuda_kernel_helper.h 24

Open D:\TensorFlowB\tensorflow-1.11\tensorflow\core\util\cuda_kernel_helper.h and change the include

//#include "cuda/include/cuda_fp16.h"
#include "cuda_fp16.h"

The following link error then appears


Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol "class absl::uint128 __cdecl absl::operator%(class absl::uint128,class absl::uint128)" (??Labsl@@YA?AVuint128@0@V10@0@Z) referenced in function "private: void __cdecl absl::str_format_internal::`anonymous namespace'::ConvertedIntInfo::UnsignedToStringRight(class absl::uint128,struct absl::str_format_internal::ConversionChar)" (??$UnsignedToStringRight@Vuint128@absl@@@ConvertedIntInfo@?A0x0d227ec7@str_format_internal@absl@@AEAAXVuint128@3@UConversionChar@23@@Z) tf_tutorials_example_trainer D:\TensorFlowB\build-1.11\arg.obj 1

Add the following to Linker/Input/Additional Dependencies
abseil_cpp\src\abseil_cpp_build\absl\numeric\Release\absl_int128.lib

Finally, estimator_python_api and tf_python_api fail

To see the messages, open Tools/Options/Projects and Solutions/Build and Run in VS2015

and set MSBuild project build output verbosity to Normal

Edit D:\TensorFlowB\build-1.11\tf_python_api.vcxproj and remove the empty "" argument in front of python.exe


from "C:\Program Files\CMake\bin\cmake.exe" -E env PYTHONPATH=D:/TensorFlowB/build-1.11/tf_python "" D:/Anaconda3/python.exe D:/TensorFlowB/build-1.11/tf_python/tensorflow/python/tools/api/generator/create_python_api.py --root_init_template=D:/TensorFlowB/build-1.11/tf_python/tensorflow/api_template.__init__.py --apidir=D:/TensorFlowB/build-1.11/tf_python/tensorflow --package=tensorflow.python --apiname=tensorflow D:/TensorFlowB/tensorflow-1.11/api_init_files_list.txt
to "C:\Program Files\CMake\bin\cmake.exe" -E env PYTHONPATH=D:/TensorFlowB/build-1.11/tf_python D:/Anaconda3/python.exe D:/TensorFlowB/build-1.11/tf_python/tensorflow/python/tools/api/generator/create_python_api.py --root_init_template=D:/TensorFlowB/build-1.11/tf_python/tensorflow/api_template.__init__.py --apidir=D:/TensorFlowB/build-1.11/tf_python/tensorflow --package=tensorflow.python --apiname=tensorflow D:/TensorFlowB/tensorflow-1.11/api_init_files_list.txt

copy D:\TensorFlowB\tensorflow-1.11\tensorflow\tools\docs
to D:\TensorFlowB\build-1.11\tf_python\tensorflow\tools\docs
copy D:\TensorFlowB\tensorflow-1.11\tensorflow\python\distribute
to D:\TensorFlowB\build-1.11\tf_python\tensorflow\python\distribute

Build tf_python_build_pip_package
This produces D:\TensorFlowB\build-1.11\tf_python\dist\tensorflow_gpu-1.11.0-cp36-cp36m-win_amd64.whl

(tensorflow-1.11) D:\TensorFlowB\build-1.11>pip install tf_python\dist\tensorflow_gpu-1.11.0-cp36-cp36m-win_amd64.whl
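
The wheel filename itself records which interpreter and platform the package was built for, which is worth checking before running pip install. The following is a minimal sketch that splits a wheel filename into its parts, assuming the standard name-version-python-abi-platform layout; parse_wheel_name is a hypothetical helper, not part of pip or TensorFlow.

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename into name, version and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Layout: name-version[-build]-python-abi-platform.
    # The last three fields are always the compatibility tags.
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_name("tensorflow_gpu-1.11.0-cp36-cp36m-win_amd64.whl")
print(info)
```

Here the cp36 tag means the wheel only installs into a CPython 3.6 environment, which matches the python=3.6 environment created earlier.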

Using Bazel
D:\TensorFlowB\tensorflow-1.11>bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
After a long run, it turns out that
D:\TensorFlowB\tensorflow-1.11\bazel-out\x64_windows-opt\bin\tensorflow\tools\pip_package\simple_console_for_windows.zip
was not generated correctly (size=0)

D:\TensorFlowB\tensorflow-1.11>cd bazel-out/x64_windows-opt/bin/tensorflow/tools/pip_package
edit simple_console_for_windows.zip-0.params
Delete every line that contains .zip
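
Removing those lines by hand is tedious when the params file is long. A minimal sketch of the same filtering in Python follows; drop_zip_lines is a hypothetical helper and the sample entries are made up for illustration, not taken from a real params file.

```python
def drop_zip_lines(params_text: str) -> str:
    """Return the params text without any line that mentions a .zip file."""
    kept = [line for line in params_text.splitlines() if ".zip" not in line]
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical params content, for illustration only.
sample = "a/b.py=src/b.py\nexternal/foo.zip=bar.zip\nc/d.py=src/d.py\n"
cleaned = drop_zip_lines(sample)
print(cleaned)
```

To apply it to the real file, read simple_console_for_windows.zip-0.params, pass its text through the function, and write the result back (keeping a backup copy first).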

Run
D:\TensorFlowB\tensorflow-1.11\bazel-tensorflow>external\bazel_tools\tools\zip\zipper\zipper.exe vcC bazel-out/x64_windows-opt/bin/tensorflow/tools/pip_package/simple_console_for_windows.zip @bazel-out/x64_windows-opt/bin/tensorflow/tools/pip_package/simple_console_for_windows.zip-0.params
D:\TensorFlowB\tensorflow-1.11\bazel-tensorflow>cd ..
D:\TensorFlowB\tensorflow-1.11>bazel-bin\tensorflow\tools\pip_package\build_pip_package ..\tensorflow_pkg
Install

(tensorflow-1.11) D:\TensorFlowB>pip install tensorflow_pkg\tensorflow-1.11.0-cp36-cp36m-win_amd64.whl

Tuesday, November 6, 2018

EAST text detector

git clone https://github.com/argman/EAST EAST
Download east_icdar2015_resnet_v1_50_rbox.zip from
https://drive.google.com/open?id=0B3APw5BZJ67ETHNPaU9xUkVoV0U

Open
VS2015 x64 Native Tools Command Prompt
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC>
D:\OpenCV_4\OpenCV OCR\EAST\lanms>activate tensorflow
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST\lanms>python --version
Python 3.5.5 :: Anaconda, Inc.
D:\OpenCV_4\OpenCV OCR\EAST\lanms>cl adaptor.cpp .\include\clipper\clipper.cpp /I .\include /I "D:\Anaconda3\include" /LD /Fe:adaptor.pyd /link /LIBPATH:"D:\Anaconda3\libs"

Edit lanms/__init__.py and comment out the following two lines

#if subprocess.call(['make', '-C', BASE_DIR]) != 0: # return value

# raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))

Edit run_demo_server.py

change

parser.add_argument('--checkpoint-path', default=checkpoint_path)

to

parser.add_argument('--checkpoint_path', default=checkpoint_path)

and comment out

#ret.update(get_host_info())
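
The reason the flag spelling matters can be seen with a small standalone argparse experiment. This is only a sketch: build_parser and the default value are hypothetical and not taken from run_demo_server.py.

```python
import argparse


def build_parser(flag: str) -> argparse.ArgumentParser:
    """Create a parser with a single option spelled as given."""
    parser = argparse.ArgumentParser()
    parser.add_argument(flag, default="default_ckpt")
    return parser


hyphen = build_parser("--checkpoint-path")
underscore = build_parser("--checkpoint_path")

# Both spellings store the value under the same attribute name.
args_h, _ = hyphen.parse_known_args(["--checkpoint-path=model"])
args_u, _ = underscore.parse_known_args(["--checkpoint_path=model"])
print(args_h.checkpoint_path, args_u.checkpoint_path)

# The hyphenated parser does not recognise the underscore spelling,
# so the argument is left over and the default is used instead.
args_x, unknown = hyphen.parse_known_args(["--checkpoint_path=model"])
print(args_x.checkpoint_path, unknown)
```

So the attribute read by the program is the same either way; only the spelling accepted on the command line changes, and it has to match what is typed when starting the server.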

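Along the way, it turned out that the tensorflow environment was using python3.5

while the program was using python36.lib, so I removed the tensorflow environment and reinstalled it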
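
A quick way to see which import library a given interpreter corresponds to is to derive the name from its version. The sketch below assumes the usual Windows naming pythonXY.lib; import_lib_name is a hypothetical helper.

```python
import sys


def import_lib_name(version_info=sys.version_info) -> str:
    """Name of the Windows import library for a Python version, e.g. python36.lib."""
    return f"python{version_info[0]}{version_info[1]}.lib"


# Library name for the interpreter running this script.
print(import_lib_name())
# The two versions involved in the mismatch described above.
print(import_lib_name((3, 5)), import_lib_name((3, 6)))
```

Running it inside the activated environment shows whether the extension built with cl was linked against the same version that will import it.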

Save the list of installed modules so they can be reinstalled once the new environment is ready (not actually used)

pip freeze>requirements.txt

pip install -r requirements.txt

List the installed modules

(tensorflow) D:\>conda list

(base) D:\>conda env remove -n tensorflow

(base) D:\>conda create -n tensorflow pip python=3.6

(base) D:\>activate tensorflow

(tensorflow) D:\>pip install --ignore-installed --upgrade tensorflow-gpu

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install opencv-python

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install opencv-contrib-python

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install --ignore-installed --upgrade tensorflow-gpu

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install scipy

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install matplotlib

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install Flask

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>conda install shapely

(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>python run_demo_server.py --checkpoint_path="..\east_icdar2015_resnet_v1_50_rbox"

Monday, July 9, 2018

CUDA installation fails

CUDA installation usually fails because the Visual Studio Integration component fails to install
Choosing a custom install and skipping Visual Studio Integration lets the installation succeed
For Installer Type, choose exe (local)

Visual Studio Integration can then be installed manually as follows:
1. To be able to compile CUDA programs
Note the path used while installing CUDA and copy the CUDAVisualStudioIntegration folder out of it
Copy all files under D:\CUDAVisualStudioIntegration\extras\visual_studio_integration\MSBuildExtensions
to
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations
2. To be able to create new CUDA projects in Visual Studio
Copy the directory
D:\CUDAVisualStudioIntegration\extras\visual_studio_integration\CudaProjectVsWizards
to
C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Extensions
3. Install
D:\CUDAVisualStudioIntegration\NVIDIA_Nsight_Visual_Studio_Edition_Win64_5.4.0.17229.msi

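Tuesday, July 3, 2018

SpeechActivity.java in tensorflow audio recognition

The work is split into a record thread and a recognize thread, which exchange data through recordingBuffer
The two threads do not run at the same rate, so the recognize thread may recognize the same data twice or miss some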

short[] recordingBuffer = new short[RECORDING_LENGTH];
int recordingOffset = 0;

// Record thread: append newly read samples to the circular recordingBuffer.
private void record() {
  int numberRead = record.read(audioBuffer, 0, audioBuffer.length);
  int maxLength = recordingBuffer.length;
  int newRecordingOffset = recordingOffset + numberRead;
  // Samples that run past the end of the buffer wrap around to the start.
  int secondCopyLength = Math.max(0, newRecordingOffset - maxLength);
  int firstCopyLength = numberRead - secondCopyLength;
  System.arraycopy(audioBuffer, 0, recordingBuffer, recordingOffset, firstCopyLength);
  System.arraycopy(audioBuffer, firstCopyLength, recordingBuffer, 0, secondCopyLength);
  recordingOffset = newRecordingOffset % maxLength;
}

// Recognize thread: unroll the circular buffer into inputBuffer, oldest sample first.
private void recognize() {
  int maxLength = recordingBuffer.length;
  int firstCopyLength = maxLength - recordingOffset;
  int secondCopyLength = recordingOffset;
  System.arraycopy(recordingBuffer, recordingOffset, inputBuffer, 0, firstCopyLength);
  System.arraycopy(recordingBuffer, 0, inputBuffer, firstCopyLength, secondCopyLength);
}
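
The wrap-around arithmetic is easier to follow on a tiny buffer. The following Python sketch mirrors the two copies in record() and the unrolling in recognize(); RingBuffer and its method names are hypothetical, and it is single-threaded, so it does not model the timing issue between the two threads.

```python
class RingBuffer:
    """Fixed-size circular buffer mirroring recordingBuffer / recordingOffset."""

    def __init__(self, length):
        self.data = [0] * length
        self.offset = 0  # next write position, like recordingOffset

    def write(self, samples):
        """Append samples, wrapping to the start when the end is reached."""
        n = len(self.data)
        assert len(samples) <= n
        new_offset = self.offset + len(samples)
        second = max(0, new_offset - n)   # part that wraps around
        first = len(samples) - second     # part that fits before the end
        self.data[self.offset:self.offset + first] = samples[:first]
        self.data[0:second] = samples[first:]
        self.offset = new_offset % n

    def snapshot(self):
        """Return the contents in chronological order, oldest sample first."""
        return self.data[self.offset:] + self.data[:self.offset]


buf = RingBuffer(5)
buf.write([1, 2, 3])
buf.write([4, 5, 6, 7])
print(buf.snapshot())
```

After the second write the oldest samples 1 and 2 have been overwritten, and snapshot() returns the five most recent samples in order.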

Yolo

Directory data/img

File data/obj.data
classes= 2
train = data/train.txt
valid = data/train.txt
names = data/obj.names (relative to the executable's directory)
backup = backup/

File data/obj.names
air
bird

File data/train.txt
data/img/air1.jpg
data/img/air2.jpg
data/img/air3.jpg
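
With many images, train.txt is easier to generate than to type. Below is a minimal sketch that lists the .jpg files of a directory in the format shown above; list_training_images is a hypothetical helper, and the temporary directory only stands in for data/img so the example can run anywhere.

```python
import os
import tempfile


def list_training_images(img_dir, prefix="data/img", exts=(".jpg",)):
    """Return one 'prefix/filename' entry per image file found in img_dir."""
    names = sorted(f for f in os.listdir(img_dir) if f.lower().endswith(exts))
    return [f"{prefix}/{name}" for name in names]


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in files: two images and one label file that must be skipped.
    for name in ("air1.jpg", "air2.jpg", "air1.txt"):
        open(os.path.join(tmp, name), "w").close()
    lines = list_training_images(tmp)
    print("\n".join(lines))
```

Writing "\n".join(lines) to data/train.txt then gives the file that obj.data points to.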

File yolo-obj.cfg
(for testing)
batch=1
subdivisions=1
(for training)
batch=64
subdivisions=1 (adjust to the GPU memory; use 64 when memory is small)
In every [yolo] layer, set
classes =
In the [convolutional] layer just before each [yolo] layer, set
filters = (classes + 5) * 3
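
As a worked example of the filters formula, here is a short sketch; yolo_filters is a hypothetical helper, and the interpretation of the constants (5 as four box coordinates plus one objectness score, 3 as the number of anchors per [yolo] layer) is my reading of the YOLOv3 layout rather than something stated in these notes.

```python
def yolo_filters(classes: int, anchors_per_layer: int = 3) -> int:
    """filters value for the [convolutional] layer in front of a [yolo] layer."""
    return (classes + 5) * anchors_per_layer


# Two classes (air, bird) as in data/obj.names above.
print(yolo_filters(2))
# The 80-class COCO configuration, for comparison.
print(yolo_filters(80))
```

For the two-class setup in these notes, every such layer therefore gets filters=21.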

Labeling
yolo_mark.exe data/img data/train.txt data/obj.names

Training
darknet.exe detector train data/obj.data yolo-obj.cfg darknet19_448.conv.23
backup in obj.data specifies where the output weights are stored
darknet19_448.conv.23: this is simply a weights file; to resume an interrupted training run, replace it with the newly produced weights
-dont_show: do not display the Loss-Window

Evaluate the training result (IoU, mAP)
darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights
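
For reference, IoU (intersection over union) measures how well a predicted box overlaps a labeled box. The sketch below computes it for axis-aligned boxes given as (x_min, y_min, x_max, y_max); this box format and the function name are assumptions for illustration, and it is not darknet's own implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))
```

Identical boxes give 1.0, disjoint boxes give 0.0, and mAP is then computed from detections whose IoU with a labeled box exceeds a chosen threshold.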

COCO Yolo v3(4GB GPU): yolov3.cfg, yolov3.weights
COCO Yolo v3 tiny(1GB GPU): yolov3-tiny.cfg, yolov3-tiny.weights
COCO Yolo v2(4GB GPU): yolov2.cfg, yolov2.weights
VOC Yolo v2(4GB GPU): yolo-voc.cfg, yolo-voc.weights
COCO Yolo v2 tiny(1GB GPU): yolov2-tiny.cfg, yolov2-tiny.weights
VOC Yolo v2 tiny(1GB GPU): yolov2-tiny-voc.cfg, yolov2-tiny-voc.weights
The GPU memory figures above seem to be training requirements; detection or classification seems to need much less

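darknet.exe options
-i <index>: select the GPU; indices can be listed with nvidia-smi.exe
-nogpu: do not use the GPU
-thresh <val>: detection threshold, default 0.25
-c <num>: OpenCV camera index, default 0
-ext_output: print the object positions
detector test: still images
detector demo: video
detector train: training
detector map: evaluate the training result
classifier predict: classification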

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
The two commands above are equivalent

Use the following command to obtain yolov3-tiny.conv.15
darknet.exe partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15

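How to improve object detection
Before training:
Set random=1 in the .cfg file
Increase width and height in the .cfg file (they must be multiples of 32)
Run the following command to recalculate the anchors, then update anchors in the .cfg file
darknet.exe detector calc_anchors voc.data -num_of_clusters 9 -width 416 -height 416
Label the objects in each image carefully: every object must be labeled, and none mislabeled
Each object class should preferably have 2000 or more images covering different sizes, angles, lighting, backgrounds, and so on
Objects that should not be detected should appear in the images, but must not be labeled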
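
When picking a larger width and height, the multiple-of-32 constraint can be met by rounding up. A minimal sketch, with round_up_to_multiple as a hypothetical helper:

```python
def round_up_to_multiple(value: int, base: int = 32) -> int:
    """Smallest multiple of base that is greater than or equal to value."""
    return ((value + base - 1) // base) * base


for size in (416, 600, 417):
    print(size, "->", round_up_to_multiple(size))
```

A value that is already a multiple of 32, such as 416, is returned unchanged.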

How images are mapped to label files during training
darknet.c
int main(int argc, char **argv)
>run_detector(argc, argv);
detector.c
void run_detector(int argc, char **argv)
>train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show);
void train_detector(char *datacfg, char *cfgfile, char *weightfile, int *gpus, int ngpus, int clear, int dont_show)
> pthread_t load_thread = load_data(args);
data.c
pthread_t load_data(load_args args)
>if(pthread_create(&thread, 0, load_threads, ptr)) error("Thread creation failed");
void *load_threads(void *ptr)
>threads[i] = load_data_in_thread(args);
if(pthread_create(&thread, 0, load_thread, ptr)) error("Thread creation failed");
void *load_thread(void *ptr)
>*a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.c, a.num_boxes, a.classes, a.flip, a.jitter, a.hue, a.saturation, a.exposure, a.small_object);
data load_data_detection(int n, char **paths, int m, int w, int h, int c, int boxes, int classes, int use_flip, float jitter, float hue, float saturation, float exposure, int small_object)
>fill_truth_detection(filename, boxes, d.y.vals[i], classes, flip, dx, dy, 1./sx, 1./sy, small_object, w, h);
void fill_truth_detection(char *path, int num_boxes, float *truth, int classes, int flip, float dx, float dy, float sx, float sy, int small_object, int net_w, int net_h)
>replace_image_to_label(path, labelpath);
utils.c
void replace_image_to_label(char *input_path, char *output_path)
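
The end of that call chain turns an image path into the path of its label file. The sketch below shows only the simplest form of that mapping, replacing the image extension with .txt; it is an assumption-laden simplification, and any additional path rewriting that the real replace_image_to_label in utils.c performs is not reproduced here.

```python
import os


def image_to_label_path(image_path: str) -> str:
    """Simplified mapping: same path, with the image extension replaced by .txt."""
    root, _ext = os.path.splitext(image_path)
    return root + ".txt"


print(image_to_label_path("data/img/air1.jpg"))
```

Under this simplified rule, the label for data/img/air1.jpg is expected next to the image as data/img/air1.txt.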

Drawing the detected objects on the image
image.c
void draw_detections_cv_v3(IplImage* show_img, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output)

network.c
Feed the image into the network
float *network_predict(network net, float *input)
Get the detections from the network
detection *get_network_boxes(network *net, int w, int h, float thresh, float hier, int *map, int relative, int *num, int letter)
