
Showing posts with label tensorflow.

Tuesday, October 25, 2022

tensorflow predict memory leak

Memory usage keeps growing until the system crashes
$ top
VIRT and RES keep growing
$ jtop
Mem also keeps growing over time

Check how much memory the current process uses (in GiB)
import psutil
print(psutil.Process().memory_info().rss / (1024*1024*1024))  # resident set size
print(psutil.Process().memory_info().vms / (1024*1024*1024))  # virtual memory size

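Profile line-by-line memory usage of the code
from memory_profiler import profile
@profile(precision=4, stream=open('memory_profiler.log', 'w+'))  # write to a log file
def function():
    ...
@profile  # print directly to stdout
def function():
    ...
But the output didn't reveal anything useful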

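A common suggestion online is that the leak comes from converting numpy arrays to tensors, so convert explicitly first
state = tf.convert_to_tensor(state)
model.predict(state)
states = tf.convert_to_tensor(states)
model.fit(states)
But it didn't help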

Garbage collection
import gc
gc.collect()
But that didn't help either

The last resort, which worked
import tensorflow as tf
tf.keras.backend.clear_session()
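If the growth only shows up in a long-running predict loop, one option is to call clear_session periodically instead of once. Below is a minimal, framework-agnostic sketch: run_with_periodic_cleanup and peak_rss_mib are hypothetical helpers introduced here, it uses the stdlib resource module (Unix only) instead of psutil, and whether an existing model keeps working after clear_session in your TF version is an assumption to verify.

```python
import resource
import sys
from typing import Callable


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB (Unix only).

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_with_periodic_cleanup(step: Callable[[int], object],
                              n_iters: int,
                              cleanup: Callable[[], object],
                              cleanup_every: int = 100) -> int:
    """Call step(i) for i in range(n_iters), and cleanup() every cleanup_every steps.

    Returns how many times cleanup() was called.
    """
    if cleanup_every <= 0:
        raise ValueError("cleanup_every must be positive")
    cleanups = 0
    for i in range(n_iters):
        step(i)
        if (i + 1) % cleanup_every == 0:
            cleanup()
            cleanups += 1
            # Log the peak RSS so growth between cleanups is visible.
            print(f"iter {i + 1}: peak RSS {peak_rss_mib():.1f} MiB")
    return cleanups
```

With the model above this would be called roughly as run_with_periodic_cleanup(lambda i: model.predict(state), 10000, tf.keras.backend.clear_session). Because ru_maxrss is a peak value it never goes down, so a remaining leak shows up as the number still climbing after every cleanup.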

Thursday, October 20, 2022

gym / tensorflow conflict

env.render() raises the following error
    from pyglet.gl import *
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/__init__.py", line 243, in <module>
    import pyglet.window
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/window/__init__.py", line 1897, in <module>
    gl._create_shadow_window()
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/__init__.py", line 220, in _create_shadow_window
    _shadow_window = Window(width=1, height=1, visible=False)
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/window/xlib/__init__.py", line 173, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/window/__init__.py", line 595, in __init__
    config = screen.get_best_config(template_config)
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/canvas/base.py", line 192, in get_best_config
    configs = self.get_matching_configs(template)
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/canvas/xlib.py", line 220, in get_matching_configs
    configs = template.match(canvas)
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/xlib.py", line 58, in match
    have_13 = info.have_version(1, 3)
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/pyglet/gl/glx_info.py", line 86, in have_version
    client_version = self.get_client_version().split()[0]
IndexError: list index out of range


The fix is to call env.render() before importing tensorflow
import gym
env = gym.make("CartPole-v0")
env.render()
import tensorflow as tf

tensorflow on Xavier: cannot allocate memory in static TLS block error

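This problem is actually caused by the gym / tensorflow conflict described above.
Once that conflict is resolved, the following error no longer appears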

Traceback (most recent call last):
  File "/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 62, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: /home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/tensorflow/python/../../tensorflow_cpu_aws.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block

Another workaround is to preload the bundled libgomp from ~/.bashrc
$ vi .bashrc
export LD_PRELOAD=/home/UserName/envs/tf2.10.0/lib/python3.8/site-packages/tensorflow/python/../../tensorflow_cpu_aws.libs/libgomp-d22c30c5.so.1.0.0
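The hash in libgomp-d22c30c5.so.1.0.0 belongs to that particular wheel, so the path may differ on another install. A small sketch to locate it: find_libgomp is a made-up helper, and the "*.libs/libgomp*.so*" glob is an assumption based on the tensorflow_cpu_aws.libs path above (site-packages/tensorflow/python/../../ resolves to site-packages/).

```python
import glob
import os
from typing import Optional


def find_libgomp(site_packages: str) -> Optional[str]:
    """Return the first vendored libgomp shared object under site_packages, or None.

    Assumes wheels bundle it in a '<name>.libs' directory, as in
    tensorflow_cpu_aws.libs; the hash in the file name varies.
    """
    pattern = os.path.join(site_packages, "*.libs", "libgomp*.so*")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


if __name__ == "__main__":
    import sysconfig

    # purelib is the site-packages directory of the active (virtual) environment.
    lib = find_libgomp(sysconfig.get_paths()["purelib"])
    if lib:
        print(f"export LD_PRELOAD={lib}")
```

Running it inside the activated tf2.10.0 environment prints a line that can be pasted into .bashrc.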

Wednesday, March 2, 2022

TensorFlow to TensorRT

$ pip install onnxruntime
$ pip install -U tf2onnx
$ python -m tf2onnx.convert \
--saved-model tensorflow-model-path \
--output output.onnx

Wednesday, February 9, 2022

Yolo tiny v4 to tensorflow and tflite

Reference: tensorflow-yolov4-tflite

Only tensorflow==2.3.0rc0 works
Do not use any other version, and do not use the GPU

Modify core/config.py as needed
__C.YOLO.CLASSES
__C.YOLO.ANCHORS_TINY
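__C.YOLO.CLASSES points to the class-names file and __C.YOLO.ANCHORS_TINY holds the anchors; for custom weights such as yolov4-tiny-vehicle these presumably need to agree with the darknet .cfg used for training. A hypothetical helper that pulls both values out of a .cfg so they can be compared. Caveat: yolov4-tiny's [yolo] layers select anchors through mask=, so the stock ANCHORS_TINY may not simply be the .cfg list verbatim; treat the output as input for a manual check against core/config.py.

```python
from typing import List, Tuple


def parse_darknet_cfg(cfg_text: str) -> Tuple[List[int], int]:
    """Return (anchors, classes) from the first anchors=/classes= lines of a darknet .cfg."""
    anchors: List[int] = []
    classes = -1
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "anchors" and not anchors:
            anchors = [int(v) for v in value.replace(",", " ").split()]
        elif key == "classes" and classes < 0:
            classes = int(value)
    return anchors, classes


def count_names(names_text: str) -> int:
    """Number of non-empty lines in a .names file (one class per line)."""
    return sum(1 for line in names_text.splitlines() if line.strip())


if __name__ == "__main__":
    cfg = """
[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=2
"""
    anchors, classes = parse_darknet_cfg(cfg)
    print(anchors)                                 # flat list of width,height pairs
    print(classes == count_names("car\ntruck\n"))  # True if .cfg and .names agree
```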

For the TensorFlow SavedModel format, loaded with tf.saved_model.load()
$ python save_model.py --weights /your_path_to/weights/yolov4-tiny-vehicle-r_final.weights \
--output ./checkpoints/yolov4-tiny-416 \
--input_size 416 --model yolov4 --tiny
$ python convert_tflite.py --weights ./checkpoints/yolov4-tiny-416-tflite \
--output ./checkpoints/yolov4-tiny-416.tflite

For TFLite, loaded with tf.lite.Interpreter()
$ python save_model.py --weights /your_path_to/weights/yolov4-tiny-vehicle-r_final.weights \
--output ./checkpoints/yolov4-tiny-416-tflite \
--input_size 416 --model yolov4 --tiny --framework tflite
$ python convert_tflite.py --weights ./checkpoints/yolov4-tiny-416-tflite \
--output ./checkpoints/yolov4-tiny-416-fp16.tflite \
--quantize_mode float16

Thursday, January 21, 2021

Install Tensorflow 1.15 on Ubuntu 18.04

Normally pip3 install tensorflow-gpu==1.15 would be enough,
but it turned out to be incompatible with CUDA 10.2; it needs CUDA 10.0 with cuDNN v7.6.5

Install CUDA
https://developer.nvidia.com/cuda-downloads
Choose Archive of Previous CUDA Releases
Choose CUDA Toolkit 10.0, Linux, x86_64, Ubuntu, 18.04, deb(local)
Click Download
sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
List the installable versions
apt-cache policy cuda
apt-cache madison cuda
Install the correct version
sudo apt-get install cuda=10.0.130-1
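When several CUDA repos are configured, a plain apt-get install cuda picks the newest candidate, which is why the version is pinned above. Picking the right string out of apt-cache madison by eye is error-prone, so here is a sketch of the same lookup; madison_versions is a hypothetical helper that relies on the "package | version | source" line layout apt-cache madison prints.

```python
from typing import List


def madison_versions(output: str, prefix: str = "") -> List[str]:
    """Parse `apt-cache madison <pkg>` output into version strings.

    Each line looks like ' cuda | 10.0.130-1 | file:/var/... Packages';
    only versions starting with `prefix` are kept, in the order listed.
    """
    versions = []
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 3 and fields[1].startswith(prefix):
            versions.append(fields[1])
    return versions
```

For example, madison_versions(text, "10.0") on the output of apt-cache madison cuda would return ["10.0.130-1"] here, which then goes into sudo apt-get install cuda=<version>.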

Install cuDNN
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
https://developer.nvidia.com/cudnn
Choose cuDNN v7.6.5 for CUDA 10.0
Choose cuDNN Library for Linux (x86)
tar -xzvf cudnn-10.0-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda-10.0/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64
sudo chmod a+r /usr/local/cuda-10.0/include/cudnn*.h /usr/local/cuda-10.0/lib64/libcudnn*

sudo apt install python3-testresources
sudo apt install python-dev python-pip
mkdir envs; cd envs
python3 -m venv --system-site-packages tensorflow-1.15
source tensorflow-1.15/bin/activate
pip install --upgrade pip
pip install tensorflow-gpu==1.15

Friday, November 27, 2020

jetson nano install tensorflow

$ sudo apt-get update
$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
$ sudo apt-get install python3-pip
$ sudo pip3 install -U pip testresources setuptools==49.6.0
$ sudo apt-get install virtualenv
$ mkdir envs
$ cd envs
$ sudo pip3 install -U numpy==1.16.1 future==0.18.2 mock==3.0.5 h5py==2.10.0 keras_preprocessing==1.1.1 keras_applications==1.0.8 gast==0.2.2 futures protobuf pybind11

$ python3 -m virtualenv -p python3 tensorflow-2.3.1
$ source tensorflow-2.3.1/bin/activate
$ pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==2.3.1+nv20.11

$ python3 -m virtualenv -p python3 tensorflow-1.15.4
$ source tensorflow-1.15.4/bin/activate
$ pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==1.15.4+nv20.11

Friday, September 25, 2020

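Understanding accuracy, precision, and recall

Predicted true/false vs. actual true/false
TP (True Positive): actually true, predicted true
FN (False Negative): actually true, predicted false
FP (False Positive): actually false, predicted true
TN (True Negative): actually false, predicted false

Accuracy = (TP+TN) / (TP+TN+FP+FN)
Accuracy: the fraction of all cases that are predicted correctly

Precision = (TP) / (TP+FP)
Precision: of the cases predicted true, how many are actually true

Recall = (TP) / (TP+FN)
Recall: of the cases that are actually true, how many are predicted true

High precision, low recall: most of what gets caught is truly positive, but some true cases are missed
Low precision, high recall: most true cases get caught, but quite a few false ones are caught too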
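The definitions above translate directly into code. A small sketch with labels encoded as 1 (true) and 0 (false); returning 0.0 when a denominator is zero is a convention chosen for this sketch, not part of the definitions.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, FP, TN for binary labels (1 = true, 0 = false)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if t and p:
            counts["TP"] += 1      # actually true, predicted true
        elif t:
            counts["FN"] += 1      # actually true, predicted false
        elif p:
            counts["FP"] += 1      # actually false, predicted true
        else:
            counts["TN"] += 1      # actually false, predicted false
    return counts


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_true = c["TP"] + c["FP"]
    actually_true = c["TP"] + c["FN"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total if total else 0.0,
        "precision": c["TP"] / predicted_true if predicted_true else 0.0,
        "recall": c["TP"] / actually_true if actually_true else 0.0,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # {'TP': 2, 'FN': 2, 'FP': 1, 'TN': 5}
print(metrics(y_true, y_pred))           # accuracy 0.7, precision ~0.667, recall 0.5
```

This example leans toward the higher-precision, lower-recall side: 2 of the 3 positive predictions are correct, but 2 of the 4 true cases are missed.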

Thursday, August 27, 2020

Study notes: How to configure your NVIDIA Jetson Nano for Computer Vision and Deep Learning

Reference: How to configure your NVIDIA Jetson Nano for Computer Vision and Deep Learning

Update system-level packages
$ sudo apt-get update
$ sudo apt-get upgrade

Install the related system-level packages
$ sudo apt-get install git
$ sudo apt-get install cmake
$ sudo apt-get install libatlas-base-dev
$ sudo apt-get install gfortran
$ sudo apt-get install libhdf5-serial-dev
$ sudo apt-get install hdf5-tools
$ sudo apt-get install python3-dev
$ sudo apt-get install locate
$ sudo apt-get install libfreetype6-dev
$ sudo apt-get install python3-setuptools
$ sudo apt-get install protobuf-compiler
$ sudo apt-get install libprotobuf-dev
$ sudo apt-get install openssl
$ sudo apt-get install libssl-dev
$ sudo apt-get install libcurl4-openssl-dev
$ sudo apt-get install cython3
$ sudo apt-get install libxml2-dev
$ sudo apt-get install libxslt1-dev

Wednesday, July 8, 2020

Interactive OpenAI gym

https://github.com/openai/gym/blob/master/gym/utils/play.py
import gym
from gym.utils.play import play, PlayPlot

env = gym.make("Enduro-v0")
def cb(obs_t, obs_tp1, action, rew, done, info):
    return [rew,]  # one value per entry in plot_names
plotter = PlayPlot(cb, horizon_timesteps=(30*5), plot_names=["reward"])
play(env, callback=plotter.callback, zoom=4)

https://github.com/openai/gym/blob/master/gym/core.py
https://github.com/openai/gym/tree/master/gym/wrappers
https://github.com/openai/gym/blob/master/gym/wrappers/atari_preprocessing.py
https://github.com/openai/gym/blob/master/gym/envs/__init__.py
        register(
            id='{}-v0'.format(name),
            entry_point='gym.envs.atari:AtariEnv',
            kwargs={'game': game, 'obs_type': obs_type, 'repeat_action_probability': 0.25},
            max_episode_steps=10000,
            nondeterministic=nondeterministic,
        )
https://github.com/openai/gym/blob/master/gym/envs/atari/atari_env.py
    pip install gym[atari]
    self.ale = atari_py.ALEInterface()
    reward += self.ale.act(action)

https://github.com/openai/atari-py/tree/master/atari_py
https://github.com/openai/atari-py/blob/master/atari_py/__init__.py
https://github.com/openai/atari-py/blob/master/atari_py/ale_python_interface.py
    ale_lib = cdll.LoadLibrary(os.path.join(os.path.dirname(__file__),
                                            'ale_interface/libale_c.so'))
    def act(self, action):
        return ale_lib.act(self.obj, int(action))
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/ale_interface.cpp
    reward_t reward = environment->act(action, PLAYER_B_NOOP);
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/ale_interface.hpp
std::unique_ptr<StellaEnvironment> environment;
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/environment/stella_environment.cpp
reward_t StellaEnvironment::act(Action player_a_action, Action player_b_action)
    sum_rewards += oneStepAct(m_player_a_action, m_player_b_action);
reward_t StellaEnvironment::oneStepAct(Action player_a_action, Action player_b_action)
    return m_settings->getReward();
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/environment/stella_environment.hpp
    RomSettings *m_settings;
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/games/RomSettings.hpp
    virtual reward_t getReward() const = 0;
https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/games/supported/Enduro.cpp
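Read top-down, the trail says that AtariEnv's step repeats self.ale.act(action) over the skipped frames and adds up the returned rewards, and each act() ends in the game's RomSettings::getReward() (Enduro.cpp for Enduro). The mock below imitates only that accumulation; FakeALE and fake_step are hypothetical stand-ins, not gym or atari-py classes, and the fixed frameskip is a simplification (gym may sample it from a range).

```python
from typing import List


class FakeALE:
    """Stand-in for atari_py.ALEInterface: act() returns a per-frame reward."""

    def __init__(self, rewards_per_frame: List[int]) -> None:
        self._rewards = list(rewards_per_frame)
        self.frame = 0

    def act(self, action: int) -> int:
        # The real act() goes through libale_c.so to StellaEnvironment::act
        # and finally to the game's getReward(); here we just replay a list.
        reward = self._rewards[self.frame] if self.frame < len(self._rewards) else 0
        self.frame += 1
        return reward


def fake_step(ale: FakeALE, action: int, frameskip: int) -> int:
    """Mimic the step() loop: repeat the same action and sum the rewards."""
    reward = 0
    for _ in range(frameskip):
        reward += ale.act(action)
    return reward


ale = FakeALE([0, 1, 0, 2, 1, 0])
print(fake_step(ale, action=1, frameskip=3))  # 1 (frames 0-2: 0 + 1 + 0)
print(fake_step(ale, action=1, frameskip=3))  # 3 (frames 3-5: 2 + 1 + 0)
```

This is also why the reward plotted by the play callback is per step rather than per emulator frame.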

Friday, October 18, 2019

tensorflow and CUDA / cuDNN versions

Check which versions go together

Check the current CUDA version
cat /usr/local/cuda/version.txt

Check the current cuDNN version
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h
Check the toolkit (nvcc) version
which nvcc
nvcc --version

Check the driver version
cat /proc/driver/nvidia/version

nvidia-smi
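The two file checks above can be scripted. A sketch that parses the same files; the helper names are made up here. Note that newer CUDA releases may ship version.json instead of version.txt, and cuDNN 8 moved the CUDNN_MAJOR defines into cudnn_version.h, so on newer installs point it at that header instead.

```python
import os
import re
from typing import Optional, Tuple


def parse_cuda_version(version_txt: str) -> Optional[str]:
    """Parse the text of version.txt, e.g. 'CUDA Version 10.0.130' -> '10.0.130'."""
    m = re.search(r"CUDA Version\s+([\d.]+)", version_txt)
    return m.group(1) if m else None


def parse_cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Read CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL from cudnn.h text."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if not m:
            return None
        parts.append(int(m.group(1)))
    return (parts[0], parts[1], parts[2])


if __name__ == "__main__":
    cuda_txt = "/usr/local/cuda/version.txt"
    cudnn_h = "/usr/local/cuda/include/cudnn.h"
    if os.path.exists(cuda_txt):
        with open(cuda_txt) as f:
            print("CUDA:", parse_cuda_version(f.read()))
    if os.path.exists(cudnn_h):
        with open(cudnn_h) as f:
            print("cuDNN:", parse_cudnn_version(f.read()))
```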

Wednesday, August 14, 2019

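Using TensorRT: loading frozen_model.pb is too slow

The whole point of TensorRT is to speed up inference,
but loading the model turned out to be extremely slow

Searching online turned up "extremely long model loading time problem"
The main cause is that protobuf was using its pure-Python implementation;
switching to the C++ implementation is what fixes the speed

The article uses protobuf 3.6.1,
but my protobuf is 3.8.0,
so I changed the related parameters to 3.8.0
and installed protobuf into the Python virtualenv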
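To confirm the switch took effect, one can ask protobuf which backend it actually loaded. This sketch relies on google.protobuf.internal.api_implementation, an internal module whose behavior may change between releases; PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION is protobuf's environment variable for choosing the backend, and "upb" only appears with protobuf 4.x.

```python
import os
from typing import Optional


def protobuf_backend() -> Optional[str]:
    """Return protobuf's active Python backend ('python', 'cpp' or 'upb'),
    or None if protobuf is not installed."""
    try:
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()


if __name__ == "__main__":
    print("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION =",
          os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"))
    print("active backend:", protobuf_backend())
```

Run it with the virtualenv's python; if it still prints "python", the slow loading is expected to persist.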

sudo /usr/local/cuda-10.0/bin/nvprof --log-file=profile_freeze.log /mnt/XavierSSD/envs/OpenAiGym/bin/python inference.py


Tuesday, June 25, 2019

Nvidia Jetson AGX Xavier Build tensorflow 1.13

Reference: Building Tensorflow 1.13 on Jetson Xavier

Install bazel
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ wget https://github.com/bazelbuild/bazel/releases/download/0.19.2/bazel-0.19.2-dist.zip
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ mkdir bazel
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ cd bazel
nvidia@jetson-0423418048807:~/XavierSSD/Downloads/bazel$ unzip bazel-0.19.2-dist.zip
nvidia@jetson-0423418048807:~/XavierSSD/Downloads/bazel$ env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
nvidia@jetson-0423418048807:~/XavierSSD/Downloads/bazel$ cd ..
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ mv bazel ~/XavierSSD
nvidia@jetson-0423418048807:~/XavierSSD/Downloads$ cd ../bazel/
nvidia@jetson-0423418048807:~/XavierSSD/bazel$ vi ~/.bashrc
Add the following line to the bottom of the file, and also run it once in the current shell
export PATH=~/XavierSSD/bazel/output${PATH:+:${PATH}}

Download tensorflow
nvidia@jetson-0423418048807:~/XavierSSD/bazel$ cd ..
nvidia@jetson-0423418048807:~/XavierSSD$ git clone https://github.com/tensorflow/tensorflow.git
nvidia@jetson-0423418048807:~/XavierSSD$ cd tensorflow/

Configure git and check out the r1.13 branch
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git config --global user.email "name@yahoo.com.tw"
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git config --global user.name "name"
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git checkout r1.13

Changes for the Nvidia Jetson AGX Xavier
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi tensorflow/lite/kernels/internal/BUILD
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git add tensorflow/lite/kernels/internal/BUILD
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git commit -m "Update 1"
[r1.13 982e077b2a] Update 1
 1 file changed, 3 deletions(-)
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git log
commit 982e077b2a4e2123f7a299dbaf95d97383303d17 (HEAD -> r1.13)
Author: name <name@yahoo.com.tw>
Date:   Mon Jun 24 14:49:30 2019 +0800

    Update 1

commit 93dd14dce2e8751bcaab0a0eb363d55eb0cc5813 (origin/r1.13)
Author: Mihai Maruseac <mihaimaruseac@google.com>
Date:   Tue May 21 10:08:18 2019 -0700

    Update png_archive version to 1.6.37

    PiperOrigin-RevId: 249272809

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git diff 93dd1 982e0
diff --git a/tensorflow/lite/kernels/internal/BUILD b/tensorflow/lite/kernels/internal/BUILD
index 4be3226938..7226f96fdf 100644
--- a/tensorflow/lite/kernels/internal/BUILD
+++ b/tensorflow/lite/kernels/internal/BUILD
@@ -22,15 +22,12 @@ HARD_FP_FLAGS_IF_APPLICABLE = select({
 NEON_FLAGS_IF_APPLICABLE = select({
     ":arm": [
         "-O3",
-        "-mfpu=neon",
     ],
     ":armeabi-v7a": [
         "-O3",
-        "-mfpu=neon",
     ],
     ":armv7a": [
         "-O3",
-        "-mfpu=neon",
     ],
     "//conditions:default": [
         "-O3",
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi third_party/aws/BUILD.bazel
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git add third_party/aws/BUILD.bazel
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git commit -m "Update 2"
[r1.13 a3d6ea2fce] Update 2
 1 file changed, 1 insertion(+), 1 deletion(-)
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git log
commit a3d6ea2fce8fff7bcf74ee52cd77074416d24bf2 (HEAD -> r1.13)
Author: mark <ingrenn@yahoo.com.tw>
Date:   Mon Jun 24 14:56:09 2019 +0800

    Update 2

commit 982e077b2a4e2123f7a299dbaf95d97383303d17
Author: mark <ingrenn@yahoo.com.tw>
Date:   Mon Jun 24 14:49:30 2019 +0800

    Update 1

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git diff 982e0 a3d6ea
diff --git a/third_party/aws/BUILD.bazel b/third_party/aws/BUILD.bazel
index 5426f79e46..e08f8fc108 100644
--- a/third_party/aws/BUILD.bazel
+++ b/third_party/aws/BUILD.bazel
@@ -24,7 +24,7 @@ cc_library(
         "@org_tensorflow//tensorflow:raspberry_pi_armeabi": glob([
             "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
         ]),
-        "//conditions:default": [],
+        "//conditions:default": glob(["aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",]),
     }) + glob([
         "aws-cpp-sdk-core/include/**/*.h",
         "aws-cpp-sdk-core/source/*.cpp",
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi third_party/gpus/crosstool/BUILD.tpl
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git add third_party/gpus/crosstool/BUILD.tpl
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git commit -m "Update 3"
[r1.13 65ad3b64e5] Update 3
 1 file changed, 1 insertion(+)
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git log
commit 65ad3b64e5f16b3496628bee800fabf825a7c1ce (HEAD -> r1.13)
Author: mark <ingrenn@yahoo.com.tw>
Date:   Mon Jun 24 15:04:22 2019 +0800

    Update 3

commit a3d6ea2fce8fff7bcf74ee52cd77074416d24bf2
Author: mark <ingrenn@yahoo.com.tw>
Date:   Mon Jun 24 14:56:09 2019 +0800

    Update 2

commit 982e077b2a4e2123f7a299dbaf95d97383303d17
Author: mark <ingrenn@yahoo.com.tw>
Date:   Mon Jun 24 14:49:30 2019 +0800

    Update 1

commit 93dd14dce2e8751bcaab0a0eb363d55eb0cc5813 (origin/r1.13)
Author: Mihai Maruseac <mihaimaruseac@google.com>
Date:   Tue May 21 10:08:18 2019 -0700

    Update png_archive version to 1.6.37
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ git diff a3d6ea 65ad3
diff --git a/third_party/gpus/crosstool/BUILD.tpl b/third_party/gpus/crosstool/BUILD.tpl
index db76306ffb..184cd35b87 100644
--- a/third_party/gpus/crosstool/BUILD.tpl
+++ b/third_party/gpus/crosstool/BUILD.tpl
@@ -24,6 +24,7 @@ cc_toolchain_suite(
         "x64_windows|msvc-cl": ":cc-compiler-windows",
         "x64_windows": ":cc-compiler-windows",
         "arm": ":cc-compiler-local",
+        "aarch64": ":cc-compiler-local",
         "k8": ":cc-compiler-local",
         "piii": ":cc-compiler-local",
         "ppc": ":cc-compiler-local",
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
Modifications done

Install the required versions of g++ and gcc
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ sudo apt-get install g++-5
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ sudo apt-get install gcc-5

Configure the build
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ ./configure
Extracting Bazel installation...
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3


Found possible Python library paths:
  /usr/lib/python3.6/dist-packages
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3.6/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:


Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.3


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/lib/aarch64-linux-gnu


Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/aarch64-linux-gnu]:


Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 7.2


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc-5


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
        --config=gdr            # Build with GDR support.
        --config=verbs          # Build with libverbs support.
        --config=ngraph         # Build with Intel nGraph support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=noignite       # Disable Apacha Ignite support.
        --config=nokafka        # Disable Apache Kafka support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

Build the tensorflow pip package
This step takes a very long time and may even fail with errors such as numpy not being found
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel build --config=opt --config=nonccl //tensorflow/tools/pip_package:build_pip_package --incompatible_remove_native_http_archive=false --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
Generate tensorflow-1.13.1-cp36-cp36m-linux_aarch64.whl
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ sudo bazel-bin/tensorflow/tools/pip_package/build_pip_package ../


Remove the old tensorflow and install the new one
nvidia@jetson-0423418048807:~$ source XavierSSD/envs/tensorflow/bin/activate
(tensorflow) nvidia@jetson-0423418048807:~$ cd XavierSSD/tensorflow/
(tensorflow) nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ pip3 uninstall tensorflow-gpu
(tensorflow) nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ pip3 install ../tensorflow-1.13.1-cp36-cp36m-linux_aarch64.whl
(tensorflow) nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ deactivate
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

Build the tensorflow C++ shared library libtensorflow_cc.so
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel build --config=opt --config=nonccl //tensorflow:libtensorflow_cc.so --incompatible_remove_native_http_archive=false --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ ls -al bazel-bin/tensorflow/libtensorflow_cc.so
-r-xr-xr-x 1 nvidia nvidia 303026864 Jun 25 12:22 bazel-bin/tensorflow/libtensorflow_cc.so
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ mkdir tensorflow/cc/example
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi tensorflow/cc/example/example.cc

// tensorflow/cc/example/example.cc

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  // Matrix A = [3 2; -1 0]
  auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} });
  // Vector b = [3 5]
  auto b = Const(root, { {3.f, 5.f} });
  // v = Ab^T
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));
  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch v
  TF_CHECK_OK(session.Run({v}, &outputs));
  // Expect outputs[0] == [19; -3]
  LOG(INFO) << outputs[0].matrix<float>();
  return 0;
}
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ vi tensorflow/cc/example/BUILD

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

tf_cc_binary(
    name = "example",
    srcs = ["example.cc"],
    deps = [
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
)
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$

Build the example program example.cc
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel build -c opt //tensorflow/cc/example:example
This takes a long time. Then run it
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$ bazel-bin/tensorflow/cc/example/example
2019-06-25 16:10:03.922559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-25 16:10:03.922959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 8.57GiB
2019-06-25 16:10:03.923054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-25 16:10:03.924346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-25 16:10:03.924409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-06-25 16:10:03.924452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-06-25 16:10:03.925152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8340 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2019-06-25 16:10:07.582973: I tensorflow/cc/example/example.cc:22] 19
-3
nvidia@jetson-0423418048807:~/XavierSSD/tensorflow$
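The logged 19 and -3 are the 2x1 result of A·bᵀ with A = [3 2; -1 0] and b = [3 5]. A plain-Python check of that arithmetic, independent of TensorFlow (matmul_transpose_b is a helper name made up here):

```python
from typing import List


def matmul_transpose_b(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Compute A @ B^T for small nested lists, like MatMul(..., MatMul::TransposeB(true))."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


A = [[3.0, 2.0], [-1.0, 0.0]]
b = [[3.0, 5.0]]
print(matmul_transpose_b(A, b))  # [[19.0], [-3.0]]  (3*3 + 2*5, -1*3 + 0*5)
```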


nvidia@jetson-0423418048807:~/XavierSSD$ git clone https://github.com/bitbionic/keras-to-tensorflow.git
Cloning into 'keras-to-tensorflow'...
remote: Enumerating objects: 3719, done.
remote: Total 3719 (delta 0), reused 0 (delta 0), pack-reused 3719
Receiving objects: 100% (3719/3719), 227.81 MiB | 786.00 KiB/s, done.
Resolving deltas: 100% (7/7), done.
Checking out files: 100% (3688/3688), done.
nvidia@jetson-0423418048807:~/XavierSSD$
nvidia@jetson-0423418048807:~/XavierSSD$ cd keras-to-tensorflow
nvidia@jetson-0423418048807:~/XavierSSD/keras-to-tensorflow$ vi main.cpp
The build fails with two errors, so change
data.ToString(); to std::string(data);
and change
tensorflow::StringPiece(file_name).ends_with(".png")
to
tensorflow::str_util::EndsWith(file_name, ".png")

nvidia@jetson-0423418048807:~/XavierSSD/keras-to-tensorflow$
nvidia@jetson-0423418048807:~/XavierSSD/keras-to-tensorflow$ g++-5 -std=gnu++11 -c ./main.cpp -D_GLIBCXX_USE_CXX11_ABI=0     -I../tensorflow     -I../tensorflow/bazel-tensorflow/external/eigen_archive     -I../tensorflow/bazel-tensorflow/external/protobuf_archive/src     -I../tensorflow/bazel-tensorflow/external/com_google_absl     -I../tensorflow/bazel-genfiles




Friday, December 7, 2018

EAST + Tesseract performance test

EAST: An Efficient and Accurate Scene Text Detector

Reference: OpenCV OCR and text recognition with Tesseract

It turns out OpenCV enables hardware acceleration with
net = cv2.dnn.readNet(args["east"])
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL)
but OpenCL here is not NVIDIA's CUDA; it runs on Intel (GPU)

Using tensorflow's GPU (CUDA) build instead, performance was better:
at 640x480, from 400 ms down to 340 ms

However, Tesseract is not accelerated; only EAST is
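Numbers like 400 ms vs. 340 ms per frame depend on how they are measured. A minimal timing sketch: average_ms is a hypothetical helper, and excluding a few warm-up calls is a choice made here because the first GPU inference is often slower (initialization, memory allocation).

```python
import time
from typing import Callable


def average_ms(fn: Callable[[], object], repeats: int = 20, warmup: int = 2) -> float:
    """Average wall-clock time of fn() in milliseconds, after `warmup` untimed calls."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats
```

For the OpenCV path this would be something like average_ms(lambda: net.forward(layer_names)), where layer_names stands for whatever output-layer list the EAST script passes to forward(); the same call wrapped around the tensorflow session run gives a comparable number.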

build tensorflow 1.10

Reference: build tensorflow 1.11 from source in visual studio

(base) D:\TensorFlowB>conda env list
(base) D:\TensorFlowB>conda env remove -n tensorflow-1.10
(base) D:\TensorFlowB>conda create -n tensorflow-1.10 pip python=3.6
(base) D:\TensorFlowB>activate tensorflow-1.10
(tensorflow-1.10) D:\TensorFlowB>pip install six numpy wheel protobuf absl-py
(tensorflow-1.10) D:\TensorFlowB>pip install keras_applications==1.0.5 --no-deps
(tensorflow-1.10) D:\TensorFlowB>pip install keras_preprocessing==1.0.3 --no-deps

D:\TensorFlowB>git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.10
D:\TensorFlowB>cd tensorflow-1.10
D:\TensorFlowB\tensorflow-1.10>git checkout r1.10
D:\TensorFlowB\tensorflow-1.10>git pull origin master
D:\TensorFlowB\tensorflow-1.10>bazel clean
D:\TensorFlowB\tensorflow-1.10>python ./configure.py

Modify D:/TensorFlowB/tensorflow-1.10/tensorflow/contrib/cmake/CMakeLists.txt

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    CHECK_CXX_COMPILER_FLAG("/arch:AVX2" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
      add_definitions(-D__AVX2__)
    endif()
  endif()
endif()

Configure D:/TensorFlowB/tensorflow-1.10/tensorflow/contrib/cmake with CMake
Choose Visual Studio 14 2015 Win64
For "Optional toolset to use", enter "host=x64"
Check tensorflow_BUILD_SHARED_LIB
Check tensorflow_ENABLE_GPU

Problems and solutions
D:\TensorFlowB\tensorflow-1.10\tensorflow\stream_executor\dnn.pb.h
This file was generated by a newer version of protoc
which is incompatible with your Protocol Buffer headers.
Please update your headers.

In D:\TensorFlowB\tensorflow-1.10\tensorflow\contrib\cmake\external\protobuf.cmake, change
set(PROTOBUF_TAG v3.6.0)
to
set(PROTOBUF_TAG v3.6.1)

D:\TensorFlowB\tensorflow-1.10\tensorflow\workspace.bzl

  tf_http_archive(
      name = "protobuf_archive",
      urls = [
          "https://mirror.bazel.build/github.com/google/protobuf/archive/v3.6.1.tar.gz",
          "https://github.com/google/protobuf/archive/v3.6.1.tar.gz",
      ],
      sha256 = "3d4e589d81b2006ca603c1ab712c9715a76227293032d05b26fca603f90b3f5b",
      strip_prefix = "protobuf-3.6.1",
  )
  tf_http_archive(
      name = "eigen_archive",
      urls = [
          "https://mirror.bazel.build/bitbucket.org/eigen/eigen/get/fd6845384b86.tar.gz",
          "https://bitbucket.org/eigen/eigen/get/fd6845384b86.tar.gz",
      ],
      sha256 = "d956415d784fa4e42b6a2a45c32556d6aec9d0a3d8ef48baee2522ab762556a9",
      strip_prefix = "eigen-eigen-fd6845384b86",
      build_file = clean_dep("//third_party:eigen.BUILD"),
      patch_file = clean_dep("//third_party:eigen_half.patch"),
  )

Download eigen_half.patch to D:\TensorFlowB\tensorflow-1.10\third_party\eigen_half.patch

Building the project tf_python_build_pip_package fails with
EXEC : error : [WinError 5] Access is denied:
'build\\bdist.win-amd64\\wheel\\tensorflow_gpu-1.10.1.data\\purelib\\tensorflow\\include\\tensorflow\\stream_executor\\dnn.pb.h'
Change the file attributes of dnn.pb.h
Open the VS2015 x64 Native Tools Command Prompt as Administrator and run
D:\TensorFlowB\build-1.10>D:\Anaconda3\envs\tensorflow-1.10\python.exe D:/TensorFlowB/build-1.10/tf_python/setup.py bdist_wheel --project_name tensorflow_gpu
Then, in a separate Anaconda Prompt
(base) D:\TensorFlowB\build-1.10\tf_python>activate tensorflow-1.10
(tensorflow-1.10) D:\TensorFlowB\build-1.10\tf_python>pip install dist\tensorflow_gpu-1.10.1-cp36-cp36m-win_amd64.whl

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
This failed, and I did not find a solution
c:\users\mark\appdata\local\temp\nvcc_inter_files_tmp_dir\depthwise_conv_op_gpu.cu.compute_70.cudafe1.stub.c(3):
fatal error C1083: Cannot open include file: 'depthwise_conv_op_gpu.cu.fatbin.c': No such file or directory

Wednesday, November 28, 2018

build tensorflow 1.11 from source in visual studio

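Key points first
The Windows GPU build of TensorFlow is currently at tensorflow_gpu-1.12.0, built with Bazel,
but the library that Bazel currently produces cannot be used in Visual Studio,
so I fell back to tensorflow_gpu-1.11 and built the library with CMake

Also, only a Release build works. I tried RelWithDebInfo as a substitute for Debug,
but only Release succeeded

Once the library was built, programs compiled and ran, but the results were wrong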

Open Anaconda Prompt
(base) D:\>conda create -n tensorflow-1.11 pip python=3.6
(base) D:\>activate tensorflow-1.11
(tensorflow-1.11) D:\>pip install six numpy wheel
(tensorflow-1.11) D:\>pip install keras_applications==1.0.5 --no-deps
(tensorflow-1.11) D:\>pip install keras_preprocessing==1.0.3 --no-deps

http://www.msys2.org/
Download msys2-x86_64-20180531.exe
Open msys2/MinGW 64-bit
$ pacman -Syu
$ pacman -Su
$ pacman -S git patch unzip

Install Bazel
https://github.com/bazelbuild/bazel/releases
Download bazel-0.18.1-windows-x86_64.exe
rename bazel-0.18.1-windows-x86_64.exe bazel.exe
move bazel.exe D:\msys64\usr\bin
add PATH D:\msys64\usr\bin

Install JDK 8
Download jdk-8u191-windows-x64.exe
Set JAVA_HOME to C:\Program Files\Java\jdk1.8.0_191

copy cudnn-9.0-windows10-x64-v7\cuda\* to
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0

Download swigwin-3.0.12.zip
and extract it to D:\TensorFlowB\swigwin-3.0.12

Open the VS2015 x64 Native Tools Command Prompt:
D:\TensorFlowB>git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.11
D:\TensorFlowB>cd tensorflow-1.11
D:\TensorFlowB\tensorflow-1.11>git checkout r1.11
D:\TensorFlowB\tensorflow-1.11>python ./configure.py
Please specify the location of python. [Default is D:\Anaconda3\python.exe]:
Please input the desired Python library path to use.  Default is [D:\Anaconda3\lib\site-packages]
Do you wish to build TensorFlow with nGraph support? [y/N]:
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0]:
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]: /arch:AVX2
Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]:

Edit D:/TensorFlowB/tensorflow-1.11/tensorflow/contrib/cmake/CMakeLists.txt to add AVX2 support:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    CHECK_CXX_COMPILER_FLAG("/arch:AVX2" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
      add_definitions(-D__AVX2__)
    endif()
  endif()
endif()

Referring to "Add abseil_cpp cmake dependence.", edit
D:/TensorFlowB/tensorflow-1.11/tensorflow/contrib/cmake/CMakeLists.txt,
add tensorflow/contrib/cmake/external/abseil_cpp.cmake
and add tensorflow/contrib/cmake/modules/FindAbseilCpp.cmake
so that the build does not fail with absl/strings/string_view.h not found.

    add_definitions(-DGOOGLE_CUDA=1 -DTF_EXTRA_CUDA_CAPABILITIES=3.5,3.7,5.2,6.0,6.1,7.0)


Edit D:\TensorFlowB\tensorflow-1.11\tensorflow\contrib\cmake\external\eigen.cmake
option(eigen_PATCH_FILE "Patch file to apply to eigen" OFF)
set(eigen_PATCH_FILE "D:/TensorFlowB/eigen_half.patch")
Edit D:\TensorFlowB\tensorflow-1.11\bazel-tensorflow\tensorflow\workspace.bzl

  tf_http_archive(
      name = "eigen_archive",
      build_file = clean_dep("//third_party:eigen.BUILD"),
      patch_file = clean_dep("//third_party:eigen_half.patch"),
  )

Download https://github.com/amsokol/tensorflow-windows-build-tutorial/blob/master/eigen_half.patch
and save it as D:/TensorFlowB/eigen_half.patch.
The patch later modifies
D:\TensorFlowB\build-1.11\eigen\src\eigen\Eigen\src\Core\arch\CUDA\Half.h
D:\TensorFlowB\build-1.11\external\eigen_archive\Eigen\src\Core\arch\CUDA\Half.h
which avoids the following error:

Error  more than one instance of overloaded function "__hadd" matches the argument list: tf_core_gpu_kernels d:\tensorflowb\build-1.11\external\eigen_archive\eigen\src\Core\arch\CUDA\Half.h 212 

CMake
source: D:/TensorFlowB/tensorflow-1.11/tensorflow/contrib/cmake
build: D:/TensorFlowB/build-1.11
Configure
Visual Studio 14 2015 Win64
Optional toolset: host=x64
SWIG_EXECUTABLE=D:/TensorFlowB/swigwin-3.0.12/swig.exe
tensorflow_BUILD_SHARED_LIB=v
tensorflow_ENABLE_GPU=v
eigen_PATCH_FILE=v

Tip: CMake scripts can be debugged with message().

Open Visual Studio 2015 as Administrator
Open D:\TensorFlowB\build-1.11\tensorflow.sln
Switch to the Release configuration
Open the properties of the following projects:
_beam_search_ops, _gru_ops, _lstm_ops, _nearest_neighbor_ops, _periodic_resample_op
Property Pages/Configuration Properties/Linker/Input/Additional Dependencies
and change \pywrap_tensorflow_internal.lib to Release\pywrap_tensorflow_internal.lib
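Doing this by hand for five projects is tedious. A minimal Python sketch that patches the project files instead, assuming each project's .vcxproj sits in D:\TensorFlowB\build-1.11 and lists the library as plain text (fix_pywrap_path and patch_projects are hypothetical helpers, not part of the TensorFlow build):

```python
import re
from pathlib import Path
from typing import Iterable

# "\pywrap_tensorflow_internal.lib" that is not already preceded by "Release".
PYWRAP = re.compile(r"(?<!Release)\\pywrap_tensorflow_internal\.lib")


def fix_pywrap_path(text: str) -> str:
    """Point bare \\pywrap_tensorflow_internal.lib entries at the Release folder."""
    return PYWRAP.sub(r"\\Release\\pywrap_tensorflow_internal.lib", text)


def patch_projects(build_dir: str, names: Iterable[str]) -> None:
    """Rewrite <build_dir>/<name>.vcxproj in place for each project name."""
    for name in names:
        proj = Path(build_dir) / (name + ".vcxproj")
        text = proj.read_text(encoding="utf-8")
        proj.write_text(fix_pywrap_path(text), encoding="utf-8")
```

For example, patch_projects(r"D:\TensorFlowB\build-1.11", ["_beam_search_ops", "_gru_ops", "_lstm_ops", "_nearest_neighbor_ops", "_periodic_resample_op"]). Entries that already say Release\pywrap_tensorflow_internal.lib are left alone, so running it twice is harmless.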

Next, cuda_kernel_helper.h cannot find cuda_fp16.h:

Severity Code Description Project File Line Suppression State
Error C1083 Cannot open include file: 'cuda/include/cuda_fp16.h': No such file or directory _beam_search_ops D:\TensorFlowB\tensorflow-1.11\tensorflow\core\util\cuda_kernel_helper.h 24 

Open D:\TensorFlowB\tensorflow-1.11\tensorflow\core\util\cuda_kernel_helper.h and change the include:
//#include "cuda/include/cuda_fp16.h"
#include "cuda_fp16.h"

Next error:

Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol "class absl::uint128 __cdecl absl::operator%(class absl::uint128,class absl::uint128)" (??Labsl@@YA?AVuint128@0@V10@0@Z) referenced in function "private: void __cdecl absl::str_format_internal::`anonymous namespace'::ConvertedIntInfo::UnsignedToStringRight(class absl::uint128,struct absl::str_format_internal::ConversionChar)" (??$UnsignedToStringRight@Vuint128@absl@@@ConvertedIntInfo@?A0x0d227ec7@str_format_internal@absl@@AEAAXVuint128@3@UConversionChar@23@@Z) tf_tutorials_example_trainer D:\TensorFlowB\build-1.11\arg.obj 1 

Add to Linker/Input/Additional Dependencies:
abseil_cpp\src\abseil_cpp_build\absl\numeric\Release\absl_int128.lib

Finally, estimator_python_api and tf_python_api fail.
To see why, open Tools/Options/Projects and Solutions/Build and Run in VS2015
and set MSBuild project build output verbosity to Normal.

Edit D:\TensorFlowB\build-1.11\tf_python_api.vcxproj, removing the empty "" argument from the command:

from "C:\Program Files\CMake\bin\cmake.exe" -E env PYTHONPATH=D:/TensorFlowB/build-1.11/tf_python "" D:/Anaconda3/python.exe D:/TensorFlowB/build-1.11/tf_python/tensorflow/python/tools/api/generator/create_python_api.py --root_init_template=D:/TensorFlowB/build-1.11/tf_python/tensorflow/api_template.__init__.py --apidir=D:/TensorFlowB/build-1.11/tf_python/tensorflow --package=tensorflow.python --apiname=tensorflow D:/TensorFlowB/tensorflow-1.11/api_init_files_list.txt
to "C:\Program Files\CMake\bin\cmake.exe" -E env PYTHONPATH=D:/TensorFlowB/build-1.11/tf_python D:/Anaconda3/python.exe D:/TensorFlowB/build-1.11/tf_python/tensorflow/python/tools/api/generator/create_python_api.py --root_init_template=D:/TensorFlowB/build-1.11/tf_python/tensorflow/api_template.__init__.py --apidir=D:/TensorFlowB/build-1.11/tf_python/tensorflow --package=tensorflow.python --apiname=tensorflow D:/TensorFlowB/tensorflow-1.11/api_init_files_list.txt
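The only difference between the two commands is the empty "" argument after PYTHONPATH=...; presumably cmake -E env then tries to run that empty string as the program. A small Python sketch of the same edit (drop_empty_arg is a hypothetical helper) that removes standalone "" tokens and leaves every other character unchanged:

```python
def drop_empty_arg(cmd: str) -> str:
    """Remove standalone "" tokens from a command line, keeping all other text as-is."""
    # Splitting on single spaces and re-joining preserves quoted paths that contain
    # spaces, because only tokens that are exactly "" are dropped.
    return " ".join(tok for tok in cmd.split(" ") if tok != '""')
```

Applied to the "from" command above, it yields the "to" command.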

copy D:\TensorFlowB\tensorflow-1.11\tensorflow\tools\docs
to D:\TensorFlowB\build-1.11\tf_python\tensorflow\tools\docs
copy D:\TensorFlowB\tensorflow-1.11\tensorflow\python\distribute
to D:\TensorFlowB\build-1.11\tf_python\tensorflow\python\distribute

Build tf_python_build_pip_package,
which produces D:\TensorFlowB\build-1.11\tf_python\dist\tensorflow_gpu-1.11.0-cp36-cp36m-win_amd64.whl
(tensorflow-1.11) D:\TensorFlowB\build-1.11>pip install tf_python\dist\tensorflow_gpu-1.11.0-cp36-cp36m-win_amd64.whl


Using Bazel
D:\TensorFlowB\tensorflow-1.11>bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
After a long build, you will find that
D:\TensorFlowB\tensorflow-1.11\bazel-out\x64_windows-opt\bin\tensorflow\tools\pip_package\simple_console_for_windows.zip
was not generated correctly (size=0).

D:\TensorFlowB\tensorflow-1.11>cd bazel-out/x64_windows-opt/bin/tensorflow/tools/pip_package
Edit simple_console_for_windows.zip-0.params
and delete every line that contains .zip
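The same edit as a minimal Python sketch (strip_zip_lines and patch_params_file are hypothetical helpers). It assumes the .params file is UTF-8 text with one entry per line, and it keeps the original line endings because zipper.exe reads the file back afterwards:

```python
def strip_zip_lines(text: str) -> str:
    """Drop every line containing ".zip", keeping the other lines and their endings."""
    return "".join(line for line in text.splitlines(keepends=True) if ".zip" not in line)


def patch_params_file(path: str) -> None:
    """Rewrite a bazel .params file in place without its .zip entries."""
    # newline="" disables newline translation, so LF or CRLF endings survive unchanged.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(strip_zip_lines(text))
```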

Run:
D:\TensorFlowB\tensorflow-1.11\bazel-tensorflow>external\bazel_tools\tools\zip\zipper\zipper.exe vcC bazel-out/x64_windows-opt/bin/tensorflow/tools/pip_package/simple_console_for_windows.zip @bazel-out/x64_windows-opt/bin/tensorflow/tools/pip_package/simple_console_for_windows.zip-0.params
D:\TensorFlowB\tensorflow-1.11\bazel-tensorflow>cd ..
D:\TensorFlowB\tensorflow-1.11>bazel-bin\tensorflow\tools\pip_package\build_pip_package ..\tensorflow_pkg
Install:
(tensorflow-1.11) D:\TensorFlowB>pip install tensorflow_pkg\tensorflow-1.11.0-cp36-cp36m-win_amd64.whl

Tuesday, November 6, 2018

EAST text detector

git clone https://github.com/argman/EAST EAST
Download east_icdar2015_resnet_v1_50_rbox.zip from
https://drive.google.com/open?id=0B3APw5BZJ67ETHNPaU9xUkVoV0U

Open the VS2015 x64 Native Tools Command Prompt:
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC>
D:\OpenCV_4\OpenCV OCR\EAST\lanms>activate tensorflow
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST\lanms>python --version
Python 3.5.5 :: Anaconda, Inc.
D:\OpenCV_4\OpenCV OCR\EAST\lanms>cl adaptor.cpp .\include\clipper\clipper.cpp /I .\include /I "D:\Anaconda3\include" /LD /Fe:adaptor.pyd /link /LIBPATH:"D:\Anaconda3\libs"

Edit lanms/__init__.py and comment out the following two lines:
#if subprocess.call(['make', '-C', BASE_DIR]) != 0:  # return value
#    raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))

Edit run_demo_server.py
change
parser.add_argument('--checkpoint-path', default=checkpoint_path)
to
parser.add_argument('--checkpoint_path', default=checkpoint_path)
and comment out
        #ret.update(get_host_info())

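Along the way: the tensorflow environment used Python 3.5,
but the program links against python36.lib, so I removed the tensorflow environment and recreated it with Python 3.6.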
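A quick way to see which import library the active environment expects before compiling adaptor.pyd against it. This is a minimal sketch: the pythonXY.lib naming follows the python36.lib file mentioned above, and python_import_lib and check_matches are hypothetical helpers. It avoids f-strings so it also runs under Python 3.5:

```python
import sys


def python_import_lib(version_info=sys.version_info) -> str:
    """Return the Windows import library name, e.g. python36.lib for Python 3.6."""
    return "python{}{}.lib".format(version_info[0], version_info[1])


def check_matches(expected_lib: str) -> bool:
    """True if the running interpreter matches the library the extension links against."""
    return python_import_lib() == expected_lib.lower()
```

Running check_matches("python36.lib") inside the old Python 3.5 environment would return False, which is exactly the mismatch described above.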

To record the installed modules and reinstall them once the new environment is ready (not used in the end):
pip freeze>requirements.txt
pip install -r requirements.txt

List the installed modules:
(tensorflow) D:\>conda list

(base) D:\>conda env remove -n tensorflow
(base) D:\>conda create -n tensorflow pip python=3.6
(base) D:\>activate tensorflow
(tensorflow) D:\>pip install --ignore-installed --upgrade tensorflow-gpu
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install opencv-python
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install opencv-contrib-python
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install --ignore-installed --upgrade tensorflow-gpu
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install scipy
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install matplotlib
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install Flask
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>conda install shapely
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>python run_demo_server.py --checkpoint_path="..\east_icdar2015_resnet_v1_50_rbox"

Monday, July 9, 2018

CUDA installation fails

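When the CUDA installation fails, it is usually because the Visual Studio Integration component failed.
Choosing a custom install and skipping Visual Studio Integration lets the installation succeed.
Select exe(local) as the Installer Type.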

Visual Studio Integration can then be installed by hand as follows:
1. So that CUDA programs can be compiled:
Note the path used during the CUDA installation and copy the CUDAVisualStudioIntegration folder out of it (to D:\CUDAVisualStudioIntegration here).
Copy all files in the directory
D:\CUDAVisualStudioIntegration\extras\visual_studio_integration\MSBuildExtensions
to
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations
2. So that Visual Studio can create new CUDA projects:
Copy the directory
D:\CUDAVisualStudioIntegration\extras\visual_studio_integration\CudaProjectVsWizards
to
C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Extensions
3. Install
D:\CUDAVisualStudioIntegration\NVIDIA_Nsight_Visual_Studio_Edition_Win64_5.4.0.17229.msi
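Steps 1 and 2 are plain file copies and can be scripted; step 3 is an installer and still has to be run by hand. A minimal sketch assuming the folder layout shown above (install_vs_integration is a hypothetical helper; run it from an Administrator prompt, since both destinations are under Program Files):

```python
import shutil
from pathlib import Path


def install_vs_integration(src_root, build_customizations, vs_extensions):
    """Copy the CUDA MSBuild extensions and project wizards into Visual Studio 2015."""
    vsi = Path(src_root) / "extras" / "visual_studio_integration"

    # Step 1: every file in MSBuildExtensions goes into BuildCustomizations.
    dest = Path(build_customizations)
    dest.mkdir(parents=True, exist_ok=True)
    for item in (vsi / "MSBuildExtensions").iterdir():
        if item.is_file():
            shutil.copy2(str(item), str(dest / item.name))

    # Step 2: the whole CudaProjectVsWizards directory goes into IDE\Extensions.
    wizards = Path(vs_extensions) / "CudaProjectVsWizards"
    if not wizards.exists():
        shutil.copytree(str(vsi / "CudaProjectVsWizards"), str(wizards))

# Example (not executed here):
# install_vs_integration(r"D:\CUDAVisualStudioIntegration",
#     r"C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations",
#     r"C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Extensions")
```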

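Tuesday, July 3, 2018

SpeechActivity.java in the tensorflow audio recognition example

It is split into a record thread and a recognize thread, which exchange data through recordingBuffer.
The two run at different speeds, so the recognize thread may process the same audio twice or miss some of it.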

short[] recordingBuffer = new short[RECORDING_LENGTH];
int recordingOffset = 0;  // next write position in the circular buffer

// Record thread: append the newest samples, wrapping around at the end.
private void record() {
  int numberRead = record.read(audioBuffer, 0, audioBuffer.length);
  int maxLength = recordingBuffer.length;
  int newRecordingOffset = recordingOffset + numberRead;
  // Samples that do not fit before the end wrap to the start
  // (same as Math.max(0, newRecordingOffset - maxLength)).
  int secondCopyLength;
  if (newRecordingOffset > maxLength) {
    secondCopyLength = newRecordingOffset - maxLength;
  } else {
    secondCopyLength = 0;
  }
  int firstCopyLength = numberRead - secondCopyLength;
  System.arraycopy(audioBuffer, 0, recordingBuffer, recordingOffset, firstCopyLength);
  System.arraycopy(audioBuffer, firstCopyLength, recordingBuffer, 0, secondCopyLength);
  recordingOffset = newRecordingOffset % maxLength;
}

// Recognize thread: copy the buffer out in chronological order (oldest sample first).
private void recognize() {
  int maxLength = recordingBuffer.length;
  int firstCopyLength = maxLength - recordingOffset;
  int secondCopyLength = recordingOffset;
  System.arraycopy(recordingBuffer, recordingOffset, inputBuffer, 0, firstCopyLength);
  System.arraycopy(recordingBuffer, 0, inputBuffer, firstCopyLength, secondCopyLength);
}
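The index arithmetic in record() and recognize() can be checked outside Android. A minimal single-threaded Python sketch of the same circular buffer (RingBuffer is a hypothetical name; in the activity the two methods run on separate threads, so access to the shared buffer has to be synchronized there):

```python
class RingBuffer:
    """Fixed-size circular buffer: write() mirrors record(), snapshot() mirrors recognize()."""

    def __init__(self, length):
        self.buf = [0] * length
        self.offset = 0  # next write position, which is also the oldest sample

    def write(self, samples):
        # Assumes len(samples) <= len(self.buf), as in the Java code.
        n = len(samples)
        max_len = len(self.buf)
        new_offset = self.offset + n
        second = max(0, new_offset - max_len)  # part that wraps to the start
        first = n - second                     # part that fits before the end
        self.buf[self.offset:self.offset + first] = samples[:first]
        self.buf[0:second] = samples[first:]
        self.offset = new_offset % max_len

    def snapshot(self):
        # Oldest data runs from offset to the end, then wraps to the start.
        return self.buf[self.offset:] + self.buf[:self.offset]
```

For example, with a buffer of length 4, writing [1, 2, 3] and then [4, 5] makes snapshot() return [2, 3, 4, 5]: the oldest sample, 1, has been overwritten.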

Yolo

Directory data/img

File data/obj.data
classes= 2
train  = data/train.txt
valid  = data/train.txt
names = data/obj.names (paths are relative to the executable's directory)
backup = backup/

File data/obj.names
air
bird

File data/train.txt
data/img/air1.jpg
data/img/air2.jpg
data/img/air3.jpg
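If the list has to be built by hand, a minimal Python sketch that writes every image in data/img to data/train.txt with forward-slash relative paths, one per line (write_train_list and the extension set are assumptions; run it from the directory that contains data/):

```python
from pathlib import Path

# Image extensions to include; an assumption, extend as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def write_train_list(img_dir="data/img", out_file="data/train.txt"):
    """Write one image path per line (forward slashes) and return the list."""
    paths = sorted(p.as_posix() for p in Path(img_dir).iterdir()
                   if p.is_file() and p.suffix.lower() in IMAGE_EXTS)
    Path(out_file).write_text("".join(path + "\n" for path in paths))
    return paths
```

With the default arguments this produces lines such as data/img/air1.jpg, matching the sample file.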

File yolo-obj.cfg
(for testing)
batch=1
subdivisions=1
(for training)
batch=64
subdivisions=1 (adjust to the GPU memory; if memory is small, use 64)
In every [yolo] layer, set
classes = (the number of classes, 2 here as in obj.data)
In the [convolutional] layer just before each [yolo] layer, set
filters = (classes + 5) * 3
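The formula as a tiny Python check. The 3 is the number of anchors (mask entries) each [yolo] layer uses in the stock YOLOv3 configs; that is an assumption to re-check if your cfg differs (yolo_filters is a hypothetical helper):

```python
def yolo_filters(classes: int, masks_per_layer: int = 3) -> int:
    """filters= for the [convolutional] layer right before a [yolo] layer.

    Each of the masks_per_layer anchors predicts 4 box coordinates,
    1 objectness score and one score per class.
    """
    return (classes + 5) * masks_per_layer


print(yolo_filters(2))  # 21 for the two classes air and bird
```

For the 80 COCO classes the same formula gives 255.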

Labeling
yolo_mark.exe data/img data/train.txt data/obj.names

Training
darknet.exe detector train data/obj.data yolo-obj.cfg darknet19_448.conv.23
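backup in obj.data sets where the output weights are saved
darknet19_448.conv.23: this is just a weights file; to resume an interrupted training run, replace it with the newly saved weights
-dont_show: do not display the Loss-Window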

Evaluating the training result (IoU, mAP)
darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights
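IoU is the overlap between a predicted box and a labeled box, divided by the area they cover together. A minimal Python sketch of the formula, assuming corner-format boxes (x_min, y_min, x_max, y_max); darknet itself stores boxes as center x, y plus width and height, so convert first when comparing against its numbers (iou is a hypothetical helper):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two 2x2 boxes shifted by one unit in each direction overlap in a 1x1 square, so iou((0, 0, 2, 2), (1, 1, 3, 3)) is 1/7.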

COCO Yolo v3(4GB GPU): yolov3.cfg, yolov3.weights
COCO Yolo v3 tiny(1GB GPU): yolov3-tiny.cfg, yolov3-tiny.weights
COCO Yolo v2(4GB GPU): yolov2.cfg, yolov2.weights
VOC Yolo v2(4GB GPU): yolo-voc.cfg, yolo-voc.weights
COCO Yolo v2 tiny(1GB GPU): yolov2-tiny.cfg, yolov2-tiny.weights
VOC Yolo v2 tiny(1GB GPU): yolov2-tiny-voc.cfg, yolov2-tiny-voc.weights
The GPU memory figures above seem to be training requirements; detection or classification appears to need less.

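darknet.exe options
-i <index>: select the GPU (indices can be listed with nvidia-smi.exe)
-nogpu: do not use the GPU
-thresh <val>: detection threshold, default 0.25
-c <num>: OpenCV video source (camera index), default 0
-ext_output: print the object coordinates
detector test: images
detector demo: video
detector train: training
detector map: evaluate the training result
classifier predict: classification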

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
The two commands above are equivalent.

Obtain yolov3-tiny.conv.15 with the following command:
darknet.exe partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15

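How to improve object detection
Before training:
Set random=1 in the .cfg file
Increase width and height in the .cfg file (they must be multiples of 32)
Recalculate the anchors with the following command, then update anchors in the .cfg file
darknet.exe detector calc_anchors voc.data -num_of_clusters 9 -width 416 -height 416
Label the objects in each image carefully: every object must be labeled, and none mislabeled
Ideally have 2000 or more images per object class, covering different sizes, angles, lighting, backgrounds, etc.
Objects that should not be detected should appear in the images, unlabeled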
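For the width/height rule, a tiny helper that rounds a desired size up to the next multiple of 32, the factor by which YOLOv3 downsamples its input (416 / 32 = 13 grid cells). round_up_to_32 is a hypothetical name, not a darknet option:

```python
def round_up_to_32(size: int) -> int:
    """Smallest multiple of 32 that is >= size (and at least 32)."""
    return max(32, ((size + 31) // 32) * 32)
```

For example, round_up_to_32(500) gives 512, while 416 and 608 are already valid and stay unchanged.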

How images are mapped to label files during training (call chain)
darknet.c
int main(int argc, char **argv)
>run_detector(argc, argv);
detector.c
void run_detector(int argc, char **argv)
>train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show);
void train_detector(char *datacfg, char *cfgfile, char *weightfile, int *gpus, int ngpus, int clear, int dont_show)
> pthread_t load_thread = load_data(args);
data.c
pthread_t load_data(load_args args)
>if(pthread_create(&thread, 0, load_threads, ptr)) error("Thread creation failed");
void *load_threads(void *ptr)
>threads[i] = load_data_in_thread(args);
pthread_t load_data_in_thread(load_args args)
>if(pthread_create(&thread, 0, load_thread, ptr)) error("Thread creation failed");
void *load_thread(void *ptr)
>*a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.c, a.num_boxes, a.classes, a.flip, a.jitter, a.hue, a.saturation, a.exposure, a.small_object);
data load_data_detection(int n, char **paths, int m, int w, int h, int c, int boxes, int classes, int use_flip, float jitter, float hue, float saturation, float exposure, int small_object)
>fill_truth_detection(filename, boxes, d.y.vals[i], classes, flip, dx, dy, 1./sx, 1./sy, small_object, w, h);
void fill_truth_detection(char *path, int num_boxes, float *truth, int classes, int flip, float dx, float dy, float sx, float sy, int small_object, int net_w, int net_h)
>replace_image_to_label(path, labelpath);
utils.c
void replace_image_to_label(char *input_path, char *output_path)
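A minimal Python sketch of the mapping for the layout used here, where each label file sits next to its image with the same name and a .txt extension (image_to_label_path is a hypothetical helper; the C function may also rewrite directory names, which this sketch skips):

```python
import os

# Image extensions that get swapped for .txt; an assumption, extend as needed.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def image_to_label_path(image_path: str) -> str:
    """Map data/img/air1.jpg to data/img/air1.txt; other paths are returned unchanged."""
    root, ext = os.path.splitext(image_path)
    if ext.lower() in IMAGE_EXTS:
        return root + ".txt"
    return image_path
```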

Drawing the detected objects on the image
image.c
void draw_detections_cv_v3(IplImage* show_img, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output)

network.c
Run the network on an image:
float *network_predict(network net, float *input)
Get the detections from the network:
detection *get_network_boxes(network *net, int w, int h, float thresh, float hier, int *map, int relative, int *num, int letter)
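Roughly, each detection carries one probability per class, and a class is reported or drawn when its probability exceeds -thresh (0.25 by default). A minimal Python sketch of that filtering step; Detection and labels_above_thresh are hypothetical stand-ins, not darknet's C structs:

```python
from typing import List, NamedTuple, Sequence, Tuple


class Detection(NamedTuple):
    bbox: Tuple[float, float, float, float]  # (x, y, w, h), relative to the image
    prob: Sequence[float]                    # one probability per class


def labels_above_thresh(dets: Sequence[Detection], names: Sequence[str],
                        thresh: float = 0.25) -> List[Tuple[str, float, tuple]]:
    """Return (class name, probability, bbox) for every class above thresh."""
    found = []
    for det in dets:
        for j, p in enumerate(det.prob):
            if p > thresh:
                found.append((names[j], p, det.bbox))
    return found
```

With names = ["air", "bird"] from obj.names, a detection whose probabilities are [0.9, 0.1] is reported only as air.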