
Friday, November 10, 2023

Fine-tuning Whisper in a Google Colab

Reference: https://research.google.com/colaboratory/local-runtimes.html
This lets Colab use a local CPU and GPU.
The documentation says either Docker or Jupyter can be used,
but only Jupyter worked for me.
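
For the Jupyter route, the steps on the local machine boil down to roughly the following (a sketch based on the referenced page; port 8888 is the default):

$ pip install jupyter_http_over_ws
$ jupyter serverextension enable --py jupyter_http_over_ws
$ jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0

Then copy the printed URL (including its token) into Colab via Connect to a local runtime.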

Create a Hugging Face account and log in.
Open https://huggingface.co/settings/tokens
Click New token
Choose a Role (read or write)
Click copy
Paste the token when running the following command:

$ huggingface-cli login

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
$ huggingface-cli login

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
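
From inside a notebook (for example the Colab session itself), the same login can be done with huggingface_hub directly; a small sketch:

from huggingface_hub import notebook_login
notebook_login()  # opens a widget; paste the same token there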

Thursday, October 26, 2023

Recording and playback on Ubuntu, using arecord, aplay, and ffmpeg

List capture devices
$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALCS1200A Analog [ALCS1200A Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 2: ALCS1200A Alt Analog [ALCS1200A Alt Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
Specify card 0, device 0, request a 16000 Hz sample rate, and record for 10 seconds
$ arecord -Dhw:0,0 -d 10 -f cd -r 16000 -c 2 -t wav test.wav
Recording WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Stereo
Warning: rate is not accurate (requested = 16000Hz, got = 44100Hz)
         please, try the plug plugin 
$ arecord -D mono --device=hw:0,0 -d 10 -f cd -r 16000 -c 2 -t wav test.wav
Recording WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Stereo
Warning: rate is not accurate (requested = 16000Hz, got = 44100Hz)
         please, try the plug plugin 
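
As the warning suggests, using the plug plugin lets ALSA resample instead of forcing the hardware rate; a sketch with plughw on the same card/device:

$ arecord -D plughw:0,0 -d 10 -f S16_LE -r 16000 -c 1 -t wav test.wav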

List playback devices
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALCS1200A Analog [ALCS1200A Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 1: ALCS1200A Digital [ALCS1200A Digital]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 3: HDMI 0 [HDMI 0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
Pipe the capture device straight to playback
$ arecord -Dhw:0,0 -d 10 -f cd -r 16000 | aplay -Dhw:0,0 -r 16000
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Stereo
Warning: rate is not accurate (requested = 16000Hz, got = 44100Hz)
         please, try the plug plugin 
Playing WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo


Install ffmpeg
$ sudo add-apt-repository ppa:savoury1/ffmpeg4  
$ sudo apt-cache policy ffmpeg  
$ sudo apt-get install ffmpeg  
$ ffmpeg -version  
$ sudo add-apt-repository --remove ppa:savoury1/ffmpeg4  

Reference: https://ffmpeg.org/ffmpeg-devices.html
List devices
$ ffmpeg -devices
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
 DE alsa            ALSA audio output
  E caca            caca (color ASCII art) output device
 DE fbdev           Linux framebuffer
 D  iec61883        libiec61883 (new DV1394) A/V input device
 D  jack            JACK Audio Connection Kit
 D  kmsgrab         KMS screen capture
 D  lavfi           Libavfilter virtual input device

$ cat /proc/asound/cards
 0 [PCH            ]: HDA-Intel - HDA Intel PCH
                      HDA Intel PCH at 0xa7230000 irq 148
 1 [NVidia         ]: HDA-Intel - HDA NVidia
                      HDA NVidia at 0xa5080000 irq 17

Record
$ ffmpeg -f alsa -i hw:0 test.wav
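
To ask ffmpeg for a specific rate and channel count on the ALSA input (a sketch; options placed before -i apply to the input):

$ ffmpeg -f alsa -ac 1 -ar 16000 -i hw:0 test16k.wav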

Wednesday, October 11, 2023

Install Ubuntu 20.04

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install ssh

$ sudo vi /etc/fstab
# separate the fields with tabs
ip:/share_folder /mnt/mount_folder nfs defaults,bg 0 0
$ cd /mnt
$ sudo mkdir QNAP_A QNAP_B
$ sudo mount -a

$ mkdir -p ~/.config/autostart
$ cp /usr/share/applications/vino-server.desktop ~/.config/autostart/
$ gsettings set org.gnome.Vino prompt-enabled false
$ gsettings set org.gnome.Vino require-encryption false
$ gsettings set org.gnome.Vino authentication-methods "['vnc']"
$ gsettings set org.gnome.Vino vnc-password $(echo -n 'ChangeToYourPasswd'|base64)
$ sudo vi /etc/gdm3/custom.conf
WaylandEnable=false
AutomaticLoginEnable = true
AutomaticLogin = UserLoginName
$ vi vino.sh
DISP=`ps -u $(id -u) -o pid= | \
    while read pid; do
        cat /proc/$pid/environ 2>/dev/null | tr '\0' '\n' | grep '^DISPLAY=:'
    done | grep -o ':[0-9]*' | sort -u`
echo $DISP
/usr/lib/vino/vino-server --display=$DISP
$ chmod +x vino.sh

Use the latest driver version, based on:
CUDA Toolkit and Corresponding Driver Versions
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
dGPU Setup for Ubuntu
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Quickstart.html
Ubuntu 20.04
GStreamer 1.16.3
NVIDIA driver 525.125.06
CUDA 12.1
TensorRT 8.5.3.1

$ sudo ubuntu-drivers devices
$ sudo apt-get install nvidia-driver-535
$ sudo reboot
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
$ sudo apt-get update
$ sudo apt-get -y install cuda-12-2
$ sudo apt-get -y install cuda-12-1
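
The packages install under /usr/local/cuda-12.x and do not add anything to PATH; a quick check (assuming the default /usr/local/cuda symlink, which update-alternatives manages when several versions are installed):

$ echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc
$ nvcc --version
$ nvidia-smi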

Install cuDNN
Reference: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
Go to 2.2. Downloading cuDNN for Linux (https://developer.nvidia.com/cudnn)
Download the Local Installer for Ubuntu20.04 x86_64 (Deb)
$ sudo apt-get install zlib1g
$ sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.5.29_1.0-1_amd64.deb
$ sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.5.29/cudnn-local-98C06E99-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ apt list -a libcudnn8
$ sudo apt-get install libcudnn8=8.9.5.29-1+cuda12.2
$ sudo apt-get install libcudnn8-dev=8.9.5.29-1+cuda12.2
$ sudo apt-get install libcudnn8-samples=8.9.5.29-1+cuda12.2
$ update-alternatives --display libcudnn
$ cp -r /usr/src/cudnn_samples_v8/ .
$ cd cudnn_samples_v8/mnistCUDNN/
$ sudo apt-get install libfreeimage3 libfreeimage-dev
$ make clean && make
$ ./mnistCUDNN
...
Test passed!

Install TensorRT 8.6.1
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-861/install-guide/index.html
$ sudo apt-get install python3-pip
$ sudo apt-get install python3.8-venv
$ python3 -m venv envs/tensorrt
$ source envs/tensorrt/bin/activate
$ pip3 install --upgrade pip
$ python3 -m pip install --extra-index-url https://pypi.nvidia.com tensorrt_libs
$ python3 -m pip install --extra-index-url https://pypi.nvidia.com tensorrt_bindings
$ python3 -m pip install --upgrade tensorrt
$ python3 -m pip install --upgrade tensorrt_lean
$ python3 -m pip install --upgrade tensorrt_dispatch
Test the TensorRT Python installation
$ python3
>>> import tensorrt
>>> print(tensorrt.__version__)
>>> assert tensorrt.Builder(tensorrt.Logger())
>>> import tensorrt_lean as trt
>>> print(trt.__version__)
>>> assert trt.Builder(trt.Logger())
>>> import tensorrt_dispatch as trt
>>> print(trt.__version__)
>>> assert trt.Builder(trt.Logger())

Go to https://developer.nvidia.com/tensorrt and click GET STARTED
Go to https://developer.nvidia.com/tensorrt-getting-started and click DOWNLOAD NOW
Select TensorRT 8
Select TensorRT 8.6 GA
TensorRT 8.6 GA for Ubuntu 20.04 and CUDA 12.0 and 12.1 DEB local repo Package
$ sudo dpkg -i nv-tensorrt-local-repo-ubuntu2004-8.6.1-cuda-12.0_1.0-1_amd64.deb
$ sudo cp /var/nv-tensorrt-local-repo-ubuntu2004-8.6.1-cuda-12.0/nv-tensorrt-local-9A1EDFBA-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ sudo apt-get install tensorrt
$ sudo apt-get install libnvinfer-lean8
$ sudo apt-get install libnvinfer-vc-plugin8
$ sudo apt-get install python3-libnvinfer-lean
$ sudo apt-get install python3-libnvinfer-dispatch
$ python3 -m pip install numpy
$ sudo apt-get install python3-libnvinfer-dev
$ python3 -m pip install protobuf
$ sudo apt-get install uff-converter-tf
$ python3 -m pip install numpy onnx
$ sudo apt-get install onnx-graphsurgeon
Verify the installation
$ dpkg-query -W tensorrt
tensorrt        8.6.1.6-1+cuda12.0

Install DeepStream
$ sudo apt-get install libssl1.1
$ sudo apt-get install libgstreamer1.0-0
$ sudo apt-get install gstreamer1.0-tools
$ sudo apt-get install gstreamer1.0-plugins-good
$ sudo apt-get install gstreamer1.0-plugins-bad
$ sudo apt-get install gstreamer1.0-plugins-ugly
$ sudo apt-get install gstreamer1.0-libav
$ sudo apt-get install libgstreamer-plugins-base1.0-dev
$ sudo apt-get install libgstrtspserver-1.0-0
$ sudo apt-get install libjansson4
$ sudo apt-get install libyaml-cpp-dev
$ sudo apt-get install libjsoncpp-dev
$ sudo apt-get install protobuf-compiler
$ sudo apt-get install gcc
$ sudo apt-get install make
$ sudo apt-get install git
$ sudo apt-get install python3

$ git clone https://github.com/edenhill/librdkafka.git
$ cd librdkafka
$ git reset --hard 7101c2310341ab3f4675fc565f64f0967e135a6a
$ ./configure
$ make
$ sudo make install
$ sudo mkdir -p /opt/nvidia/deepstream/deepstream-6.3/lib
$ sudo cp /usr/local/lib/librdkafka* /opt/nvidia/deepstream/deepstream-6.3/lib

https://catalog.ngc.nvidia.com/orgs/nvidia/resources/deepstream
Download deepstream-6.3_6.3.0-1_amd64.deb
$ wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/deepstream/versions/6.3/files/deepstream-6.3_6.3.0-1_amd64.deb'
$ sudo apt-get install ./deepstream-6.3_6.3.0-1_amd64.deb
$ cd /opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app
$ deepstream-app -c source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt 

Install Docker
https://docs.docker.com/engine/install/ubuntu/
$ sudo apt-get update
$ sudo apt-get install ca-certificates curl gnupg
$ sudo install -m 0755 -d /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ sudo chmod a+r /etc/apt/keyrings/docker.gpg
$ echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
$ sudo docker run --rm hello-world

Install the NVIDIA Container Toolkit
Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
  && \
    sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker
$ sudo groupadd docker
$ sudo usermod -a -G docker $USER
$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
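
The docker group membership only takes effect for new login sessions; to pick it up immediately without logging out (a sketch):

$ newgrp docker
$ docker run --rm hello-world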

Install the NGC CLI
Reference: https://ngc.nvidia.com/setup/installers/cli
$ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.30.1/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip
$ find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
$ sha256sum ngccli_linux.zip
$ chmod u+x ngc-cli/ngc
$ echo "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
$ ngc config set
# just press Enter at each prompt
$ docker login nvcr.io
Username: $oauthtoken
Password: <Your API Key>

Develop DeepStream 6.3 in Docker
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_docker_containers.html
$ sudo docker pull nvcr.io/nvidia/deepstream:6.3-gc-triton-devel
$ export DISPLAY=:0
$ xhost +
$ docker run -it --rm --net=host --gpus all -e DISPLAY=$DISPLAY --device /dev/snd -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/deepstream:6.3-gc-triton-devel
# cd samples/configs/deepstream-app
# deepstream-app -c source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt 
# exit
$ sudo docker ps -a
$ sudo docker stop container_id
$ sudo docker rm container_id
$ sudo docker image list
$ sudo docker image rm image_id

Thursday, September 28, 2023

Install exiv2 on Ubuntu for developing EXIF-related programs

Reference: https://github.com/Exiv2/exiv2/tree/main
Reference: https://github.com/Exiv2/exiv2/tree/main#PlatformLinux

download cmake from https://cmake.org/download/
$ tar xvfz cmake-3.27.6.tar.gz
$ cd cmake-3.27.6
$ sudo apt-get install libssl-dev
$ ./bootstrap
$ make -j4
$ sudo make install

$ git clone https://github.com/Exiv2/exiv2.git
$ cd exiv2
$ sudo apt-get install --yes build-essential ccache clang cmake git google-mock libbrotli-dev libcurl4-openssl-dev libexpat1-dev libgtest-dev libinih-dev libssh-dev libxml2-utils libz-dev python3 zlib1g-dev
$ cmake -S . -B build -G "Unix Makefiles"
$ cmake --build build
$ ctest --test-dir build --verbose
$ sudo cmake --install build

$ g++ -o exifprint exifprint.cpp -lexiv2
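
exifprint.cpp here is the sample program shipped in exiv2's samples directory. If the linker cannot find the freshly installed library under /usr/local/lib at runtime, refresh the cache first; usage is then simply (the image name is just an example):

$ sudo ldconfig
$ ./exifprint test.jpg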

Monday, August 28, 2023

Use sudo without entering a password

Add a user myuser that is allowed to reboot

$ sudo deluser myuser
$ sudo adduser myuser
$ sudo gpasswd -a myuser sudo
$ echo "myuser ALL = NOPASSWD: /usr/sbin/reboot" | sudo tee /etc/sudoers.d/60_myuser
$ sudo chmod 0440 /etc/sudoers.d/60_myuser

If you unfortunately make a typo, errors like the following appear:
>>> /etc/sudoers: syntax error near line 24 <<<
sudo: parse error in /etc/sudoers near line 24
sudo: no valid sudoers sources found, quitting
sudo: unable to initialize policy plugin

Repair it as follows:
$ pkexec visudo
It will stop at the What now? prompt; don't be afraid to press Enter
Options are:
  (e)dit sudoers file again
  e(x)it without saving changes to sudoers file
  (Q)uit and save changes to sudoers file (DANGER!)

Use ssh without entering a password

hostA$ ssh-keygen
hostA$ ssh-copy-id "user@hostB -p 22"
hostA$ ssh user@hostB "command.sh arg1 arg2"

Tuesday, July 18, 2023

gstreamer fpsdisplaysink videorate

$ export URI=rtsp://root:A1234567@192.168.112.202:554/live1s1.sdp
$ export GST_DEBUG=fpsdisplaysink:5

Use videotestsrc to test fpsdisplaysink
$ gst-launch-1.0 videotestsrc ! 'video/x-raw,width=1280,height=720,framerate=60/1' ! videoconvert ! fpsdisplaysink text-overlay=true

fpsdisplaysink without text-overlay
$ gst-launch-1.0 rtspsrc location=$URI protocols=tcp+udp ! application/x-rtp, media=video ! decodebin  ! nvvideoconvert ! nvegltransform ! fpsdisplaysink text-overlay=0 video-sink=nveglglessink

fpsdisplaysink with text-overlay
$ gst-launch-1.0 rtspsrc location=$URI protocols=tcp+udp ! application/x-rtp, media=video ! decodebin  ! nvvideoconvert ! fpsdisplaysink text-overlay=1 video-sink=autovideosink

Use videorate to set the framerate
$ gst-launch-1.0 rtspsrc location=$URI protocols=tcp+udp ! application/x-rtp, media=video ! decodebin ! nvvideoconvert ! videorate ! video/x-raw,framerate=60/1 ! nvvideoconvert ! fpsdisplaysink text-overlay=1 video-sink=autovideosink

Add rtpjitterbuffer, though it does not seem to help
$ gst-launch-1.0 rtspsrc location=$URI protocols=tcp+udp ! application/x-rtp, media=video ! rtpjitterbuffer latency=0 ! decodebin  ! nvvideoconvert ! fpsdisplaysink text-overlay=1 video-sink=autovideosink

No display, but the fps can be read from the log
$ gst-launch-1.0 rtspsrc location=$URI protocols=tcp+udp ! application/x-rtp, media=video ! decodebin  ! nvvideoconvert ! fpsdisplaysink text-overlay=0 video-sink=fakesink
Output
0:00:02.590692019 1665816 0xffff6001d700 DEBUG         fpsdisplaysink fpsdisplaysink.c:372:display_current_fps:<fpsdisplaysink0> Updated max-fps to 1.102534
0:00:02.590778644 1665816 0xffff6001d700 DEBUG         fpsdisplaysink fpsdisplaysink.c:376:display_current_fps:<fpsdisplaysink0> Updated min-fps to 1.102534


$ gst-launch-1.0 rtspsrc location=$URI protocols=tcp+udp ! application/x-rtp, media=video ! decodebin  ! nvvideoconvert ! videorate ! video/x-raw,framerate=60/1 ! nvvideoconvert ! fpsdisplaysink text-overlay=0 video-sink=fakesink

Friday, July 14, 2023

Default route on Ubuntu with multiple NICs

$ cd /etc/NetworkManager/system-connections/
Edit the file for the relevant NIC and add a route under [ipv4]
$ sudo vi 'Wired connection 1.nmconnection'
[ipv4]
route1=0.0.0.0/0,192.168.0.254,1
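
The same default route can be set with nmcli instead of editing the file by hand (a sketch; the connection name and gateway are the ones above):

$ sudo nmcli connection modify 'Wired connection 1' ipv4.gateway 192.168.0.254 ipv4.route-metric 1
$ sudo nmcli connection up 'Wired connection 1'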

Wednesday, July 12, 2023

Install PyTorch on Ubuntu


$ python3 -m venv pytorch
$ source pytorch/bin/activate
$ pip3 install --upgrade --no-cache-dir pip
$ sudo update-alternatives --config cuda
$ pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
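
To confirm the CUDA build is actually picked up (same style as the TensorRT check above):

$ python3
>>> import torch
>>> print(torch.__version__)
>>> print(torch.cuda.is_available())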

Wednesday, July 5, 2023

DeepStream nvinfer (primary mode) running a classifier on an ROI

After a lot of trying, I could not get an nvdspreprocess ROI to work in front of nvinfer.
I found that using the ROI of nvvideoconvert directly works fine.

Set the parameter src-crop="left:top:width:height", e.g.
g_object_set(G_OBJECT(pre_proc), "src-crop", "50:0:320:240", NULL);


Monday, July 3, 2023

YOLOv8 and TensorRT

Reference: the official YOLOv8 GitHub repository

1. Download DeepStream-Yolo and Ultralytics YOLOv8
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
git clone https://github.com/ultralytics/ultralytics.git /mnt/Data/DeepStream/DeepStream-Yolo/ultralytics

2. Create the deepstream_yolo Docker container
docker_run.sh
xhost +
docker run --name='deepstream_yolo' --gpus all -it --net=host --privileged \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -v /etc/localtime:/etc/localtime \
  -v /mnt/Data/DeepStream/DeepStream-Yolo/DeepStream-Yolo:/home/DeepStream-Yolo \
  -v /mnt/Data/DeepStream/DeepStream-Yolo/ultralytics:/home/ultralytics \
  -v /mnt/Data/DeepStream/DeepStream-Yolo/read_me:/home/read_me \
  -v /mnt/Data/DeepStream/DeepStream-Yolo/datasets:/home/datasets \
  -v /mnt/CT1000SSD/ImageData/Light:/home/Light \
  -e DISPLAY=$DISPLAY \
  -w /home/read_me \
  nvcr.io/nvidia/deepstream:6.2-devel
  
3. Inside Docker, install DeepStream-Yolo
apt-get install build-essential
/opt/nvidia/deepstream/deepstream/user_additional_install.sh
cd /home/DeepStream-Yolo
CUDA_VER=11.8 make -C nvdsinfer_custom_impl_Yolo

4. Inside Docker, install Ultralytics YOLOv8
#python3 -m pip install --upgrade pip
pip3 install --upgrade pip
pip3 install protobuf numpy
cd /home/ultralytics
#pip install -e .
pip3 install -r requirements.txt
python3 setup.py install
pip3 install onnx onnxsim onnxruntime

5. Inside Docker, download, convert, and test the yolov8s.pt and yolov8n.pt models
cd /home/ultralytics
cp /home/DeepStream-Yolo/utils/export_yoloV8.py /home/ultralytics
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt
python3 export_yoloV8.py -w yolov8s.pt --dynamic
python3 export_yoloV8.py -w yolov8n.pt --dynamic
cp yolov8s.onnx labels.txt /home/DeepStream-Yolo
cp yolov8n.onnx labels.txt /home/DeepStream-Yolo

6. Remove the deepstream_yolo container
$ docker container rm deepstream_yolo

7. Re-enter Docker
docker_attach.sh
xhost +
docker start deepstream_yolo
docker attach deepstream_yolo

8. Convert the model to ONNX format
yolov8n.py
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.yaml")  # build a new model from scratch
model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

# Use the model
model.train(data="coco128.yaml", epochs=3)  # train the model
metrics = model.val()  # evaluate model performance on the validation set
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image
path = model.export(format="onnx")  # export the model to ONNX format

Running python3 yolov8n.py produced the following error:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Fix:
$ sudo systemctl stop docker
Get the container id
$ docker inspect deepstream_yolo | grep Id
"Id": "???????"
Edit the container's ShmSize
$ sudo vi /var/lib/docker/containers/your_container_id/hostconfig.json
"ShmSize":8589934592
$ sudo systemctl restart docker
$ ./docker_attach.sh
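
Alternatively, if recreating the container is acceptable, the size can be set up front by adding --shm-size to the docker run command in step 2:

docker run --shm-size=8g ... (other options as in docker_run.sh above)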

9. Test the ONNX model in DeepStream
# cd /home/DeepStream-Yolo

# vi config_infer_primary_yoloV8.txt
onnx-file=yolov8s.onnx
onnx-file=yolov8n.onnx

# vi deepstream_app_config.txt
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
uri=rtsp://root:A1234567@192.168.0.107:554/live1s1.sdp
live-source=0
live-source=1
config-file=config_infer_primary.txt
config-file=config_infer_primary_yoloV8.txt
file-loop=0
file-loop=1

# deepstream-app -c deepstream_app_config.txt

10. Prepare your own image data: convert PASCAL VOC (xml produced by LabelImg) to YOLO txt format
prepare_detect.py
import cv2
import os
import random
import re
import xml.etree.ElementTree as ET

import numpy as np

LIGHT_CLASSES_LIST = [
    'forward_right',
    'others',
    'red',
    'red_left',
    'yellow',
    ]
        
def save_false_positives(img_org, iName, xName, tag, classIdx, 
        clip_x0, clip_y0, clip_x1, clip_y1):
    img_new = img_org[clip_y0:clip_y1, clip_x0:clip_x1]
    fPath, fName = os.path.split(iName)
    fName, fExt = os.path.splitext(fName)
    fName = fName + tag + fExt
    rndPaths = ['train', 'val', 'test']
    rndPath = random.choices(rndPaths, weights=(8,1,1))[0]
    iName = os.path.join('/home/datasets/Light/images', rndPath, fName)
    cv2.imwrite(iName, img_new)
        
def convert_box(size, box):
    dw, dh = 1. / size[0], 1. / size[1]
    x, y, w, h = (box[0] + box[1]) / 2.0 - 1, (box[2] + box[3]) / 2.0 - 1, box[1] - box[0], box[3] - box[2]
    return x * dw, y * dh, w * dw, h * dh
          
def save_file(img_org, iName, xName, tag, classIdx, 
        p0x, p0y, p1x, p1y, p2x, p2y, p3x, p3y, 
        img_w, img_h, xmin, ymin, xmax, ymax,
        clip_x0, clip_y0, clip_x1, clip_y1):
    img_new = img_org[clip_y0:clip_y1, clip_x0:clip_x1]
    fPath, fName = os.path.split(iName)
    fName, fExt = os.path.splitext(fName)
    fName = fName + tag + fExt
    rndPaths = ['train', 'val', 'test']
    rndPath = random.choices(rndPaths, weights=(8,1,1))[0]
    iName = os.path.join('/home/datasets/Light/images', rndPath, fName)
    cv2.imwrite(iName, img_new)
    
    w = clip_x1 - clip_x0
    h = clip_y1 - clip_y0
    xmin = xmin - clip_x0
    ymin = ymin - clip_y0
    xmax = xmax - clip_x0
    ymax = ymax - clip_y0
    bb = convert_box((w, h), (xmin, xmax, ymin, ymax))
    fPath, fName = os.path.split(xName)
    fName, fExt = os.path.splitext(fName)
    fName = fName + tag + '.txt'
    tName = os.path.join('/home/datasets/Light/labels', rndPath, fName)
    with open(tName, 'w') as f:
        f.write(" ".join([str(a) for a in (classIdx, *bb)]) + '\n')
        
def gen_img_yolo(iName, xName):
    tree = ET.parse(open(xName))
    root = tree.getroot()
    img_w = int(root.find('size').find('width').text)
    img_h = int(root.find('size').find('height').text)
    for idx, object in enumerate(root.findall('object')):
        name = object.find('name').text
        classIdx = LIGHT_CLASSES_LIST.index(name)
        #print(classIdx, name)
        bndbox = object.find('bndbox')
        p0x = int(bndbox.find('p0x').text)
        p0y = int(bndbox.find('p0y').text)
        p1x = int(bndbox.find('p1x').text)
        p1y = int(bndbox.find('p1y').text)
        p2x = int(bndbox.find('p2x').text)
        p2y = int(bndbox.find('p2y').text)
        p3x = int(bndbox.find('p3x').text)
        p3y = int(bndbox.find('p3y').text)
        xmin = int(bndbox.find('xmin').text)
        ymin = int(bndbox.find('ymin').text)
        xmax = int(bndbox.find('xmax').text)
        ymax = int(bndbox.find('ymax').text)
        if xmin != p0x or xmin != p3x or ymin != p0y or ymin != p1y or \
                xmax != p1x or xmax != p2x or ymax != p2y or ymax != p3y:
            print('error:bndbox', xName)
            exit()
        if idx > 0:
            print('error:object', xName)
            exit()
    img_org = cv2.imread(iName)
    if img_org.shape[0] != img_h or img_org.shape[1] != img_w:
        print(img_org.shape, (img_h, img_w))
        exit()
    img = np.copy(img_org)

    clip_x0 = random.randrange(0, int(xmin*0.5))
    clip_y0 = random.randrange(0, int(ymin*0.5))
    clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.5), img_w+1)
    clip_y1 = random.randrange(int(ymax + (img_h-ymax)*0.5), img_h+1)
    save_file(img_org, iName, xName, '', classIdx, 
            p0x, p0y, p1x, p1y, p2x, p2y, p3x, p3y, 
            img_w, img_h, xmin, ymin, xmax, ymax,
            clip_x0, clip_y0, clip_x1, clip_y1)
    ratio = (xmax - xmin) / img_w
    if ratio < 0.3:
        clip_x0 = random.randrange(int(xmin*0.3), int(xmin*0.8))
        clip_y0 = random.randrange(int(ymin*0.3), int(ymin*0.8))
        clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.2), int(xmax + (img_w-xmax)*0.7))
        clip_y1 = random.randrange(int(ymax + (img_h-ymax)*0.2), int(ymax + (img_h-ymax)*0.7))
        save_file(img_org, iName, xName, '_a', classIdx, 
                p0x, p0y, p1x, p1y, p2x, p2y, p3x, p3y, 
                img_w, img_h, xmin, ymin, xmax, ymax,
                clip_x0, clip_y0, clip_x1, clip_y1)
        clip_x0 = random.randrange(int(xmin*0.5), int(xmin*0.9))
        clip_y0 = random.randrange(int(ymin*0.5), int(ymin*0.9))
        clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.1), int(xmax + (img_w-xmax)*0.5))
        clip_y1 = random.randrange(int(ymax + (img_h-ymax)*0.1), int(ymax + (img_h-ymax)*0.5))
        save_file(img_org, iName, xName, '_b', classIdx, 
                p0x, p0y, p1x, p1y, p2x, p2y, p3x, p3y, 
                img_w, img_h, xmin, ymin, xmax, ymax,
                clip_x0, clip_y0, clip_x1, clip_y1)
        if xmin > (img_w - xmax):
            if ymin > (img_h - ymax):
                clip_x0 = random.randrange(0, int(xmin*0.8))
                clip_y0 = random.randrange(0, int(ymin*0.8))
                clip_x1 = random.randrange(int(xmin), int(xmin+(xmax-xmin)*0.8))
                clip_y1 = random.randrange(int(ymin), int(ymin+(ymax-ymin)*0.8))
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f0', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
            else:
                clip_x0 = random.randrange(0, int(xmin*0.8))
                clip_y0 = random.randrange(int(ymin+(ymax-ymin)*0.2), int(ymax))
                clip_x1 = random.randrange(int(xmin), int(xmin + (xmax-xmin)*0.8))
                clip_y1 = random.randrange(int(ymax+(img_h-ymax)*0.2), img_h)
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f1', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
        else:
            if ymin > (img_h - ymax):
                clip_x0 = random.randrange(int(xmin+(xmax-xmin)*0.2), int(xmax))
                clip_y0 = random.randrange(0, int(ymin*0.8))
                clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.2), img_w)
                clip_y1 = random.randrange(int(ymin), int(ymin+(ymax-ymin)*0.8))
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f2', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
            else:
                clip_x0 = random.randrange(int(xmin+(xmax-xmin)*0.2), int(xmax))
                clip_y0 = random.randrange(int(ymin+(ymax-ymin)*0.2), int(ymax))
                clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.2), img_w)
                clip_y1 = random.randrange(int(ymax+(img_h-ymax)*0.2), img_h)
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f3', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
    elif ratio < 0.7:
        clip_x0 = random.randrange(int(xmin*0.1), int(xmin*0.7))
        clip_y0 = random.randrange(int(ymin*0.1), int(ymin*0.7))
        clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.3), int(xmax + (img_w-xmax)*0.9))
        clip_y1 = random.randrange(int(ymax + (img_h-ymax)*0.3), int(ymax + (img_h-ymax)*0.9))
        save_file(img_org, iName, xName, '_c', classIdx, 
                p0x, p0y, p1x, p1y, p2x, p2y, p3x, p3y, 
                img_w, img_h, xmin, ymin, xmax, ymax,
                clip_x0, clip_y0, clip_x1, clip_y1)
        if xmin > (img_w - xmax):
            if ymin > (img_h - ymax):
                clip_x0 = random.randrange(0, int(xmin*0.8))
                clip_y0 = random.randrange(0, int(ymin*0.8))
                clip_x1 = random.randrange(int(xmin), int(xmin+(xmax-xmin)*0.8))
                clip_y1 = random.randrange(int(ymin), int(ymin+(ymax-ymin)*0.8))
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f4', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
            else:
                clip_x0 = random.randrange(0, int(xmin*0.8))
                clip_y0 = random.randrange(int(ymin+(ymax-ymin)*0.2), int(ymax))
                clip_x1 = random.randrange(int(xmin), int(xmin + (xmax-xmin)*0.8))
                clip_y1 = random.randrange(int(ymax+(img_h-ymax)*0.2), img_h)
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f5', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
        else:
            if ymin > (img_h - ymax):
                clip_x0 = random.randrange(int(xmin+(xmax-xmin)*0.2), int(xmax))
                clip_y0 = random.randrange(0, int(ymin*0.8))
                clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.2), img_w)
                clip_y1 = random.randrange(int(ymin), int(ymin+(ymax-ymin)*0.8))
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f6', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
            else:
                clip_x0 = random.randrange(int(xmin+(xmax-xmin)*0.2), int(xmax))
                clip_y0 = random.randrange(int(ymin+(ymax-ymin)*0.2), int(ymax))
                clip_x1 = random.randrange(int(xmax + (img_w-xmax)*0.2), img_w)
                clip_y1 = random.randrange(int(ymax+(img_h-ymax)*0.2), img_h)
                root.remove(object)
                save_false_positives(img_org, iName, xName, '_f7', classIdx, 
                        clip_x0, clip_y0, clip_x1, clip_y1)
    elif ratio < 1.0:
        pass
    return

def recursive_folder(path):
    files = os.listdir(path)
    files.sort()
    for file in files:
        fullName = os.path.join(path, file)
        if os.path.isfile(fullName):
            fPath, fName = os.path.split(fullName)
            fName, fExt = os.path.splitext(fName)
            if fExt in ('.jpg',):
                xPath = fPath + '.xml'
                xName = fName + '.xml'
                xFName = os.path.join(xPath, xName)
                if os.path.isfile(xFName):
                    gen_img_yolo(fullName, xFName)
                else:
                    print(xFName)
        else:
            recursive_folder(fullName)

def main():
    recursive_folder('/home/Light')

if __name__ == '__main__':
    main()

11. Train your own model
from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt')  # load a pretrained model (recommended for training)

# Train the model
model.train(data='VOC.yaml', epochs=100, imgsz=640)
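
The data argument points at a dataset YAML. For the Light dataset prepared in step 10, it would look roughly like this (a sketch; the file name Light.yaml is hypothetical, and the paths and class order follow prepare_detect.py):

# Light.yaml (hypothetical)
path: /home/datasets/Light
train: images/train
val: images/val
test: images/test
names:
  0: forward_right
  1: others
  2: red
  3: red_left
  4: yellow

Then train with model.train(data='Light.yaml', epochs=100, imgsz=640).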

12. Query the input and output layers of the ONNX model
import onnx

model = onnx.load('yolov8n.onnx')
g_in = model.graph.input
g_out = model.graph.output
print(g_in)   # input tensors: names, element types, shapes
print(g_out)  # output tensors


Wednesday, June 21, 2023

How to use a TAO-generated YOLOv4 model in Python

I produced a yolov4-tiny model by following the NVIDIA TAO Computer Vision Sample Workflows.
This model is different from a model produced the usual way;
a model produced the usual way can be used from Python simply by following tensorrt_demos.

The model (.etlt) exported by tao yolo_v4_tiny export can only be used with DeepStream,
while the trt.engine produced by tao converter can be used neither with DeepStream nor from Python.

There is documentation on how to use TAO-generated models on a Triton server.
In yolov3_postprocessor.py I found that the YOLO produced by TAO
already applies NMS to its outputs and places the results in:
BatchNMS(-1,1): number of detections
BatchNMS_1(-1,200,4): coordinates
BatchNMS_2(-1,200): confidences
BatchNMS_3(-1,200): classes
The input handling also changes:
the image read by cv2 needs no cvtColor and no division by 255.0;
only a BHWC-to-BCHW transpose is needed:
img = img.transpose((2, 0, 1)).astype(np.float32)

TAO runs inside Docker, so it is hard to debug.
I found that the following command drops you straight into the Docker container, where you can run python, check versions, inspect the environment, and so on:
docker run -it --rm --gpus all \
  -v "/mnt/Data/tao/yolo_v4_tiny_1.4.1":"/workspace/tao-experiments" \
  -v "/mnt/Data/TensorRT/tensorrt_demos":"/workspace/tensorrt_demos" \
  -v "/mnt/CT1000SSD/ImageData/Light":"/workspace/Light" \
  nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5 \
  bash

To convert the model to TensorRT, besides using
!tao converter -k $KEY \
                   -p Input,1x3x416x416,8x3x416x416,16x3x416x416 \
                   -e $USER_EXPERIMENT_DIR/export/trt.engine \
                   -t fp32 \
                   $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet_tiny_epoch_$EPOCH.etlt
you can also use
!tao-deploy yolo_v4_tiny gen_trt_engine \
  -m $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet_tiny_epoch_$EPOCH.etlt \
  -e $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt \
  -k $KEY \
  --data_type fp32 \
  --batch_size 1 \
  --engine_file $USER_EXPERIMENT_DIR/export/yolov4_tao_deplay.trt
But to convert on a different platform,
refer to TAO Converter, download and install it, and run the conversion:
./tao-converter_v4.0.0_trt8.5.1.7 \
  -k nvidia_tlt \
  -p Input,1x3x416x416,2x3x416x416,4x3x416x416 \
  -e yolo_v4_tiny_1.4.1/yolo_v4_tiny/export/yolov4_tao_converter_fp32.engine \
  -t fp32 \
  yolo_v4_tiny_1.4.1/yolo_v4_tiny/export/yolov4_cspdarknet_tiny_epoch_080.etlt

Following tensorrt_demos, modify utils/yolo_with_plugins.py and rename it triton_yolo_with_plugins.py, as follows:
"""yolo_with_plugins.py
Implementation of TrtYOLO class with the yolo_layer plugins.
"""
from __future__ import print_function
import ctypes
import numpy as np
import cv2
import tensorrt as trt
import pycuda.driver as cuda

try:
    ctypes.cdll.LoadLibrary('./plugins/libyolo_layer.so')
except OSError as e:
    raise SystemExit('ERROR: failed to load ./plugins/libyolo_layer.so.  '
                     'Did you forget to do a "make" in the "./plugins/" '
                     'subdirectory?') from e

def _preprocess_yolo(img, input_shape, letter_box=False):
    """Preprocess an image before TRT YOLO inferencing.
    # Args
        img: int8 numpy array of shape (img_h, img_w, 3)
        input_shape: a tuple of (H, W)
        letter_box: boolean, specifies whether to keep aspect ratio and
                    create a "letterboxed" image for inference
    # Returns
        preprocessed img: float32 numpy array of shape (3, H, W)
    """
    if letter_box:
        img_h, img_w, _ = img.shape
        new_h, new_w = input_shape[0], input_shape[1]
        offset_h, offset_w = 0, 0
        if (new_w / img_w) <= (new_h / img_h):
            new_h = int(img_h * new_w / img_w)
            offset_h = (input_shape[0] - new_h) // 2
        else:
            new_w = int(img_w * new_h / img_h)
            offset_w = (input_shape[1] - new_w) // 2
        resized = cv2.resize(img, (new_w, new_h))
        img = np.full((input_shape[0], input_shape[1], 3), 127, dtype=np.uint8)
        img[offset_h:(offset_h + new_h), offset_w:(offset_w + new_w), :] = resized
    else:
        img = cv2.resize(img, (input_shape[1], input_shape[0]))

    #img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.transpose((2, 0, 1)).astype(np.float32)
    #img /= 255.0
    return img

class HostDeviceMem(object):
    """Simple helper data class that's a little nicer to use than a 2-tuple."""
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def get_input_shape(engine):
    """Get input shape of the TensorRT YOLO engine."""
    binding = engine[0]
    assert engine.binding_is_input(binding)
    binding_dims = engine.get_binding_shape(binding)
    if len(binding_dims) == 4:
        return tuple(binding_dims[2:])
    elif len(binding_dims) == 3:
        return tuple(binding_dims[1:])
    else:
        raise ValueError('bad dims of binding %s: %s' % (binding, str(binding_dims)))

def allocate_buffers(engine, context):
    """Allocates all host/device in/out buffers required for an engine."""
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        binding_dims = engine.get_binding_shape(binding)
        binding_dtype = engine.get_tensor_dtype(binding)
        binding_format = engine.get_tensor_format_desc(binding)
        binding_loc = engine.get_tensor_location(binding)
        binding_mode = engine.get_tensor_mode(binding)
        binding_shape = engine.get_tensor_shape(binding)
        binding_shape_inference = engine.is_shape_inference_io(binding)
        print('binding_dims:{} {} {}'.format(binding, binding_dims, binding_dtype))
        print('  {}'.format(binding_format))
        print('  {} {} {} {}'.format(binding_loc, binding_mode, binding_shape, binding_shape_inference))
        size = trt.volume(binding_dims)
        if size < 0: size *= -1  # a dynamic batch dim (-1) makes the volume negative
        print('  size:{}'.format(size))
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            #binding_pro_shape = engine.get_profile_shape(0, binding)
            #print('  {}'.format(binding_pro_shape))
            if binding_dims[0] == -1:
                alloc_dims = np.copy(binding_dims)
                alloc_dims[0] = 1
                context.set_binding_shape(0, alloc_dims)
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    """do_inference (for TensorRT 6.x or lower)
    This function is generalized for multiple inputs/outputs.
    Inputs and outputs are expected to be lists of HostDeviceMem objects.
    """
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size,
                          bindings=bindings,
                          stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]

def do_inference_v2(context, bindings, inputs, outputs, stream):
    """do_inference_v2 (for TensorRT 7.0+)
    This function is generalized for multiple inputs/outputs for full
    dimension networks.
    Inputs and outputs are expected to be lists of HostDeviceMem objects.
    """
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]

class TrtYOLO(object):
    """TrtYOLO class encapsulates things needed to run TRT YOLO."""
    def _load_engine(self):
        TRTbin = 'yolo/%s.trt' % self.model
        TRTbin = self.model
        with open(TRTbin, 'rb') as f, trt.Runtime(self.trt_logger) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def __init__(self, model, category_num=80, letter_box=False, cuda_ctx=None):
        """Initialize TensorRT plugins, engine and conetxt."""
        self.model = model
        self.category_num = category_num
        self.letter_box = letter_box
        self.cuda_ctx = cuda_ctx
        if self.cuda_ctx:
            self.cuda_ctx.push()

        self.inference_fn = do_inference if trt.__version__[0] < '7' \
                                         else do_inference_v2
        self.trt_logger = trt.Logger(trt.Logger.INFO)
        # add for errors
        # IPluginCreator not found in Plugin Registry
        # getPluginCreator could not find plugin: BatchedNMSDynamic_TRT version: 1
        # Serialization assertion plan->header.magicTag == rt::kPLAN_MAGIC_TAG failed
        trt.init_libnvinfer_plugins(self.trt_logger, namespace="")
        self.engine = self._load_engine()

        self.input_shape = get_input_shape(self.engine)

        try:
            self.context = self.engine.create_execution_context()
            self.inputs, self.outputs, self.bindings, self.stream = \
                allocate_buffers(self.engine, self.context)
        except Exception as e:
            raise RuntimeError('fail to allocate CUDA resources') from e
        finally:
            if self.cuda_ctx:
                self.cuda_ctx.pop()

    def __del__(self):
        """Free CUDA memories."""
        del self.outputs
        del self.inputs
        del self.stream

    def detect(self, img, letter_box=None):
        """Detect objects in the input image."""
        letter_box = self.letter_box if letter_box is None else letter_box
        img_h, img_w, _ = img.shape
        img_resized = _preprocess_yolo(img, self.input_shape, letter_box)
        #print(img_resized.shape, img_resized.dtype)

        # Set host input to the image. The do_inference() function
        # will copy the input to the GPU before executing.
        self.inputs[0].host = np.ascontiguousarray(img_resized)
        if self.cuda_ctx:
            self.cuda_ctx.push()
        trt_outputs = self.inference_fn(
            context=self.context,
            bindings=self.bindings,
            inputs=self.inputs,
            outputs=self.outputs,
            stream=self.stream)
        if self.cuda_ctx:
            self.cuda_ctx.pop()

        y_pred = [i.reshape(1, -1,)[:1] for i in trt_outputs]
        keep_k, boxes, scores, cls_id = y_pred
        #print(keep_k.shape)
        #print(boxes.shape)
        keep_k[0,0] = 1
        locs = np.empty((0,4), dtype=np.uint)
        cids = np.empty((0,1), dtype=np.uint)
        confs = np.empty((0,1), dtype=np.float32)
        for idx, k in enumerate(keep_k.reshape(-1)):
            mul = np.array([img_w,img_h,img_w,img_h])
            loc = boxes[idx].reshape(-1, 4)[:k] * mul
            loc = loc.astype(np.uint)
            cid = cls_id[idx].reshape(-1, 1)[:k]
            cid = cid.astype(np.uint)
            conf = scores[idx].reshape(-1, 1)[:k]
            locs = np.concatenate((locs, loc), axis=0)
            cids = np.concatenate((cids, cid), axis=0)
            confs = np.concatenate((confs, conf), axis=0)
        #print(locs.shape, cids.shape, confs.shape)
        #print(locs, cids, confs)
        return locs, confs, cids

The following program uses the module above:
import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit # This is needed for initializing CUDA driver
import pycuda.driver as cuda
from utils.triton_yolo_with_plugins import TrtYOLO

#MODEL_PATH = '/workspace/tao-experiments/yolo_v4_tiny/export/yolov4_tao_convert.engine'
MODEL_PATH = '/workspace/tao-experiments/yolo_v4_tiny/export/yolov4_tao_deplay.trt'
#MODEL_PATH = '/workspace/tao-experiments/yolo_v4_tiny/export/trt.engine'
        
def main():
    trt_yolo = TrtYOLO(MODEL_PATH, 5, True)
    img_org = cv2.imread('bb.jpg')
    img = np.copy(img_org)
    print(img.shape, img.dtype)
    boxes, confs, clss = trt_yolo.detect(img, False)
    print(boxes.shape, confs.shape, clss.shape)
    print(boxes, confs, clss)
    for box, conf, clss in zip(boxes, confs, clss):
        x_min, y_min, x_max, y_max = box[0], box[1], box[2], box[3]
        cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (255, 255, 255), 2)
        print(box, conf, clss)
    cv2.imwrite('aa.jpg', img)
    print('aaa')

if __name__ == '__main__':
    main()

Debugging notes
Message: Serialization assertion plan->header.magicTag == rt::kPLAN_MAGIC_TAG failed
Fix: the TensorRT versions do not match; install the matching version, or use Docker
Message: IPluginCreator not found in Plugin Registry
Message: getPluginCreator could not find plugin: BatchedNMSDynamic_TRT version: 1
Fix: TensorRT OSS must be installed,
and before _load_engine() add
trt.init_libnvinfer_plugins(self.trt_logger, namespace="")

Tuesday, June 6, 2023

Set up a VNC server on a Jetson

Reference: https://developer.nvidia.com/embedded/learn/tutorials/vnc-setup
or L4T-README/README-vnc.txt

$ cd /usr/lib/systemd/user/graphical-session.target.wants
$ sudo ln -s ../vino-server.service ./.

$ gsettings set org.gnome.Vino prompt-enabled false
$ gsettings set org.gnome.Vino require-encryption false

$ gsettings set org.gnome.Vino authentication-methods "['vnc']"
$ gsettings set org.gnome.Vino vnc-password $(echo -n 'YourPassword'|base64)

$ vi vino.sh
DISP=`ps -u $(id -u) -o pid= | \
    while read pid; do
        cat /proc/$pid/environ 2>/dev/null | tr '\0' '\n' | grep '^DISPLAY=:'
    done | grep -o ':[0-9]*' | sort -u`
echo $DISP
/usr/lib/vino/vino-server --display=$DISP

Reboot