Monday, July 9, 2018

CUDA Installation Failure

CUDA installation failures are usually caused by the Visual Studio Integration component failing.
Choosing a custom installation and skipping Visual Studio Integration lets the installation succeed.
For Installer Type, choose exe (local).

Visual Studio Integration can then be installed manually as follows:
1. To make CUDA programs compilable:
Note the path used during the CUDA installation and copy out the CUDAVisualStudioIntegration folder.
Copy all files under D:\CUDAVisualStudioIntegration\extras\visual_studio_integration\MSBuildExtensions
to
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations
2. To let Visual Studio create new CUDA projects:
Copy the folder
D:\CUDAVisualStudioIntegration\extras\visual_studio_integration\CudaProjectVsWizards
to
C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Extensions
3. Install
D:\CUDAVisualStudioIntegration\NVIDIA_Nsight_Visual_Studio_Edition_Win64_5.4.0.17229.msi

Tuesday, July 3, 2018

TensorFlow audio recognition: SpeechActivity.java

The work is split into a record thread and a recognize thread; the two threads exchange data through recordingBuffer, used as a circular buffer.
Their speeds are not identical, so the recognize thread may recognize the same audio more than once, or miss some of it.

// recordingBuffer is used as a circular buffer; recordingOffset is the next write position.
short[] recordingBuffer = new short[RECORDING_LENGTH];
int recordingOffset = 0;

private void record() {
  // record is the android.media.AudioRecord being read; audioBuffer is the short[]
  // it reads into (both defined in the surrounding recording code).
  int numberRead = record.read(audioBuffer, 0, audioBuffer.length);
  int maxLength = recordingBuffer.length;
  int newRecordingOffset = recordingOffset + numberRead;
  // Portion of the new samples that wraps past the end of recordingBuffer
  // (equivalently: Math.max(0, newRecordingOffset - maxLength)).
  int secondCopyLength;
  if (newRecordingOffset > maxLength) {
    secondCopyLength = newRecordingOffset - maxLength;
  } else {
    secondCopyLength = 0;
  }
  int firstCopyLength = numberRead - secondCopyLength;
  // Copy up to the end of the circular buffer, then wrap around to its start.
  System.arraycopy(audioBuffer, 0, recordingBuffer, recordingOffset, firstCopyLength);
  System.arraycopy(audioBuffer, firstCopyLength, recordingBuffer, 0, secondCopyLength);
  recordingOffset = newRecordingOffset % maxLength;
}

private void recognize() {
  // Unroll the circular buffer into inputBuffer in chronological order:
  // the oldest samples start at recordingOffset, the newest end just before it.
  int maxLength = recordingBuffer.length;
  int firstCopyLength = maxLength - recordingOffset;
  int secondCopyLength = recordingOffset;
  System.arraycopy(recordingBuffer, recordingOffset, inputBuffer, 0, firstCopyLength);
  System.arraycopy(recordingBuffer, 0, inputBuffer, firstCopyLength, secondCopyLength);
}

Yolo

Directory data/img

File data/obj.data
classes = 2
train  = data/train.txt
valid  = data/train.txt
names = data/obj.names  (paths are relative to the executable's directory)
backup = backup/

File data/obj.names
air
bird

File data/train.txt
data/img/air1.jpg
data/img/air2.jpg
data/img/air3.jpg

File yolo-obj.cfg
(for testing)
batch=1
subdivisions=1
(for training)
batch=64
subdivisions=1  (adjust to the available GPU memory; use 64 if memory is small)
In every [yolo] layer, change
classes = (number of classes)
In the [convolutional] layer immediately before each [yolo] layer, change
filters = (classes + 5) * 3
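For example, with classes = 2 as in obj.data above, filters = (2 + 5) * 3 = 21, so each [yolo] layer and the [convolutional] layer right before it end up containing (all other settings left as generated):

[convolutional]
filters=21

[yolo]
classes=2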

Labeling
yolo_mark.exe data/img data/train.txt data/obj.names

Training
darknet.exe detector train data/obj.data yolo-obj.cfg darknet19_448.conv.23
The backup entry in obj.data specifies where the output weights are stored.
darknet19_448.conv.23 is itself just a weights file (pretrained convolutional weights); to resume an interrupted training run, replace it with the most recently produced weights file.
-dont_show: do not display the Loss window

Evaluating the training result (IoU, mAP)
darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights

COCO Yolo v3 (4GB GPU): yolov3.cfg, yolov3.weights
COCO Yolo v3 tiny (1GB GPU): yolov3-tiny.cfg, yolov3-tiny.weights
COCO Yolo v2 (4GB GPU): yolov2.cfg, yolov2.weights
VOC Yolo v2 (4GB GPU): yolo-voc.cfg, yolo-voc.weights
COCO Yolo v2 tiny (1GB GPU): yolov2-tiny.cfg, yolov2-tiny.weights
VOC Yolo v2 tiny (1GB GPU): yolov2-tiny-voc.cfg, yolov2-tiny-voc.weights
The GPU memory figures above appear to be requirements for training; detection or classification seems to need considerably less.

darknet.exe parameters
-i <index>: select the GPU; indices can be listed with nvidia-smi.exe
-nogpu: do not use the GPU
-thresh <val>: detection threshold, default 0.25
-c <num>: OpenCV camera index, default 0
-ext_output: print the coordinates of detected objects
detector test: photos
detector demo: video
detector train: training
detector map: evaluate training results (IoU, mAP)
classifier predict: classification
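
For example, to run the model trained above on a single photo (printing coordinates) or on camera 0 (filenames as in the earlier steps):
darknet.exe detector test data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights -ext_output data/img/air1.jpg
darknet.exe detector demo data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights -c 0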

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
The two commands above do the same thing; detect is shorthand for detector test with cfg/coco.data.

Obtain yolov3-tiny.conv.15 (pretrained weights for training a tiny model) with the command:
darknet.exe partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15

How to improve object detection
Before training:
set random=1 in the .cfg file
increase width and height in the .cfg file (they must be multiples of 32)
run the command below to recompute the anchors, then update anchors in the .cfg file
darknet.exe detector calc_anchors voc.data -num_of_clusters 9 -width 416 -height 416
label the objects in the photos carefully: every object must be labeled, and nothing mislabeled
each object class should ideally have 2000 or more images, covering different sizes, angles, lighting, backgrounds, and so on
objects that should not be detected should also appear in the photos, and must be left unlabeled (negative examples)

Mapping between photos and label files during training
darknet.c
int main(int argc, char **argv)
>run_detector(argc, argv);
detector.c
void run_detector(int argc, char **argv)
>train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show);
void train_detector(char *datacfg, char *cfgfile, char *weightfile, int *gpus, int ngpus, int clear, int dont_show)
> pthread_t load_thread = load_data(args);
data.c
pthread_t load_data(load_args args)
>if(pthread_create(&thread, 0, load_threads, ptr)) error("Thread creation failed");
void *load_threads(void *ptr)
>threads[i] = load_data_in_thread(args);
if(pthread_create(&thread, 0, load_thread, ptr)) error("Thread creation failed");
void *load_thread(void *ptr)
>*a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.c, a.num_boxes, a.classes, a.flip, a.jitter, a.hue, a.saturation, a.exposure, a.small_object);
data load_data_detection(int n, char **paths, int m, int w, int h, int c, int boxes, int classes, int use_flip, float jitter, float hue, float saturation, float exposure, int small_object)
>fill_truth_detection(filename, boxes, d.y.vals[i], classes, flip, dx, dy, 1./sx, 1./sy, small_object, w, h);
void fill_truth_detection(char *path, int num_boxes, float *truth, int classes, int flip, float dx, float dy, float sx, float sy, int small_object, int net_w, int net_h)
>replace_image_to_label(path, labelpath);
utils.c
void replace_image_to_label(char *input_path, char *output_path)
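
replace_image_to_label derives the label path from the image path. A minimal sketch of the idea (not darknet's actual implementation, which also rewrites COCO/VOC-style directories such as /images/ -> /labels/): the label file sits next to the image, with the extension replaced by .txt.

#include <stdio.h>
#include <string.h>

/* Simplified illustration of the mapping performed by replace_image_to_label():
 * data/img/air1.jpg -> data/img/air1.txt */
static void image_to_label_path(const char *image_path, char *label_path, size_t size) {
    snprintf(label_path, size, "%s", image_path);
    char *dot = strrchr(label_path, '.');              /* last extension separator */
    if (dot) *dot = '\0';                              /* drop ".jpg" / ".png" ...  */
    strncat(label_path, ".txt", size - strlen(label_path) - 1);
}

int main(void) {
    char label[256];
    image_to_label_path("data/img/air1.jpg", label, sizeof(label));
    printf("%s\n", label);                             /* prints data/img/air1.txt */
    return 0;
}

Each label .txt file, as produced by yolo_mark, holds one line per object: <class-id> <x_center> <y_center> <width> <height>, with coordinates normalized to the image size.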

Drawing the detected objects on the photo
image.c
void draw_detections_cv_v3(IplImage* show_img, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output)

network.c
Run the image (as a float array) through the network:
float *network_predict(network net, float *input)
Get the detections from the network output:
detection *get_network_boxes(network *net, int w, int h, float thresh, float hier, int *map, int relative, int *num, int letter)
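
A rough sketch of how these calls fit together, loosely following test_detector() in detector.c of the AlexeyAB fork. Headers and exact signatures vary between darknet versions (you may need parser.h/network.h/image.h instead of darknet.h), so treat this as an outline rather than exact API:

#include "darknet.h"

/* Outline of single-image detection, loosely following test_detector(). */
int main(void) {
    network net = parse_network_cfg("yolo-obj.cfg");        /* build the network   */
    load_weights(&net, "backup/yolo-obj_7000.weights");     /* load trained weights */
    set_batch_network(&net, 1);

    image im = load_image_color("data/img/air1.jpg", 0, 0);
    image sized = resize_image(im, net.w, net.h);           /* match network input size */

    network_predict(net, sized.data);                       /* forward pass */

    int nboxes = 0;
    float thresh = 0.25f;
    detection *dets = get_network_boxes(&net, im.w, im.h, thresh, 0.5f, 0, 1, &nboxes, 0);
    do_nms_sort(dets, nboxes, net.layers[net.n - 1].classes, 0.45f); /* non-max suppression */

    /* dets[i].bbox and dets[i].prob[class] now hold the results */
    free_detections(dets, nboxes);
    free_image(sized);
    free_image(im);
    return 0;
}

The drawing step is then handled by routines such as draw_detections_cv_v3 shown above.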