Thursday, September 26, 2024

MMOCR Study Notes

Reference: https://mmocr.readthedocs.io/en/dev-1.x/
Reference: https://github.com/open-mmlab/mmocr

$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:21.05-py3
$ docker run --gpus all -it --name MMOCR nvcr.io/nvidia/pytorch:21.05-py3
$ docker start MMOCR
$ docker attach MMOCR
# <ctrl+p><ctrl+q>
$ docker attach MMOCR
$ docker stop MMOCR
$ docker rm MMOCR
$ docker run --gpus all -it --name MMOCR --shm-size=8G \
  -v /mnt/Data/MMOCR/mmocr:/workspace/mmocr \
  -v /mnt/QNAP_A/ImageData/ICDAR:/mmocr/data \
  nvcr.io/nvidia/pytorch:21.05-py3

# pip install -U openmim
#### The following error appears
####ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
####We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
#### It can be safely ignored

Check the version information at the bottom of https://mmocr.readthedocs.io/en/dev-1.x/get_started/install.html
and install the matching package versions
# mim list
# mim install mmengine==
# mim install mmengine
# mim install mmcv==2.0.1
#### The following error appears
####    cv.gapi.wip.GStreamerPipeline = cv.gapi_wip_gst_GStreamerPipeline
####AttributeError: partially initialized module 'cv2' has no attribute 'gapi_wip_gst_GStreamerPipeline' (most likely due to a circular import)
#### Fix
#### The installed opencv-python is version 4.10.0.84, so downgrade it
# pip install opencv-python==4.5.1.48
# mim install mmcv==2.0.1
# mim install mmdet==3.1.0
# cd /workspace/mmocr/
# pip install -v -e .
# pip install opencv-python-headless==4.5.1.48
# pip install -r requirements/albu.txt
# pip install -r requirements.txt
# python tools/infer.py demo/images --det DBNet --rec CRNN --print-result \
  --save_pred --save_vis --out-dir='results/' --batch-size=2

Wednesday, January 23, 2019

Tesseract Training on Ubuntu

https://github.com/tesseract-ocr/tesseract/wiki/Compiling
https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation

sudo apt-get install g++
sudo apt-get install autoconf automake libtool
sudo apt-get install pkg-config
sudo apt-get install libpng-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev

sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev

sudo apt-get install libleptonica-dev

sudo apt install git

git clone https://github.com/tesseract-ocr/tesseract.git tesseract-ocr
git clone https://github.com/tesseract-ocr/langdata.git langdata
git clone https://github.com/tesseract-ocr/tessdata_best.git tessdata_best
git clone https://github.com/tesseract-ocr/tessdata_fast.git tessdata_fast
cd tesseract-ocr
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
make training
sudo make training-install

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files

sudo apt install curl
sudo apt install default-jre
sudo apt install openjdk-11-jre-headless
sudo apt install openjdk-8-jre-headless
sudo apt install default-jdk
sudo apt install openjdk-11-jdk-headless
sudo apt install openjdk-8-jdk-headless

cd java
make ScrollView.jar

https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging

text2image --find_fonts \
--fonts_dir /usr/share/fonts \
--text ./langdata/plate/plate.
--min_coverage .9  \
--outputbase ./langdata/plate/plate \
|& grep raw \
 | sed -e 's/ :.*/@ \\/g' \
 | sed -e "s/^/  '/" \
 | sed -e "s/@/'/g" >./langdata/plate/fontslist.txt
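The grep/sed chain above keeps the lines that mention "raw" and rewrites each one into a quoted font name followed by a line-continuation backslash. Here is a minimal Python sketch of the same text transformation. The sample input line is made up for illustration and is only assumed to resemble real text2image --find_fonts output; to_fontslist is a hypothetical helper name.

```python
import re

def to_fontslist(lines):
    # Python equivalent of: grep raw | sed (three substitutions)
    out = []
    for line in lines:
        if "raw" not in line:            # grep raw
            continue
        # 1st sed: replace everything from the first " :" onward with "@ \"
        line = re.sub(r" :.*", lambda m: "@ \\", line)
        line = "  '" + line              # 2nd sed: prefix two spaces and a quote
        line = line.replace("@", "'")    # 3rd sed: turn the @ marker into a quote
        out.append(line)
    return out

if __name__ == "__main__":
    # Hypothetical sample lines, not captured from a real run
    sample = [
        "Arial Bold : 120 hits = 100.00%, raw = 84 = 100.00%",
        "Total chars = 120",
    ]
    for entry in to_fontslist(sample):
        print(entry)
```

Each resulting line can be pasted into a shell array of font names, which is why every entry ends with a backslash.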

text2image --font="aakar Medium" \
--fonts_dir /usr/share/fonts \
--text ./langdata/plate/plate.txt \
--min_coverage .9  \
--outputbase ./langdata/plate/plate.aakar_Medium.exp0

text2image --font="Abyssinica SIL" \
--fonts_dir /usr/share/fonts \
--text ./langdata/plate/plate.txt \
--min_coverage .9  \
--outputbase ./langdata/plate/plate.Abyssinica_SIL.exp0

mkdir training
cd training
mkdir -p langdata/eng
Prepare langdata/eng/eng.training_text
Copy langdata/radical-stroke.txt
mkdir -p tessdata/configs
Copy tessdata/eng.traineddata
Copy tessdata/configs/lstm.train

~/ocr/tesseract-ocr/src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
 --lang eng --linedata_only \
 --noextract_font_properties --langdata_dir langdata \
 --tessdata_dir tessdata \
 --output_dir train

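If a font does not exist (Could not find font named 'xxx xxx'),
edit src/training/language-specific.sh and remove the missing fonts

TIFF generation stalled partway through; the cause is the background jobs used to speed it up
The script fails when let rem=counter%par_factor evaluates to zero (let returns a non-zero exit status in that case)
Edit src/training/tesstrain_utils.sh so that it does not use background jobs
In the following two functions, remove the trailing & from the command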
phase_I_generate_image()
  generate_font_image "${font}"
phase_E_extract_features()
  run_command tesseract ${img_file} ${img_file%.*} ${box_config} ${config} &
Also remove the related let, if, and wait statements

combine_tessdata -e ../tessdata_best/eng.traineddata train/eng.lstm

lstmtraining --model_output model \
 --continue_from train/eng.lstm \
 --traineddata train/eng/eng.traineddata \
 --old_traineddata ../tessdata_best/eng.traineddata \
 --train_listfile train/eng.training_files.txt \
 --max_iterations 3600

lstmtraining --stop_training \
 --continue_from model_checkpoint \
 --traineddata train/eng/eng.traineddata \
 --old_traineddata ../tessdata_best/eng.traineddata \
 --model_output new.traineddata

To use existing tif and box files directly:
Edit src/training/tesstrain_utils.sh to pin the directory that holds the tif and box files
    #TRAINING_DIR=${TMP_DIR}
    TRAINING_DIR=/tmp/images
Edit src/training/tesstrain.sh so that images are not generated automatically
#phase_I_generate_image 8

Thursday, January 10, 2019

Tesseract Training with jTessBoxEditor

This is the legacy training method; Tesseract 4.0 introduced a new LSTM (neural network) based approach

D:\>java -jar jTessBoxEditor.jar
TIFF/Box Generator
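Leave text2image unchecked
Input: select the .txt text file
Output: click ... to set the output location
Enter eng for English, chi_tra for Traditional Chinese
Select a font
Click Generate to produce the .box and .font_properties files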

TIFF file naming format: [lang].[fontname].exp[num].tif
lang: plt
fontname: normal
num: 0
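To make the naming convention concrete, here is a small Python sketch that builds and parses such file names. The helper names are my own, and the pattern assumes that neither lang nor fontname contains a dot.

```python
import re

# [lang].[fontname].exp[num].tif
NAME_RE = re.compile(r"^(?P<lang>[^.]+)\.(?P<fontname>[^.]+)\.exp(?P<num>\d+)\.tif$")

def training_image_name(lang, fontname, num):
    # Build a training image file name from its three parts
    return f"{lang}.{fontname}.exp{num}.tif"

def parse_training_image_name(name):
    # Split a training image file name back into (lang, fontname, num)
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a [lang].[fontname].exp[num].tif name: {name!r}")
    return match.group("lang"), match.group("fontname"), int(match.group("num"))

if __name__ == "__main__":
    name = training_image_name("plt", "normal", 0)
    print(name)
    print(parse_training_image_name(name))
```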

Manually generate .box from .tif
D:\>tesseract.exe plt.normal.exp0.tif plt.normal.exp0 --psm 7 -l eng batch.nochop makebox

Manually create .font_properties; each line is <fontname> <italic> <bold> <fixed> <serif> <fraktur>
echo normal 0 1 1 0 1 >plt.font_properties
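The same line can be produced from named flags, which makes it easier to see which of the 0/1 columns is which. This is only a sketch; font_properties_line is a hypothetical helper, not part of Tesseract.

```python
def font_properties_line(fontname, italic=False, bold=False, fixed=False,
                         serif=False, fraktur=False):
    # Column order: <fontname> <italic> <bold> <fixed> <serif> <fraktur>
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([fontname] + [str(int(flag)) for flag in flags])

if __name__ == "__main__":
    # Reproduces the echo command above
    line = font_properties_line("normal", bold=True, fixed=True, fraktur=True)
    with open("plt.font_properties", "w", encoding="ascii") as fh:
        fh.write(line + "\n")
    print(line)
```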

.box file -> .tr
D:\>tesseract.exe plt.normal.exp0.tif plt.normal.exp0 box.train.stderr
.box file -> unicharset
D:\>unicharset_extractor.exe plt.normal.exp0.box
font_properties, unicharset, .tr -> shapetable
D:\>shapeclustering.exe -F plt.font_properties -U unicharset plt.normal.exp0.tr
font_properties, unicharset, .tr -> lang.unicharset, inttemp, pffmtable
D:\>mftraining.exe -F plt.font_properties -U unicharset -O plt.unicharset plt.normal.exp0.tr
.tr file -> normproto
D:\>cntraining.exe plt.normal.exp0.tr
Rename inttemp, normproto, pffmtable, and shapetable so that each starts with lang.
D:\>move inttemp plt.inttemp
D:\>move normproto plt.normproto
D:\>move pffmtable plt.pffmtable
D:\>move shapetable plt.shapetable
Combine everything into lang.traineddata, here plt.traineddata
D:\>combine_tessdata.exe plt.
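Since every command above differs only in the lang, fontname, and num parts, the whole sequence can be generated. The sketch below only builds the command strings in the same order as the steps above and does not run anything; legacy_training_commands is a hypothetical helper name.

```python
def legacy_training_commands(lang, fontname, num=0):
    # Return the legacy (pre-LSTM) training commands, in order, as strings
    base = f"{lang}.{fontname}.exp{num}"
    props = f"{lang}.font_properties"
    commands = [
        f"tesseract.exe {base}.tif {base} box.train.stderr",   # .box -> .tr
        f"unicharset_extractor.exe {base}.box",                # .box -> unicharset
        f"shapeclustering.exe -F {props} -U unicharset {base}.tr",
        f"mftraining.exe -F {props} -U unicharset -O {lang}.unicharset {base}.tr",
        f"cntraining.exe {base}.tr",                           # .tr -> normproto
    ]
    # Prefix the intermediate files with "lang." before combining
    for name in ("inttemp", "normproto", "pffmtable", "shapetable"):
        commands.append(f"move {name} {lang}.{name}")
    commands.append(f"combine_tessdata.exe {lang}.")
    return commands

if __name__ == "__main__":
    for command in legacy_training_commands("plt", "normal", 0):
        print(command)
```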

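Friday, December 7, 2018

EAST Tesseract Performance Test

EAST: An Efficient and Accurate Scene Text Detector

Reference: OpenCV OCR and text recognition with Tesseract

I found that OpenCV enables hardware acceleration with
net = cv2.dnn.readNet(args["east"])
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL)
but this OpenCL target is not NVIDIA's CUDA; it runs on the Intel GPU

Using TensorFlow's GPU (CUDA) build instead,
performance is indeed better:
at 640x480, from 400 ms down to 340 ms

However, Tesseract itself is not accelerated; only EAST can be sped up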
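
To put the 400 ms to 340 ms figure in perspective, here is a small arithmetic sketch. It uses only the two numbers measured above; the function name is my own.

```python
def speedup(before_ms, after_ms):
    # Return (milliseconds saved, percent reduction, speedup ratio)
    saved_ms = before_ms - after_ms
    percent = saved_ms / before_ms * 100.0
    ratio = before_ms / after_ms
    return saved_ms, percent, ratio

if __name__ == "__main__":
    saved_ms, percent, ratio = speedup(400, 340)
    print(f"saved {saved_ms} ms ({percent:.1f}% less time, {ratio:.2f}x faster)")
    print(f"throughput: {1000 / 400:.2f} -> {1000 / 340:.2f} frames per second")
```

That is a 15% reduction in time, which fits the observation that only the EAST stage benefits from the GPU.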

Tuesday, November 6, 2018

OpenCV DNN Sample Text Detection

An Efficient and Accurate Scene Text Detector

Reference: EAST text detector
Download frozen_east_text_detection.pb

samples/dnn/(sample)text_detection

Configuration Properties/Debugging/Command Arguments
--model="D:\OpenCV_4\OpenCV OCR\opencv-text-detection\frozen_east_text_detection.pb" --width=640 --height=480

Configuration Properties/Debugging/Environment
PATH=D:\TensorFlow\OCR\tesseract\win64\bin\Debug;D:\Anaconda3\Library\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin;%PATH%
QT_PLUGIN_PATH=D:\Anaconda3\Library\plugins

Edit text_detection.cpp to use OpenCL for speed (works only with the Intel GPU, not the NVIDIA GPU)
    Net net = readNet(model);
    net.setPreferableTarget(DNN_TARGET_OPENCL);


EAST text detector

git clone https://github.com/argman/EAST EAST
Download east_icdar2015_resnet_v1_50_rbox.zip from
https://drive.google.com/open?id=0B3APw5BZJ67ETHNPaU9xUkVoV0U

Open
VS2015 x64 Native Tools Command Prompt
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC>
D:\OpenCV_4\OpenCV OCR\EAST\lanms>activate tensorflow
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST\lanms>python --version
Python 3.5.5 :: Anaconda, Inc.
D:\OpenCV_4\OpenCV OCR\EAST\lanms>cl adaptor.cpp .\include\clipper\clipper.cpp /I .\include /I "D:\Anaconda3\include" /LD /Fe:adaptor.pyd /link /LIBPATH:"D:\Anaconda3\libs"

Edit lanms/__init__.py and comment out the following two lines
#if subprocess.call(['make', '-C', BASE_DIR]) != 0:  # return value
#    raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))

Edit run_demo_server.py
change
parser.add_argument('--checkpoint-path', default=checkpoint_path)
to
parser.add_argument('--checkpoint_path', default=checkpoint_path)
and comment out
        #ret.update(get_host_info())
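
The flag rename matters because argparse matches option strings literally: both spellings end up stored as args.checkpoint_path, but only the spelling passed to add_argument is accepted on the command line. A minimal sketch of that behaviour follows; the default value and helper name are placeholders, not taken from run_demo_server.py.

```python
import argparse

def parse(flag, argv):
    # Register one option under `flag`, then parse argv without exiting on unknown options
    parser = argparse.ArgumentParser()
    parser.add_argument(flag, default="default_ckpt")  # placeholder default
    args, unknown = parser.parse_known_args(argv)
    return args.checkpoint_path, unknown

if __name__ == "__main__":
    argv = ["--checkpoint_path=my_ckpt"]
    # Registered with a dash: the underscore spelling is not recognized
    print(parse("--checkpoint-path", argv))
    # Registered with an underscore: the value is picked up
    print(parse("--checkpoint_path", argv))
```

With parse_args() instead of parse_known_args(), the unrecognized spelling would make the script exit with an error.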

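Along the way, the tensorflow environment turned out to be on Python 3.5
while the build linked against python36.lib, so I removed the tensorflow environment and reinstalled it

Export the installed modules so they can be reinstalled once the new environment is ready (not used in the end)
pip freeze>requirements.txt
pip install -r requirements.txt

List the installed modules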
(tensorflow) D:\>conda list

(base) D:\>conda env remove -n tensorflow
(base) D:\>conda create -n tensorflow pip python=3.6
(base) D:\>activate tensorflow
(tensorflow) D:\>pip install --ignore-installed --upgrade tensorflow-gpu
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install opencv-python
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install opencv-contrib-python
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install --ignore-installed --upgrade tensorflow-gpu
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install scipy
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install matplotlib
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>pip install Flask
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>conda install shapely
(tensorflow) D:\OpenCV_4\OpenCV OCR\EAST>python run_demo_server.py --checkpoint_path="..\east_icdar2015_resnet_v1_50_rbox"

Friday, October 26, 2018

Tesseract OSD_example.cpp



// Determine the page orientation, the text-line order, and the writing direction
#include "pch.h"
#include <cstdio>
#include <iostream>

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    // Page orientation
    static const char* const sOrientation[] {
        "PAGE_UP",
        "PAGE_RIGHT",
        "PAGE_DOWN",
        "PAGE_LEFT",
    };
    // Writing direction (character to character)
    static const char* const sWritingDirection[] {
        "LEFT_TO_RIGHT",
        "RIGHT_TO_LEFT",
        "TOP_TO_BOTTOM",
    };
    // Text-line order (line to line)
    static const char* const sTextlineOrder[] {
        "LEFT_TO_RIGHT",
        "RIGHT_TO_LEFT",
        "TOP_TO_BOTTOM",
    };

    Pix *image = pixRead("D:\\TensorFlow\\OCR\\aaa.png");
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    if (api->Init(NULL, "eng")) {
        std::cerr << "Could not initialize tesseract.\n";
        return 1;  // do not continue with an uninitialized API
    }
    api->SetPageSegMode(tesseract::PSM_AUTO_OSD);
    api->SetImage(image);
    api->Recognize(0);
    tesseract::PageIterator* it = api->AnalyseLayout();
    tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
    if (it != 0) {
        do {
            tesseract::Orientation orientation;
            tesseract::WritingDirection direction;
            tesseract::TextlineOrder order;
            float deskew_angle;

            it->Orientation(&orientation, &direction, &order, &deskew_angle);
            printf("Orientation: %s;\nWritingDirection: %s\nTextlineOrder: %s\n"
                   "Deskew angle: %.4f\n",
                   sOrientation[orientation], sWritingDirection[direction],
                   sTextlineOrder[order], deskew_angle);
            int left, top, right, bottom;
            it->BoundingBox(level, &left, &top, &right, &bottom);
            printf("BoundingBox: (%d, %d) (%d, %d)\n",
                   left, top, right, bottom);
        } while (it->Next(level));
        delete it;  // the caller owns the returned iterator
    }

    api->End();
    delete api;
    pixDestroy(&image);
    return 0;
}

Tesseract ResultIterator.cpp

// See the "Tesseract API for VS2017" post
// Recognize symbol by symbol and list every candidate for each symbol
#include "pch.h"
#include <cstdio>
#include <iostream>

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    Pix *image = pixRead("D:\\TensorFlow\\OCR\\aaa.png");
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    if (api->Init(NULL, "eng")) {
        std::cerr << "Could not initialize tesseract.\n";
        return 1;  // do not continue with an uninitialized API
    }
    api->SetPageSegMode(tesseract::PSM_AUTO_OSD);
    api->SetImage(image);
    api->Recognize(0);
    tesseract::ResultIterator* ri = api->GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_SYMBOL;
    //tesseract::PageIteratorLevel level = tesseract::RIL_TEXTLINE;
    if (ri != 0) {
        do {
            const char* word = ri->GetUTF8Text(level);
            float conf = ri->Confidence(level);
            int x1, y1, x2, y2;
            ri->BoundingBox(level, &x1, &y1, &x2, &y2);
            printf("word: '%s';  \tconf: %.2f; BoundingBox: %d,%d,%d,%d;\n",
                   word, conf, x1, y1, x2, y2);
            if (level == tesseract::RIL_SYMBOL) {
                // List all candidate choices for this symbol
                tesseract::ChoiceIterator ci(*ri);
                do {
                    // The choice text is owned by the iterator; do not delete it
                    const char* choice = ci.GetUTF8Text();
                    printf("\t\t%s conf: %f\n", choice, ci.Confidence());
                } while (ci.Next());
                printf("---------------------------------------------\n");
            }
            delete[] word;
        } while (ri->Next(level));
        delete ri;  // the caller owns the returned iterator
    }

    api->End();
    delete api;
    pixDestroy(&image);
    return 0;
}

Tesseract GetComponentImages.cpp

// See the "Tesseract API for VS2017" post
// Recognize line by line
#include "pch.h"
#include <cstdio>
#include <iostream>

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    Pix *image = pixRead("D:\\temp\\OpenCV_err.png");
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    if (api->Init(NULL, "eng")) {
        std::cerr << "Could not initialize tesseract.\n";
        return 1;  // do not continue with an uninitialized API
    }
    api->SetImage(image);

    Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
    fprintf(stdout, "Found %d textline image components.\n", boxes->n);
    for (int i = 0; i < boxes->n; i++) {
        BOX* box = boxaGetBox(boxes, i, L_CLONE);
        // Restrict recognition to this text line
        api->SetRectangle(box->x, box->y, box->w, box->h);
        char* ocrResult = api->GetUTF8Text();
        int conf = api->MeanTextConf();
        fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
                i, box->x, box->y, box->w, box->h, conf, ocrResult);
        delete[] ocrResult;
        boxDestroy(&box);  // release the clone
    }
    boxaDestroy(&boxes);

    api->End();
    delete api;
    pixDestroy(&image);
    return 0;
}

Tesseract BasicExample.cpp

// See the "Tesseract API for VS2017" post
// Recognize the whole image in one pass
#include "pch.h"
#include <iostream>
#include <windows.h>  // MultiByteToWideChar, WideCharToMultiByte

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    char *outText = NULL;
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // English plus Traditional Chinese
    if (api->Init(NULL, "eng+chi_tra")) {
        std::cerr << "Could not initialize tesseract.\n";
        return 1;  // do not continue with an uninitialized API
    }
    Pix *image = pixRead("D:\\TensorFlow\\OCR\\bbb.png");
    // Use either one of the next two lines, or neither
    api->SetPageSegMode(tesseract::PSM_SINGLE_BLOCK); // default
    //api->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
    api->SetImage(image);
    //api->SetRectangle(40, 5, 150, 30);
    outText = api->GetUTF8Text();
    // Displaying Chinese takes several conversions: UTF-8 -> UTF-16 -> ANSI code page
    int len = ::MultiByteToWideChar(CP_UTF8, 0, outText, -1, NULL, 0);
    wchar_t* wszString = new wchar_t[len + 1];
    ::MultiByteToWideChar(CP_UTF8, 0, outText, -1, wszString, len);
    wszString[len] = L'\0';
    len = ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, NULL, 0, NULL, NULL);
    // CP_ACP is the system ANSI code page (Big5 on a Traditional Chinese Windows)
    char* szBig5 = new char[len + 1];
    ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, szBig5, len, NULL, NULL);
    szBig5[len] = '\0';

    std::cout << "=======================\n";
    std::cout << outText << "\n";    // raw UTF-8 bytes
    std::cout << "=======================\n";
    // A wide string needs std::wcout; whether it displays correctly depends on the console locale
    std::wcout << wszString << L"\n";
    std::cout << "=======================\n";
    std::cout << szBig5 << "\n";     // ANSI (Big5) bytes
    std::cout << "=======================\n";
    api->End();
    delete api;
    delete[] outText;
    delete[] wszString;
    delete[] szBig5;
    pixDestroy(&image);
    return 0;
}
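
The two Win32 calls above amount to "decode UTF-8, then encode to the ANSI code page". For comparison, here is the same conversion sketched in Python, assuming the ANSI code page is 950 (Big5), as on a Traditional Chinese Windows; the function name is my own.

```python
def utf8_to_ansi(utf8_bytes, ansi_codec="cp950"):
    # Mirror the two-step conversion: UTF-8 bytes -> text -> ANSI bytes
    text = utf8_bytes.decode("utf-8")        # like MultiByteToWideChar(CP_UTF8, ...)
    ansi_bytes = text.encode(ansi_codec)     # like WideCharToMultiByte(CP_ACP, ...)
    return text, ansi_bytes

if __name__ == "__main__":
    # Stand-in for the bytes returned by GetUTF8Text()
    ocr_output = "中文".encode("utf-8")
    text, big5 = utf8_to_ansi(ocr_output)
    print(len(ocr_output), "UTF-8 bytes ->", len(big5), "Big5 bytes")
```

Characters that do not exist in the target code page raise UnicodeEncodeError here, whereas WideCharToMultiByte substitutes a default character.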

Tesseract API for VS2017

See the "Tesseract OCR" installation post

Search for leptonica under c:\users\userName\.cppan\storage
to find the directory C:\Users\userName\.cppan\storage\src\8f\a3\90d7\src
Copy all of its files to C:\Program Files\tesseract\include\leptonica

VS2013 cannot build it; VS2017 is required
Property Pages/Platform: select x64
Property Pages/Configuration: select Debug
Property Pages/Configuration Properties/Debugging/Environment
PATH=%PATH%;D:\TensorFlow\OCR\tesseract\win64\bin\Debug
Property Pages/Configuration Properties/C/C++/General/Additional Include Directories
Add C:\Program Files\tesseract\include
Property Pages/Configuration Properties/Linker/General/Additional Library Directories
Add D:\TensorFlow\OCR\tesseract\win64\Debug
Property Pages/Configuration Properties/Linker/Input/Additional Dependencies
Add tesseract40d.lib
Add pvt.cppan.demo.danbloomberg.leptonica-1.76.0.lib

Property Pages/Platform: select x64
Property Pages/Configuration: select Release
Property Pages/Configuration Properties/Debugging/Environment
PATH=%PATH%;C:\Program Files\tesseract\bin
Property Pages/Configuration Properties/C/C++/General/Additional Include Directories
Add C:\Program Files\tesseract\include
Property Pages/Configuration Properties/Linker/General/Additional Library Directories
Add C:\Program Files\tesseract\lib
Property Pages/Configuration Properties/Linker/Input/Additional Dependencies
Add tesseract40.lib
Add pvt.cppan.demo.danbloomberg.leptonica-1.76.0.lib

When a Debug build is linked against the Release lib, deleting the memory returned by GetUTF8Text()
raises an exception (most likely because the Debug and Release C runtimes do not share a heap)


Friday, October 19, 2018

Tesseract OCR

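https://digi.bib.uni-mannheim.de/tesseract/
provides a downloadable installer
The installed version can only run Tesseract and cannot be used for development, but install it first anyway because its tessdata is needed
Uninstall it once you are done with it
https://github.com/UB-Mannheim/tesseract/wiki/Windows-build
explains how the installers are produced, but they are cross-compiled on Linux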

Using Vcpkg
Open PowerShell
git clone https://github.com/Microsoft/vcpkg.git vcpkg
cd vcpkg
.\bootstrap-vcpkg.bat
This produces vcpkg.exe
.\vcpkg install tesseract:x64-windows
This produces installed\x64-windows\tools\tesseract
.\vcpkg install tesseract:x64-windows-static
This produces installed\x64-windows-static\tools\tesseract
.\vcpkg install tesseract:x86-windows-static
These provide include, dll, and lib files, but the version is 3.05

Using cmake, cppan, and VS2017
The previously installed cmake (3.10) kept failing; it only succeeded after upgrading to cmake 3.12
Download cppan
cppan uses the c:\users\userName\.cppan directory; if a run fails and you need to start over, delete this directory
Add cmake and cppan to PATH
Open PowerShell
git clone https://github.com/tesseract-ocr/tesseract tesseract
cd tesseract
mkdir win64
cd win64
PS D:\Tesseract\tesseract\win64> $env:Path += ";D:\Tesseract\cppan-master-Windows-client;C:\Program Files\CMake\bin"
PS D:\Tesseract\tesseract\win64> $env:path.split(";")
cppan ..
cmake .. -G "Visual Studio 15 2017 Win64"
Open VS2017
Open tesseract\win64\tesseract.sln
First build the "CPPAN Targets/Service/cppan-d-b-d" project; it reports errors,
mainly because some source files contain invalid characters
Open those files, choose Save As, click the small arrow next to Save, and choose Save with Encoding
For Encoding, select Unicode (UTF-8 with signature)
ALL_BUILD then succeeds; next, build INSTALL
This produces an MSB3073 setlocal error,
mainly because there is no permission to install into C:\Program Files\tesseract
Reopen VS2017 as Administrator
and rebuild
Add support for Chinese (including handwriting)
Download tessdata from https://github.com/tesseract-ocr/tessdata
I did not know which files to download, so I simply used the tessdata from the installer
Set the environment variable TESSDATA_PREFIX=C:\Program Files\tesseract\tessdata
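
Before running the C++ examples, it can help to confirm that the variable is set and that the language files are really there. Here is a small Python sketch of that check. It assumes TESSDATA_PREFIX points directly at the directory holding the .traineddata files, as in the setting above; the function name is my own.

```python
import os
from pathlib import Path

def missing_traineddata(langs, env=None):
    # Return the languages whose <lang>.traineddata is absent from TESSDATA_PREFIX
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    root = Path(prefix)
    return [lang for lang in langs
            if not (root / f"{lang}.traineddata").is_file()]

if __name__ == "__main__":
    try:
        # Same languages as Init(NULL, "eng+chi_tra")
        missing = missing_traineddata("eng+chi_tra".split("+"))
        print("missing:", missing if missing else "none")
    except RuntimeError as err:
        print(err)
```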



On some machines it runs very slowly; disabling OpenMP helps
Edit the properties of the libtesseract and tesseract projects
C/C++/Language/Open MP Support: No (/openmp-)