Reference: https://speaches-ai.github.io/speaches/
Reference: https://github.com/speaches-ai/speaches/tree/master
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda-cdi.yaml
export COMPOSE_FILE=compose.cuda-cdi.yaml
Set up the CUDA variant with the CDI (Container Device Interface) feature enabled:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
Edit compose.cuda-cdi.yaml: add a command entry and modify devices:
services:
  speaches:
    command: ["uvicorn", "--factory", "speaches.main:create_app", "--ws-ping-interval", "1000", "--ws-ping-timeout", "1200"]
    # NOTE (assumption): excerpt only; in the upstream file, devices is nested under deploy.resources.reservations
    # WARN: requires Docker Compose 2.24.2
    # https://docs.docker.com/reference/compose-file/merge/#replace-value
    devices:
      - driver: nvidia
        device_ids: ['0']
        capabilities:
          - gpu
The server log shows the following error:
websockets.exceptions.ConnectionClosedError: sent 1011 (internal error) keepalive ping timeout; no close frame received
Fix: add --ws-ping-interval and --ws-ping-timeout to the command in compose.cuda-cdi.yaml, as shown above.
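As a small illustration of the keepalive fix, the sketch below builds the same uvicorn argument list with the two WebSocket ping options and prints it in a form that can be pasted after `command:` in the compose file. The helper name `uvicorn_command` is hypothetical; only the flags themselves come from the notes above.

```python
import json


def uvicorn_command(ping_interval_s, ping_timeout_s):
    """Build the uvicorn argument list used as the Compose `command`.

    Hypothetical helper: the flag names come from the notes, the function
    itself is only an illustration.
    """
    return [
        "uvicorn", "--factory", "speaches.main:create_app",
        "--ws-ping-interval", str(ping_interval_s),
        "--ws-ping-timeout", str(ping_timeout_s),
    ]


# A JSON array is also a valid YAML flow sequence, so the printed line
# can be used directly in compose.cuda-cdi.yaml.
print("command: " + json.dumps(uvicorn_command(1000, 1200)))
```

The timeout is kept larger than the interval here, mirroring the values used in the notes.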
$ docker compose up --detach
$ docker compose stop
$ docker compose rm
$ docker compose logs
$ docker inspect speaches
$ docker cp speaches:/home/ubuntu/speaches/speaches/config.py .
$ docker compose exec speaches sh
$ docker compose run -d speaches uvicorn --factory speaches.main:create_app --ws-ping-interval=10 --ws-ping-timeout=12
The server log shows the following message:
INFO:speaches.routers.stt:audio_receiver:262:Not enough speech in the last 30.0 seconds.
Fix: raise the inactivity window in config.py:
$ vi speaches/src/speaches/config.py
inactivity_window_seconds: float = 1000.0
API documentation:
http://localhost:8000/docs
http://localhost:8000/redoc
$ curl -X POST -F "file=@/mnt/Data/Whisper/examples/《大隋说书人 》 01.mp3" -F "prompt=歡迎收聽第一集處女觀大隨雍洲且墨城深秋夜" -F "language=zh" http://localhost:8000/v1/audio/transcriptions
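In Ubuntu, open Settings > Sound > Input and select the correct input source; the on-screen level meter should respond to the input volume.
List the available audio capture devices: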
$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALCS1200A Analog [ALCS1200A Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 2: ALCS1200A Alt Analog [ALCS1200A Alt Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
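ALSA hardware devices are addressed as hw:CARD,DEVICE, so the listing above maps to hw:0,0 and hw:0,2. The sketch below derives those names from `arecord -l` style output; the function name `alsa_capture_devices` and the regular expression are illustrative assumptions that have only been matched against the listing shown here.

```python
import re

# Matches lines such as:
# card 0: PCH [HDA Intel PCH], device 0: ALCS1200A Analog [ALCS1200A Analog]
DEVICE_LINE = re.compile(r"^card (\d+): (\S+) \[.*?\], device (\d+): (.+?) \[")


def alsa_capture_devices(listing):
    """Return (alsa_name, description) pairs parsed from `arecord -l` output."""
    devices = []
    for line in listing.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            card, _card_id, device, name = match.groups()
            devices.append((f"hw:{card},{device}", name))
    return devices


SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALCS1200A Analog [ALCS1200A Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 2: ALCS1200A Alt Analog [ALCS1200A Alt Analog]
"""

for alsa_name, description in alsa_capture_devices(SAMPLE):
    print(alsa_name, description)
```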
Record from card 0:
$ ffmpeg -f alsa -i hw:0 -acodec libmp3lame -b:a 128k -abr 1 aaa.mp3
Press q to stop.
Recording with -i hw:0 captured no sound, so use -i default instead.
$ arecord -L
default
Playback/recording through the PulseAudio sound server
Record from the default audio input:
$ ffmpeg -f alsa -i default -acodec libmp3lame -b:a 128k -abr 1 aaa.mp3
Press q to stop.
$ ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le aaa.wav
Press q to stop.
Strip the metadata from the mp3:
$ ffmpeg -hide_banner -i '/mnt/Data/Whisper/examples/《大隋说书人 》 01.mp3' -c:v copy -c:a copy -map_metadata -1 test.mp3
Convert the mp3 to raw PCM (16 kHz, mono, signed 16-bit little-endian), then stream it to the server:
$ ffmpeg -i test.mp3 -f s16le -ar 16000 -ac 1 test.pcm
$ cat test.pcm | pv -qL 32000 | websocat --no-close --binary 'ws://localhost:8000/v1/audio/transcriptions?language=zh'
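The limit of 32000 passed to pv matches the byte rate of the PCM format produced by the ffmpeg command (16000 samples per second, 1 channel, 16 bits per sample), so the file is fed to the server at roughly real-time speed. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def pcm_bytes_per_second(sample_rate_hz, channels, bits_per_sample):
    """Byte rate of uncompressed PCM audio."""
    return sample_rate_hz * channels * bits_per_sample // 8


def pcm_duration_seconds(num_bytes, sample_rate_hz=16000, channels=1, bits_per_sample=16):
    """Playback duration of a raw PCM buffer of the given size."""
    return num_bytes / pcm_bytes_per_second(sample_rate_hz, channels, bits_per_sample)


# 16000 Hz * 1 channel * 2 bytes per sample
print(pcm_bytes_per_second(16000, 1, 16))  # 32000
# A 960000-byte file therefore holds 30 seconds of audio.
print(pcm_duration_seconds(960000))  # 30.0
```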
The client shows the following error:
Closing WebSocket connection due to ping timeout
Fix: add --ping-timeout and --ping-interval to the command:
$ cat test.pcm | pv -qL 32000 | websocat --no-close --binary --ping-timeout 12000 --ping-interval 10000 'ws://localhost:8000/v1/audio/transcriptions?language=zh'
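The same streaming test can be sketched in Python with the third-party `websockets` package. This is only an illustration under several assumptions: the function names are hypothetical, the 100 ms chunk size and the ping values of 10 and 12 seconds are arbitrary choices, and the messages the server sends back are simply printed as received.

```python
import asyncio

SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # s16le
CHANNELS = 1


def split_chunks(pcm, chunk_ms=100):
    """Split raw PCM bytes into chunks that each hold chunk_ms of audio."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


async def stream_pcm(path, url, chunk_ms=100):
    """Send a PCM file at roughly real-time speed and print server messages."""
    import websockets  # third-party: pip install websockets

    with open(path, "rb") as f:
        pcm = f.read()

    # ping_interval / ping_timeout are in seconds (assumed values).
    async with websockets.connect(url, ping_interval=10, ping_timeout=12) as ws:

        async def sender():
            for chunk in split_chunks(pcm, chunk_ms):
                await ws.send(chunk)  # bytes are sent as a binary frame
                await asyncio.sleep(chunk_ms / 1000)  # pace like `pv -L 32000`

        send_task = asyncio.create_task(sender())
        try:
            async for message in ws:  # runs until the server closes the socket
                print(message)
        finally:
            send_task.cancel()


# Example call (requires a running server):
# asyncio.run(stream_pcm("test.pcm", "ws://localhost:8000/v1/audio/transcriptions?language=zh"))
```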
Record from the microphone to a PCM file:
$ ffmpeg -f alsa -ar 16000 -i default -ac 1 -f s16le aaa.pcm
Convert it to mp3, then stream the PCM file to the server:
$ ffmpeg -f s16le -ar 16000 -ac 1 -i aaa.pcm -codec:a libmp3lame aaa.mp3
$ cat aaa.pcm | pv -aL 32000 | websocat --no-close --binary --ping-timeout 12000 --ping-interval 10000 'ws://localhost:8000/v1/audio/transcriptions?language=zh'
Test with the OpenAI CLI:
export OPENAI_BASE_URL=http://localhost:8000/v1/
export OPENAI_API_KEY="cant-be-empty"
openai api audio.transcriptions.create -m Systran/faster-whisper-large-v3 -f '/mnt/Data/Whisper/examples/《大隋说书人 》 01.mp3' --response-format text
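The CLI call above can also be made from the `openai` Python package pointed at the local server. The sketch below is a rough equivalent under that assumption; the helper names `transcription_kwargs` and `transcribe` are hypothetical, and the base URL, placeholder API key, and model name are the ones used in the notes.

```python
def transcription_kwargs(language=None, prompt=None, response_format="text"):
    """Collect optional request fields, dropping the ones left unset."""
    fields = {
        "language": language,
        "prompt": prompt,
        "response_format": response_format,
    }
    return {key: value for key, value in fields.items() if value is not None}


def transcribe(path, model="Systran/faster-whisper-large-v3",
               base_url="http://localhost:8000/v1/", **options):
    """Send one audio file to the local server and return the transcription."""
    from openai import OpenAI  # third-party: pip install openai

    # The local server does not check the key, but the client requires one.
    client = OpenAI(base_url=base_url, api_key="cant-be-empty")
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(
            model=model, file=audio, **transcription_kwargs(**options)
        )


# Example call (requires a running server and a real audio file):
# print(transcribe("/path/to/audio.mp3", language="zh"))
```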
Obtain an OPENAI_API_KEY and test it against the OpenAI API:
export OPENAI_API_KEY="<your-api-key>"
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "store": true,
    "messages": [
      {"role": "user", "content": "write a haiku about ai"}
    ]
  }'