https://github.com/microsoft/VibeVoice/tree/main
However, the core TTS (text-to-speech) implementation has been removed from this repository.
https://github.com/vibevoice-community/VibeVoice/tree/main
https://github.com/dontriskit/VibeVoice-FastAPI/tree/main
$ git clone https://github.com/microsoft/VibeVoice.git
$ cd VibeVoice
$ uv venv --python 3.12
$ source .venv/bin/activate
$ vi pyproject.toml   # remove torch from the dependencies list
$ uv sync
$ uv pip install --force-reinstall torch \
--index-url https://download.pytorch.org/whl/cu130
$ uv pip install --upgrade setuptools wheel pip ninja
$ MAX_JOBS=2 uv pip install flash-attn --no-build-isolation
# Models: microsoft/VibeVoice-ASR (speech recognition, used below)
# --model_path microsoft/VibeVoice-Realtime-0.5B \
# --model_path microsoft/VibeVoice-1.5B \   (xxx: did not work)
$ python demo/realtime_model_inference_from_file.py \
--model_path microsoft/VibeVoice-Realtime-0.5B \
--speaker_name Carter \
--txt_path demo/text_examples/1p_vibevoice.txt
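Before passing the generated file to the ASR step, you can check that it is a sane WAV (sample rate, channel count, duration). The sketch below uses only the stdlib `wave` module. `describe_wav` is a hypothetical helper name. One limitation: `wave` reads only integer PCM WAV files. If the demo writes float WAV, it raises `wave.Error`, and a library such as soundfile would be needed instead.

```python
import io
import sys
import wave


def describe_wav(source) -> dict:
    """Return basic properties of a PCM WAV given a file path, file object, or raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)  # wave.open accepts file-like objects
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "frames": frames,
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    # e.g. python check_wav.py outputs/1p_vibevoice_generated.wav
    for path in sys.argv[1:]:
        print(path, describe_wav(path))
```

A zero or near-zero `duration_s` usually means generation failed silently, and it is cheaper to catch that here than in the ASR output.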
$ vi demo/vibevoice_asr_inference_from_file.py   # change the dtype= argument to torch_dtype=
self.model = VibeVoiceASRForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=dtype,  # was: dtype=dtype
    device_map=device if device == "auto" else None,
    attn_implementation=attn_implementation,
    trust_remote_code=True,
)
$ python demo/vibevoice_asr_inference_from_file.py \
--model_path microsoft/VibeVoice-ASR \
--audio_files=outputs/1p_vibevoice_generated.wav
======================================
$ git clone https://github.com/vibevoice-community/VibeVoice.git VibeVoice-community
$ cd VibeVoice-community
$ uv venv --python 3.12
$ source .venv/bin/activate
$ vi pyproject.toml   # remove torch from the dependencies list and change requires-python to 3.10
$ uv sync
$ uv pip install --force-reinstall torch \
--index-url https://download.pytorch.org/whl/cu130
$ uv pip install --upgrade setuptools wheel pip ninja
$ MAX_JOBS=4 uv pip install flash-attn --no-build-isolation
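flash-attn is compiled against whatever torch is in the venv. After the installs above, it can be worth confirming that the force-reinstall from the `cu130` index actually produced a CUDA build. PyTorch wheels from those indexes carry a local version suffix such as `+cu130`, while CPU wheels carry `+cpu`. `cuda_tag` is a hypothetical helper. The `__main__` part only imports torch if it is present.

```python
import re


def cuda_tag(torch_version: str):
    """Return the '+cuXXX' local tag of a torch version string, e.g. '2.9.0+cu130' -> 'cu130'.

    Returns None for CPU builds ('+cpu') and for versions without a local suffix.
    """
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this venv")
    else:
        print("torch", torch.__version__,
              "| wheel tag:", cuda_tag(torch.__version__),
              "| CUDA runtime:", torch.version.cuda,
              "| GPU available:", torch.cuda.is_available())
```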
# --model_path vibevoice/VibeVoice-1.5B \
# --model_path microsoft/VibeVoice-1.5B \   also works
# Multi-speaker speech: a single speaker works, but with several speakers the output becomes unclear
$ python demo/inference_from_file.py \
--model_path /mnt/480SSD/models/VibeVoice-1.5B \
--speaker_name Xinran \
--txt_path demo/text_examples/2p_yayi.txt
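The run above passes a single voice (Xinran) for the two-person script 2p_yayi.txt. Before attributing unclear multi-speaker output to the model, it may help to compare how many distinct speakers a script contains with how many voices were passed. This sketch assumes each line follows the `Speaker N: text` convention of the bundled text examples. `speaker_turns` and `check_voice_count` are hypothetical helpers, and the sketch does not model how the demo maps names to speakers.

```python
import re
import sys
from collections import Counter

# Assumed script line format: "Speaker 1: some text"
_SPEAKER_LINE = re.compile(r"^\s*Speaker\s+(\d+)\s*:\s*(.*)$", re.IGNORECASE)


def speaker_turns(script: str) -> Counter:
    """Count non-empty turns per speaker id in a 'Speaker N: text' script."""
    counts = Counter()
    for line in script.splitlines():
        match = _SPEAKER_LINE.match(line)
        if match and match.group(2).strip():
            counts[int(match.group(1))] += 1
    return counts


def check_voice_count(script: str, speaker_names: list[str]) -> str:
    """Report whether at least one voice was given per distinct speaker."""
    n_speakers = len(speaker_turns(script))
    if len(speaker_names) < n_speakers:
        return (f"script has {n_speakers} speakers but only "
                f"{len(speaker_names)} voice(s) given")
    return "ok"


if __name__ == "__main__":
    # e.g. python check_script.py demo/text_examples/2p_yayi.txt Xinran
    with open(sys.argv[1], encoding="utf-8") as f:
        print(check_voice_count(f.read(), sys.argv[2:]))
```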
$ vi vibevoice/modular/modeling_vibevoice_asr.py   # comment out the is_final_chunk arguments
# Encode chunk for acoustic tokenizer (don't sample yet)
acoustic_encoder_output = self.model.acoustic_tokenizer.encode(
    chunk.unsqueeze(1),
    cache=acoustic_encoder_cache,
    sample_indices=sample_indices,
    use_cache=True,
    # is_final_chunk=is_final,
)
acoustic_mean_segments.append(acoustic_encoder_output.mean)

# Encode chunk for semantic tokenizer (take mean directly)
semantic_encoder_output = self.model.semantic_tokenizer.encode(
    chunk.unsqueeze(1),
    cache=semantic_encoder_cache,
    sample_indices=sample_indices,
    use_cache=True,
    # is_final_chunk=is_final,
)
semantic_mean_segments.append(semantic_encoder_output.mean)
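Commenting out `is_final_chunk` presumably works around the fork's tokenizer `encode()` not accepting that argument. An alternative, shown as a sketch, is to drop any keyword the callee's signature does not declare. The same code would then run against both the fork and upstream. `call_with_supported_kwargs` is a hypothetical helper. Like the manual edit, it means the final chunk is never flagged when the argument is unsupported, and it can also hide genuine typos in keyword names.

```python
import inspect


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, silently dropping keyword arguments its signature does not accept.

    If fn declares **kwargs, everything is passed through unchanged.
    """
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)
    accepted = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    }
    return fn(*args, **accepted)


# Usage in modeling_vibevoice_asr.py (sketch):
#   acoustic_encoder_output = call_with_supported_kwargs(
#       self.model.acoustic_tokenizer.encode,
#       chunk.unsqueeze(1),
#       cache=acoustic_encoder_cache,
#       sample_indices=sample_indices,
#       use_cache=True,
#       is_final_chunk=is_final,
#   )
```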
$ vi vibevoice/modular/modular_vibevoice_text_tokenizer.py
Using the same file in https://github.com/microsoft/VibeVoice/tree/main as a reference, add the VibeVoiceASRTextTokenizerFast code to it.
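After pasting code into the fork, a quick import check confirms that the module still loads and now exposes the new class, before you run the full ASR demo. `symbol_available` is a hypothetical helper. The module path is derived from the file path above. Only `ImportError` is caught, so a syntax error in the pasted code still surfaces with its full traceback.

```python
import importlib


def symbol_available(module_name: str, symbol: str) -> bool:
    """True if module_name imports and defines symbol; False if the module is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # includes ModuleNotFoundError
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    print(symbol_available(
        "vibevoice.modular.modular_vibevoice_text_tokenizer",
        "VibeVoiceASRTextTokenizerFast",
    ))
```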
$ python demo/vibevoice_asr_inference_from_file.py \
--model_path /mnt/480SSD/models/VibeVoice-ASR \
--audio_files=outputs/2p_yayi_generated.wav
$ python demo/vibevoice_asr_inference_from_file.py \
--model_path /mnt/480SSD/models/VibeVoice-ASR \
--audio_files=outputs/1p_Ch2EN_generated.wav
$ python demo/server.py
$ curl http://localhost:8100/v1/models
$ curl http://localhost:8100/v1/audio/voices
$ curl http://localhost:8100/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F "file=@/home/spark/DiskD/audio_llm/GPT-SoVITS/GPT-SoVITS/samples/output.wav" \
-F "model=whisper-1"
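The transcription `curl` call can also be made from Python without third-party packages. This sketch builds the same multipart/form-data body by hand, using the `model` field and `file` part from the curl command. It assumes the local server answers with JSON, as OpenAI-style transcription endpoints do. `build_multipart` and `transcribe` are hypothetical names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str], file_field: str, filename: str,
                    file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode())
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n').encode()
        + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = "http://localhost:8100/v1/audio/transcriptions",
               model: str = "whisper-1") -> dict:
    """POST an audio file to the transcription endpoint and return the parsed JSON."""
    audio = Path(path)
    body, content_type = build_multipart({"model": model}, "file", audio.name,
                                         audio.read_bytes())
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Unlike the curl command, the `Content-Type` header here must include the boundary explicitly, because nothing adds it automatically.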
$ curl http://localhost:8100/v1/audio/speech \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "大家好,今天我們要測試 OpenAI 的語音合成功能。",
"voice": "Xinran"
}' \
--output speech.wav
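The speech request above can be reproduced with `urllib` in the same way. The JSON body mirrors the curl payload: `model`, `input`, `voice`. The bearer token is sent only if `OPENAI_API_KEY` is set, since a local server may not check it. `speech_request` and `synthesize` are hypothetical helper names.

```python
import json
import os
import urllib.request


def speech_request(text: str, voice: str = "Xinran",
                   url: str = "http://localhost:8100/v1/audio/speech",
                   model: str = "tts-1") -> urllib.request.Request:
    """Build the POST request that the curl example sends (OpenAI-style JSON body)."""
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:  # mirrors the curl example; optional for a local server
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {"model": model, "input": text, "voice": voice}
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def synthesize(text: str, out_path: str = "speech.wav", **kwargs) -> int:
    """Send the request, save the returned audio bytes, and return their length."""
    with urllib.request.urlopen(speech_request(text, **kwargs)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```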
==========================================
Build and install a wheel
$ uv pip install build
$ python -m build --wheel
$ ls dist
$ uv pip install path_to/VibeVoice-community/dist/vibevoice-0.1.0-py3-none-any.whl
$ python demo/tts_server.py
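The wheel name `vibevoice-0.1.0-py3-none-any.whl` follows the wheel filename convention: distribution, version, optional build tag, then Python, ABI and platform tags. `py3-none-any` marks a pure-Python wheel. Because torch was removed from the dependencies earlier, the target environment presumably still needs its own CUDA torch (and flash-attn) install. The sketch below parses such names and picks the newest matching wheel in `dist/`. `parse_wheel_name` and `newest_wheel` are hypothetical helpers.

```python
from pathlib import Path


def parse_wheel_name(filename: str) -> dict:
    """Split {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def newest_wheel(dist_dir: str = "dist", distribution: str = "vibevoice") -> Path:
    """Return the most recently modified wheel for `distribution` in dist_dir."""
    wheels = [p for p in Path(dist_dir).glob("*.whl")
              if parse_wheel_name(p.name)["distribution"] == distribution]
    if not wheels:
        raise FileNotFoundError(f"no {distribution} wheel in {dist_dir}")
    return max(wheels, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(f"uv pip install {newest_wheel().resolve()}")
```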