
Friday, April 17, 2026

Installing VibeVoice on DGX Spark

https://github.com/microsoft/VibeVoice/tree/main
However, the core TTS (speech synthesis) implementation code has been removed from this repo.
https://github.com/vibevoice-community/VibeVoice/tree/main
https://github.com/dontriskit/VibeVoice-FastAPI/tree/main

$ git clone https://github.com/microsoft/VibeVoice.git
$ cd VibeVoice
$ uv venv --python 3.12
$ source .venv/bin/activate
$ vi pyproject.toml   # remove torch from dependencies
$ uv sync
$ uv pip install --force-reinstall torch \
  --index-url https://download.pytorch.org/whl/cu130
$ uv pip install --upgrade setuptools wheel pip ninja
$ MAX_JOBS=2 uv pip install flash-attn --no-build-isolation
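The pyproject.toml edit above can also be done non-interactively. A minimal sketch with sed, using a hypothetical inline pyproject.toml (the real file has more entries; only the torch line matters here):

```shell
# Hypothetical stand-in for VibeVoice's pyproject.toml; only the torch pin matters.
cat > /tmp/pyproject_demo.toml <<'EOF'
[project]
name = "vibevoice"
requires-python = ">=3.12"
dependencies = [
    "torch",
    "transformers",
]
EOF
# Drop the torch line so `uv sync` leaves torch to the CUDA wheel installed afterwards.
sed -i '/"torch"/d' /tmp/pyproject_demo.toml
```

After this, `uv sync` installs the remaining dependencies and the follow-up `uv pip install --force-reinstall torch --index-url …` supplies the CUDA build.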

microsoft/VibeVoice-ASR

#  --model_path microsoft/VibeVoice-Realtime-0.5B \
#  --model_path microsoft/VibeVoice-1.5B \ xxx
$ python demo/realtime_model_inference_from_file.py \
  --model_path microsoft/VibeVoice-1.5B \
  --speaker_name Carter \
  --txt_path demo/text_examples/1p_vibevoice.txt
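A quick way to sanity-check the generated audio is the stdlib wave module. The snippet below is a self-contained sketch: it synthesizes a one-second silent stand-in instead of reading outputs/1p_vibevoice_generated.wav, and the 24 kHz sample rate is an assumption, not something the demo guarantees.

```shell
python3 - <<'EOF'
import wave

path = "/tmp/demo_generated.wav"   # stand-in for outputs/1p_vibevoice_generated.wav

# Write 1 second of 16-bit mono silence at an assumed 24 kHz rate.
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000)

# Read it back and report sample rate and duration in seconds.
with wave.open(path, "rb") as w:
    print(w.getframerate(), w.getnframes() // w.getframerate())
EOF
```

Pointing the read-back half at the real output file tells you at a glance whether the model produced a non-empty wav of plausible length.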

$ vi demo/vibevoice_asr_inference_from_file.py   # change dtype to torch_dtype
        self.model = VibeVoiceASRForConditionalGeneration.from_pretrained(
            model_path,
            torch_dtype=dtype,
            device_map=device if device == "auto" else None,
            attn_implementation=attn_implementation,
            trust_remote_code=True
        )
$ python demo/vibevoice_asr_inference_from_file.py \
  --model_path microsoft/VibeVoice-ASR \
  --audio_files=outputs/1p_vibevoice_generated.wav 


======================================

$ git clone https://github.com/vibevoice-community/VibeVoice.git VibeVoice-community
$ cd VibeVoice-community
$ uv venv --python 3.12
$ source .venv/bin/activate
$ vi pyproject.toml   # remove torch from dependencies; change requires-python to 3.10
$ uv sync
$ uv pip install --force-reinstall torch \
  --index-url https://download.pytorch.org/whl/cu130
$ uv pip install --upgrade setuptools wheel pip ninja
$ MAX_JOBS=4 uv pip install flash-attn --no-build-isolation

#  --model_path vibevoice/VibeVoice-1.5B \
#  --model_path microsoft/VibeVoice-1.5B \ can also be used
#  Multi-speaker speech: a single speaker works, but multiple speakers come out unclear
$ python demo/inference_from_file.py \
  --model_path /mnt/480SSD/models/VibeVoice-1.5B \
  --speaker_name Xinran \
  --txt_path demo/text_examples/2p_yayi.txt

$ vi vibevoice/modular/modeling_vibevoice_asr.py   # comment out is_final_chunk
                    # Encode chunk for acoustic tokenizer (don't sample yet)
                    acoustic_encoder_output = self.model.acoustic_tokenizer.encode(
                        chunk.unsqueeze(1),
                        cache=acoustic_encoder_cache,
                        sample_indices=sample_indices,
                        use_cache=True,
                        # is_final_chunk=is_final,
                    )
                    acoustic_mean_segments.append(acoustic_encoder_output.mean)
                    
                    # Encode chunk for semantic tokenizer (take mean directly)
                    semantic_encoder_output = self.model.semantic_tokenizer.encode(
                        chunk.unsqueeze(1),
                        cache=semantic_encoder_cache,
                        sample_indices=sample_indices,
                        use_cache=True,
                        # is_final_chunk=is_final,
                    )
                    semantic_mean_segments.append(semantic_encoder_output.mean)
$ vi vibevoice/modular/modular_vibevoice_text_tokenizer.py
Referring to the corresponding file in https://github.com/microsoft/VibeVoice/tree/main, add the VibeVoiceASRTextTokenizerFast portion of the code

$ python demo/vibevoice_asr_inference_from_file.py \
  --model_path /mnt/480SSD/models/VibeVoice-ASR \
  --audio_files=outputs/2p_yayi_generated.wav

$ python demo/vibevoice_asr_inference_from_file.py \
  --model_path /mnt/480SSD/models/VibeVoice-ASR \
  --audio_files=outputs/1p_Ch2EN_generated.wav

$ python demo/server.py
$ curl http://localhost:8100/v1/models
$ curl http://localhost:8100/v1/audio/voices
$ curl http://localhost:8100/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/home/spark/DiskD/audio_llm/GPT-SoVITS/GPT-SoVITS/samples/output.wav" \
  -F "model=whisper-1"
$ curl http://localhost:8100/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "大家好,今天我們要測試 OpenAI 的語音合成功能。",
    "voice": "Xinran"
  }' \
  --output speech.wav
$ curl http://localhost:8100/v1/audio/voices

==========================================
Build an installable wheel
$ uv pip install build
$ python -m build --wheel
$ ls dist
$ uv pip install path_to/VibeVoice-community/dist/vibevoice-0.1.0-py3-none-any.whl
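A wheel is a plain zip archive, so its contents can be listed before installing. The sketch below builds a stand-in archive to show the idea; for the real build, point python3 -m zipfile -l at the wheel under dist/ instead.

```shell
# Build a tiny stand-in "wheel" (really just a zip of one package directory).
mkdir -p /tmp/whl_demo/vibevoice
echo '__version__ = "0.1.0"' > /tmp/whl_demo/vibevoice/__init__.py
( cd /tmp/whl_demo && python3 -m zipfile -c demo-0.1.0-py3-none-any.whl vibevoice )
# List the archive contents, as you would for dist/vibevoice-0.1.0-py3-none-any.whl.
python3 -m zipfile -l /tmp/whl_demo/demo-0.1.0-py3-none-any.whl
```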

$ python demo/tts_server.py
