
Friday, April 17, 2026

Installing VibeVoice on DGX Spark

https://github.com/microsoft/VibeVoice/tree/main
However, the core TTS (speech synthesis) implementation has been removed from this repo. Community alternatives:
https://github.com/vibevoice-community/VibeVoice/tree/main
https://github.com/dontriskit/VibeVoice-FastAPI/tree/main

$ git clone https://github.com/microsoft/VibeVoice.git
$ cd VibeVoice
$ uv venv --python 3.12
$ source .venv/bin/activate
$ vi pyproject.toml    # remove torch from the dependencies section
$ uv sync
$ uv pip install --force-reinstall torch \
  --index-url https://download.pytorch.org/whl/cu130
$ uv pip install --upgrade setuptools wheel pip ninja
$ MAX_JOBS=2 uv pip install flash-attn --no-build-isolation
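The pyproject.toml edit above (dropping torch so `uv sync` will not override the CUDA wheel installed next) can also be scripted. A minimal sketch, assuming one quoted requirement per line in the [project] dependencies array; the sample text is illustrative, not the actual file:

```python
import re

def drop_torch(pyproject_text: str) -> str:
    # Drop "torch..." requirement lines; torchaudio/torchvision etc. are kept
    # because a word character follows "torch" in their names.
    kept = []
    for line in pyproject_text.splitlines():
        if re.match(r'"torch(\W|$)', line.strip()):
            continue
        kept.append(line)
    return "\n".join(kept)

sample = 'dependencies = [\n    "torch>=2.4",\n    "torchaudio",\n]'
print(drop_torch(sample))  # only the "torch" entry is removed
```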

microsoft/VibeVoice-ASR

#  --model_path microsoft/VibeVoice-Realtime-0.5B \
#  --model_path microsoft/VibeVoice-1.5B \ xxx
$ python demo/realtime_model_inference_from_file.py \
  --model_path microsoft/VibeVoice-1.5B \
  --speaker_name Carter \
  --txt_path demo/text_examples/1p_vibevoice.txt

$ vi demo/vibevoice_asr_inference_from_file.py    # change dtype to torch_dtype
        self.model = VibeVoiceASRForConditionalGeneration.from_pretrained(
            model_path,
            dtype=dtype,
            device_map=device if device == "auto" else None,
            attn_implementation=attn_implementation,
            trust_remote_code=True
        )
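The dtype / torch_dtype mismatch comes from transformers renaming the from_pretrained keyword in recent releases. A version-agnostic sketch of picking the right keyword; the 4.56 cutoff is my assumption, not checked against the changelog:

```python
def dtype_kwarg(transformers_version: str, dtype):
    # Newer transformers accept `dtype=`; older releases only `torch_dtype=`.
    # The cutoff version is an assumption -- verify against your installed release.
    major, minor = (int(x) for x in transformers_version.split(".")[:2])
    return {"dtype": dtype} if (major, minor) >= (4, 56) else {"torch_dtype": dtype}

# e.g. VibeVoiceASRForConditionalGeneration.from_pretrained(
#          model_path, **dtype_kwarg(transformers.__version__, torch.bfloat16))
print(dtype_kwarg("4.50.0", "bfloat16"))  # -> {'torch_dtype': 'bfloat16'}
```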
$ python demo/vibevoice_asr_inference_from_file.py \
  --model_path microsoft/VibeVoice-ASR \
  --audio_files=outputs/1p_vibevoice_generated.wav 


======================================

$ git clone https://github.com/vibevoice-community/VibeVoice.git VibeVoice-community
$ cd VibeVoice-community
$ uv venv --python 3.12
$ source .venv/bin/activate
$ vi pyproject.toml    # remove torch from dependencies; change requires-python to 3.10
$ uv sync
$ uv pip install --force-reinstall torch \
  --index-url https://download.pytorch.org/whl/cu130
$ uv pip install --upgrade setuptools wheel pip ninja
$ MAX_JOBS=4 uv pip install flash-attn --no-build-isolation

#  --model_path vibevoice/VibeVoice-1.5B \
#  --model_path microsoft/VibeVoice-1.5B \    # also works
# Multi-speaker text: a single speaker works, but multiple speakers come out unclear.
$ python demo/inference_from_file.py \
  --model_path /mnt/480SSD/models/VibeVoice-1.5B \
  --speaker_name Xinran \
  --txt_path demo/text_examples/2p_yayi.txt

$ vi vibevoice/modular/modeling_vibevoice_asr.py    # comment out is_final_chunk
                    # Encode chunk for acoustic tokenizer (don't sample yet)
                    acoustic_encoder_output = self.model.acoustic_tokenizer.encode(
                        chunk.unsqueeze(1),
                        cache=acoustic_encoder_cache,
                        sample_indices=sample_indices,
                        use_cache=True,
                        # is_final_chunk=is_final,
                    )
                    acoustic_mean_segments.append(acoustic_encoder_output.mean)
                    
                    # Encode chunk for semantic tokenizer (take mean directly)
                    semantic_encoder_output = self.model.semantic_tokenizer.encode(
                        chunk.unsqueeze(1),
                        cache=semantic_encoder_cache,
                        sample_indices=sample_indices,
                        use_cache=True,
                        # is_final_chunk=is_final,
                    )
                    semantic_mean_segments.append(semantic_encoder_output.mean)
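An alternative to editing the source is to drop keyword arguments that the installed encode() does not accept. call_supported below is a hypothetical helper, not part of the repo:

```python
import inspect

def call_supported(fn, *args, **kwargs):
    # Call fn, silently dropping kwargs its signature does not accept
    # (unless it takes **kwargs, in which case pass everything through).
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)
    return fn(*args, **{k: v for k, v in kwargs.items() if k in params})

# Example: an encode() without the is_final_chunk parameter
def encode(chunk, use_cache=True):
    return {"chunk": chunk, "use_cache": use_cache}

print(call_supported(encode, "audio", use_cache=True, is_final_chunk=True))
# -> {'chunk': 'audio', 'use_cache': True}
```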
$ vi vibevoice/modular/modular_vibevoice_text_tokenizer.py
Referring to the corresponding file in https://github.com/microsoft/VibeVoice/tree/main, add the VibeVoiceASRTextTokenizerFast portion of the code.

$ python demo/vibevoice_asr_inference_from_file.py \
  --model_path /mnt/480SSD/models/VibeVoice-ASR \
  --audio_files=outputs/2p_yayi_generated.wav

$ python demo/vibevoice_asr_inference_from_file.py \
  --model_path /mnt/480SSD/models/VibeVoice-ASR \
  --audio_files=outputs/1p_Ch2EN_generated.wav

$ python demo/server.py
$ curl http://localhost:8100/v1/models
$ curl http://localhost:8100/v1/audio/voices
$ curl http://localhost:8100/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/home/spark/DiskD/audio_llm/GPT-SoVITS/GPT-SoVITS/samples/output.wav" \
  -F "model=whisper-1"
$ curl http://localhost:8100/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "大家好,今天我們要測試 OpenAI 的語音合成功能。",
    "voice": "Xinran"
  }' \
  --output speech.wav
$ curl http://localhost:8100/v1/audio/voices
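The same speech endpoint can be called from Python with only the standard library. A sketch against the server above (port 8100; the voice name comes from the earlier demo, so adjust to what /v1/audio/voices reports):

```python
import json
import urllib.request

def tts_request(base_url, text, voice, model="tts-1"):
    # Build the POST for the OpenAI-compatible /v1/audio/speech endpoint.
    payload = json.dumps({"model": model, "input": text, "voice": voice}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = tts_request("http://localhost:8100", "hello", "Xinran")
# with urllib.request.urlopen(req) as r:          # requires the server running
#     open("speech.wav", "wb").write(r.read())
```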

==========================================
Building an install wheel
$ uv pip install build
$ python -m build --wheel
$ ls dist
$ uv pip install path_to/VibeVoice-community/dist/vibevoice-0.1.0-py3-none-any.whl

$ python demo/tts_server.py

Installing GPT-SoVITS on DGX Spark

# The original GPT-SoVITS cannot use CUDA on the Spark,
# so install the CPU version first as a reference
$ docker compose -f my_docker-compose.yaml up -d
$ docker exec -it GPT-SoVITS-CU128 bash
(base) root@65f69181a6d7:/workspace/GPT-SoVITS# python --version
Python 3.12.12
(base) root@65f69181a6d7:/workspace/GPT-SoVITS# pip list
Package                  Version
------------------------ -----------
absl-py                  2.4.0
accelerate               1.12.0
aiofiles                 23.2.1
aiohappyeyeballs         2.6.1
aiohttp                  3.13.3
aiosignal                1.4.0
aliyun-python-sdk-core   2.16.0
aliyun-python-sdk-kms    2.16.5
annotated-doc            0.0.4
annotated-types          0.7.0
antlr4-python3-runtime   4.9.3
anyio                    4.12.1
archspec                 0.2.5
attrs                    25.4.0
audioread                3.1.0
av                       16.1.0
backports.zstd           1.3.0
boltons                  25.0.0
Brotli                   1.2.0
budoux                   0.7.0
certifi                  2026.2.25
cffi                     2.0.0
chardet                  5.2.0
charset-normalizer       3.4.4
click                    8.3.1
cn2an                    0.5.23
colorlog                 6.10.1
conda                    26.1.0
conda-libmamba-solver    25.11.0
conda-package-handling   2.4.0
conda_package_streaming  0.12.0
contourpy                1.3.3
crcmod                   1.7
cryptography             46.0.4
ctranslate2              4.7.1
cuda-bindings            12.9.4
cuda-pathfinder          1.2.2
cycler                   0.12.1
decorator                5.2.1
Distance                 0.1.3
distro                   1.9.0
dnspython                2.8.0
editdistance             0.8.1
einops                   0.8.2
einx                     0.3.0
email-validator          2.3.0
fast-langdetect          1.0.0
fastapi                  0.128.5
fastapi-cli              0.0.20
fastapi-cloud-cli        0.11.0
fastar                   0.8.0
faster-whisper           1.2.1
fasttext-predict         0.9.2.4
ffmpeg-python            0.2.0
ffmpy                    1.0.0
filelock                 3.20.0
flash_attn               2.8.3
flatbuffers              25.12.19
fonttools                4.61.1
frozendict               2.4.7
frozenlist               1.8.0
fsspec                   2025.12.0
funasr                   1.0.27
future                   1.0.0
g2p-en                   2.1.0
g2pk2                    0.0.3
gradio                   4.44.1
gradio_client            1.3.0
grpcio                   1.78.0
h11                      0.16.0
h2                       4.3.0
hf-xet                   1.2.0
hpack                    4.1.0
httpcore                 1.0.9
httptools                0.7.1
httpx                    0.28.1
huggingface_hub          0.36.2
hydra-core               1.3.2
hyperframe               6.1.0
idna                     3.11
importlib_resources      6.5.2
inflect                  7.5.0
jaconv                   0.5.0
jamo                     0.4.1
jieba                    0.42.1
jieba_fast               0.53
Jinja2                   3.1.6
jmespath                 0.10.0
joblib                   1.5.3
jsonpatch                1.33
jsonpointer              3.0.0
kaldiio                  2.18.1
kiwisolver               1.4.9
ko-pron                  1.3
lazy_loader              0.4
libmambapy               2.5.0
librosa                  0.10.2
lightning-utilities      0.15.2
llvmlite                 0.46.0
loguru                   0.7.3
Markdown                 3.10.1
markdown-it-py           4.0.0
MarkupSafe               2.1.5
matplotlib               3.10.8
mdurl                    0.1.2
menuinst                 2.4.2
modelscope               1.34.0
more-itertools           10.8.0
mpmath                   1.3.0
msgpack                  1.1.2
multidict                6.7.1
networkx                 3.6.1
ninja                    1.13.0
nltk                     3.9.2
numba                    0.63.1
numpy                    1.26.4
nvidia-cublas-cu12       12.8.4.1
nvidia-cuda-cupti-cu12   12.8.90
nvidia-cuda-nvrtc-cu12   12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12        9.10.2.21
nvidia-cufft-cu12        11.3.3.83
nvidia-cufile-cu12       1.13.1.3
nvidia-curand-cu12       10.3.9.90
nvidia-cusolver-cu12     11.7.3.90
nvidia-cusparse-cu12     12.5.8.93
nvidia-cusparselt-cu12   0.7.1
nvidia-nccl-cu12         2.27.5
nvidia-nvjitlink-cu12    12.8.93
nvidia-nvshmem-cu12      3.4.5
nvidia-nvtx-cu12         12.8.90
omegaconf                2.3.0
onnxruntime              1.24.1
openai-whisper           20250625
OpenCC                   1.2.0
orjson                   3.11.7
oss2                     2.19.1
packaging                26.0
pandas                   2.3.3
peft                     0.17.1
pillow                   10.4.0
pip                      26.0.1
platformdirs             4.5.1
pluggy                   1.6.0
pooch                    1.9.0
proces                   0.1.7
propcache                0.4.1
protobuf                 6.33.5
psutil                   7.2.2
pycosat                  0.6.6
pycparser                2.22
pycryptodome             3.23.0
pydantic                 2.10.6
pydantic_core            2.27.2
pydantic-extra-types     2.11.0
pydantic-settings        2.12.0
pydub                    0.25.1
Pygments                 2.19.2
pynndescent              0.6.0
pyopenjtalk              0.4.1
pyparsing                3.3.2
pypinyin                 0.55.0
PySocks                  1.7.1
python-dateutil          2.9.0.post0
python-dotenv            1.2.1
python-mecab-ko          1.3.7
python-mecab-ko-dic      2.1.1.post2
python-multipart         0.0.22
pytorch-lightning        2.6.1
pytorch-wpe              0.0.1
pytz                     2025.2
PyYAML                   6.0.3
regex                    2026.1.15
requests                 2.32.5
rich                     14.3.2
rich-toolkit             0.18.1
rignore                  0.7.6
robust-downloader        0.0.2
rotary-embedding-torch   0.8.9
ruamel.yaml              0.18.17
ruamel.yaml.clib         0.2.15
ruff                     0.15.0
safetensors              0.7.0
scikit-learn             1.8.0
scipy                    1.17.0
semantic-version         2.10.0
sentencepiece            0.2.1
sentry-sdk               2.52.0
setuptools               81.0.0
shellingham              1.5.4
six                      1.17.0
soundfile                0.13.1
soxr                     1.0.0
split-lang               2.1.1
starlette                0.52.1
sympy                    1.14.0
tensorboard              2.20.0
tensorboard-data-server  0.7.2
tensorboardX             2.6.4
threadpoolctl            3.6.0
tiktoken                 0.12.0
ToJyutping               3.2.0
tokenizers               0.21.4
tomlkit                  0.12.0
torch                    2.7.0+cpu
torch-complex            0.4.4
torchaudio               2.7.0
torchmetrics             1.5.0
tqdm                     4.67.3
transformers             4.50.0
triton                   3.6.0
truststore               0.10.4
typeguard                4.4.4
typer                    0.21.1
typing_extensions        4.15.0
typing-inspection        0.4.2
tzdata                   2025.3
umap-learn               0.5.11
urllib3                  2.6.3
uvicorn                  0.40.0
uvloop                   0.22.1
watchfiles               1.1.1
websockets               12.0
Werkzeug                 3.1.5
wheel                    0.46.3
wordsegment              1.3.1
x-transformers           2.16.0
yarl                     1.22.0
zstandard                0.25.0
(base) root@65f69181a6d7:/workspace/GPT-SoVITS# python3 -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA is available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
PyTorch version: 2.7.0+cpu
CUDA is available: False
CUDA version: None
(base) root@65f69181a6d7:/workspace/GPT-SoVITS# python3 -c "import flash_attn; print(f'Flash Attention version: {flash_attn.__version__}')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/conda/lib/python3.12/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/root/conda/lib/python3.12/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
(base) root@65f69181a6d7:/workspace/GPT-SoVITS# python3 -c "import librosa; print('import librosa ok')"
import librosa ok
(base) root@65f69181a6d7:/workspace/GPT-SoVITS# python3 -c "import transformers; print('Transformers OK')"
Transformers OK
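The flash_attn ImportError above is expected in this CPU-only image: the extension was built against CUDA 12, and libcudart.so.12 is simply not present. A quick stdlib check for whether the dynamic loader can see a given shared library:

```python
import ctypes.util

# find_library returns the soname if the loader can locate the library,
# None otherwise -- handy for diagnosing ImportErrors like the
# libcudart.so.12 one above.
for name in ("cudart", "c"):
    print(name, "->", ctypes.util.find_library(name))
```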


=============================================
# See https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html
# Do not use the nvcr.io/nvidia/pytorch docker image: it ships without torchaudio
# Use nvidia/cuda:13.1.1-devel-ubuntu24.04 instead
$ git clone https://github.com/XXXXRT666/Docker-Base.git
$ cd Docker-Base
$ vi my_Dockerfile
$ docker image rm gpt-sovits-spark:v2p-0217
$ docker builder prune
$ docker build --progress=plain \
  --no-cache \
  --build-arg CUDA_VERSION=13.1 \
  -t gpt-sovits-spark:v2p-0217 \
  -f my_Dockerfile . 2>&1 | tee ../aaa.txt
$ docker run --rm -it --gpus all gpt-sovits-spark:v2p-0217 /bin/bash
root@1f79611b4722:/workspace# python3 --version
Python 3.12.3
root@1f79611b4722:/workspace# nvidia-smi
Tue Apr 14 07:07:43 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
root@1158f7b98c55:/workspace# ninja --version
1.13.0.git.kitware.jobserver-pipe-1
root@1f79611b4722:/workspace# ffmpeg -version
ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)
configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared
libavutil      58. 29.100 / 58. 29.100
libavcodec     60. 31.102 / 60. 31.102
libavformat    60. 16.100 / 60. 16.100
libavdevice    60.  3.100 / 60.  3.100
libavfilter     9. 12.100 /  9. 12.100
libswscale      7.  5.100 /  7.  5.100
libswresample   4. 12.100 /  4. 12.100
libpostproc    57.  3.100 / 57.  3.100
(.venv) root@7d3af3d35cd6:/workspace# pip list
Package                  Version
------------------------ ----------------
accelerate               1.13.0
audioread                3.1.0
certifi                  2026.2.25
cffi                     2.0.0
charset-normalizer       3.4.7
cuda-bindings            13.0.3
cuda-pathfinder          1.5.3
cuda-toolkit             13.0.3.0
decorator                5.2.1
dllist                   2.0.0
einops                   0.8.2
filelock                 3.28.0
flash_attn               2.8.3
fsspec                   2026.3.0
hf-xet                   1.4.3
huggingface_hub          0.36.2
idna                     3.11
Jinja2                   3.1.6
joblib                   1.5.3
lazy-loader              0.5
librosa                  0.11.0
llvmlite                 0.47.0
MarkupSafe               3.0.3
mpmath                   1.3.0
msgpack                  1.1.2
networkx                 3.6.1
numba                    0.65.0
numpy                    2.4.4
nvidia-cublas            13.1.0.3
nvidia-cuda-cupti        13.0.85
nvidia-cuda-nvrtc        13.0.88
nvidia-cuda-runtime      13.0.96
nvidia-cuda-runtime-cu13 0.0.0a0
nvidia-cudnn-cu13        9.15.1.9
nvidia-cufft             12.0.0.61
nvidia-cufile            1.15.1.6
nvidia-curand            10.4.0.35
nvidia-cusolver          12.0.4.66
nvidia-cusparse          12.6.3.3
nvidia-cusparselt-cu13   0.8.0
nvidia-nccl-cu13         2.28.9
nvidia-nvjitlink         13.0.88
nvidia-nvshmem-cu13      3.4.5
nvidia-nvtx              13.0.85
packaging                26.1
pillow                   12.2.0
pip                      26.0.1
platformdirs             4.9.6
pooch                    1.9.0
psutil                   7.2.2
pycparser                3.0
PyYAML                   6.0.3
regex                    2026.4.4
requests                 2.33.1
safetensors              0.7.0
scikit-learn             1.8.0
scipy                    1.17.1
sentencepiece            0.2.1
setuptools               82.0.1
soundfile                0.13.1
soxr                     1.0.0
sympy                    1.14.0
tensorrt                 10.14.1.48.post1
tensorrt_cu13            10.14.1.48.post1
tensorrt_cu13_bindings   10.14.1.48.post1
tensorrt_cu13_libs       10.14.1.48.post1
threadpoolctl            3.6.0
tokenizers               0.21.4
torch                    2.10.0+cu130
torch_tensorrt           2.10.0+cu130
torchaudio               2.10.0+cu130
torchcodec               0.10.0+cu130
torchvision              0.25.0+cu130
tqdm                     4.67.3
transformers             4.50.0
triton                   3.6.0
typing_extensions        4.15.0
urllib3                  2.6.3
wheel                    0.46.3
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA is available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
PyTorch version: 2.10.0+cu130
CUDA is available: True
CUDA version: 13.0
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import torch; a=torch.randn(1, 3, 224, 224).cuda(); print('GPU ok')"
/workspace/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:435: UserWarning: 
    Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
    Minimum and Maximum cuda capability supported by this version of PyTorch is
    (8.0) - (12.0)
    
  queued_call()
GPU ok
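The warning is harmless here: the GB10 reports compute capability 12.1 while this wheel declares support for 8.0 through 12.0, yet the kernel ran anyway (presumably compatible within the sm_12x family; that part is my assumption). The check behind the warning is just a tuple range comparison:

```python
def capability_in_range(dev_cap, lo=(8, 0), hi=(12, 0)):
    # Is the device's (major, minor) compute capability inside the
    # range this PyTorch build declares support for?
    return lo <= dev_cap <= hi

print(capability_in_range((12, 1)))  # -> False, hence the warning
```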
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import torchaudio; print(f'torchaudio version: {torchaudio.__version__}')"
torchaudio version: 2.10.0+cu130
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import torchcodec; print(f'torchcodec version: {torchcodec.__version__}')"
torchcodec version: 0.10.0+cu130
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import torch; import torchcodec; print('modules:', dir(torchcodec))"
modules: ['AudioSamples', 'Frame', 'FrameBatch', 'Path', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_core', '_frame', '_internally_replaced_utils', 'cmake_prefix_path', 'core_library_path', 'decoders', 'encoders', 'ffmpeg_major_version', 'samplers', 'transforms', 'version']
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import flash_attn; print(f'Flash Attention version: {flash_attn.__version__}')"
Flash Attention version: 2.8.3
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import torchvision; print(f'torchvision version: {torchvision.__version__}')"
torchvision version: 0.25.0+cu130
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import librosa; print('import librosa ok')"
import librosa ok
(.venv) root@7d3af3d35cd6:/workspace# python3 -c "import transformers; print('Transformers OK')"
Transformers OK


=============================================
$ docker image rm gpt-sovits-cu128-fixed:latest
$ docker builder prune
$ docker build --progress=plain \
  --no-cache \
  -t gpt-sovits-cu128-fixed \
  -f my_Dockerfile . 2>&1 | tee ../aaa.txt
$ docker run --rm -it --gpus all gpt-sovits-cu128-fixed /bin/bash

$ ffmpeg -i /home/spark/DiskD/audio_llm/audio_openai/audio_openai/audio_files/6551d5b0-e35f-43ee-b262-79c55ad548ea.webm \
  -vn -acodec pcm_s16le -ar 44100 -ac 2 samples/hard_way.wav
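The ffmpeg invocation above (strip video, 16-bit PCM, 44.1 kHz stereo) can be wrapped for batch conversions; a sketch that builds the same argument list:

```python
import subprocess

def to_wav_cmd(src, dst, rate=44100, channels=2):
    # Same flags as above: drop video (-vn), 16-bit PCM, 44.1 kHz, stereo.
    return ["ffmpeg", "-i", src, "-vn", "-acodec", "pcm_s16le",
            "-ar", str(rate), "-ac", str(channels), dst]

# subprocess.run(to_wav_cmd("input.webm", "samples/hard_way.wav"), check=True)
```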
$ docker compose -f my_docker-compose.yaml up -d
$ docker logs -f GPT-SoVITS-CU128
$ docker stop GPT-SoVITS-CU128
$ docker start GPT-SoVITS-CU128
$ docker restart GPT-SoVITS-CU128
$ docker logs -f GPT-SoVITS-CU128
$ curl -X POST http://localhost:9880/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "你好,這是中英文語音 Zero-shot TTS 測試",
    "text_language": "zh"
  }' \
  --output out.wav
$ curl -X POST http://localhost:9880/ \
  -H "Content-Type: application/json" \
  -d '{
    "refer_wav_path": "samples/output.wav",
    "prompt_text": "說書相生這種東西仍靠一張嘴,通過語言的結構把看官聽眾吸引到故事裡面,在演出的時候,要求你身上的每個動作都必須要有含義。",
    "prompt_language": "zh",
    "text": "你好,這是中英文語音 Zero-shot TTS 測試",
    "text_language": "zh"
  }' \
  --output out.wav
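The two request bodies differ only in the optional reference-audio fields; a small builder for the port-9880 API (field names as used in the curl calls above):

```python
import json

def sovits_payload(text, text_language="zh", refer_wav_path=None,
                   prompt_text=None, prompt_language=None):
    # Zero-shot cloning needs the three reference fields; plain TTS omits them.
    body = {"text": text, "text_language": text_language}
    if refer_wav_path is not None:
        body["refer_wav_path"] = refer_wav_path
        body["prompt_text"] = prompt_text
        body["prompt_language"] = prompt_language
    return json.dumps(body, ensure_ascii=False)

print(sovits_payload("hello", "zh"))
```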

$ docker exec -it GPT-SoVITS-CU128 bash
(base)# python webui.py 

Thursday, March 19, 2026

ConnectX-7 on the DGX Spark

The ConnectX-7 NICs had always been visible on this machine,
but after one system update they disappeared.

Investigation showed that the file /etc/nvidia/cx7-hotplug-enabled controls whether ConnectX-7 is enabled:
spark@gx10-spark:~$ ls -al /etc/nvidia/cx7-hotplug-enabled 
-rw-r--r-- 1 root root 278 Mar 12 10:06 /etc/nvidia/cx7-hotplug-enabled
spark@gx10-spark:~$ cat /etc/nvidia/cx7-hotplug-enabled 
# CX7 Hotplug Configuration
# This file controls CX7 hotplug functionality on DGX Spark systems.
# Presence of this file: enables hotplug
# Absence of this file: disables hotplug (safe default)
# To disable hotplug, remove this file or uninstall dgx-spark-mlnx-hotplug package.
spark@gx10-spark:~$ 
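Since the check is just file presence, it can be scripted (path per the listing above):

```python
from pathlib import Path

def cx7_hotplug_enabled(flag=Path("/etc/nvidia/cx7-hotplug-enabled")) -> bool:
    # Per the file's own comments: presence enables ConnectX-7 hotplug,
    # absence disables it (the safe default).
    return flag.exists()
```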


spark@gx10-spark:~$ ifconfig
br-060c917749b1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.22.0.1  netmask 255.255.0.0  broadcast 172.22.255.255
        ether 52:84:c6:04:50:2e  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br-69dc297fd5bc: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.21.0.1  netmask 255.255.0.0  broadcast 172.21.255.255
        ether ae:8d:e1:e3:d8:a4  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether fa:8e:d1:c2:3a:4c  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 31 overruns 0  carrier 0  collisions 0

enP2p1s0f0np0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 30:c5:99:40:83:24  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enP2p1s0f1np1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 30:c5:99:40:83:25  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enP7s7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.108  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::2b9:90bd:9766:aed  prefixlen 64  scopeid 0x20<link>
        ether 30:c5:99:40:83:1f  txqueuelen 1000  (Ethernet)
        RX packets 31075  bytes 1955490 (1.9 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 44295  bytes 34158888 (34.1 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 81  

enp1s0f0np0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 30:c5:99:40:83:20  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp1s0f1np1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 30:c5:99:40:83:21  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 46099  bytes 33381287 (33.3 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 46099  bytes 33381287 (33.3 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlP9s9: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 50:bb:b5:a4:24:84  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ sudo lshw -class network -short
H/W path      Device          Class          Description
========================================================
/0/100/0      enp1s0f0np0     network        MT2910 Family [ConnectX-7]
/0/100/0.1    enp1s0f1np1     network        MT2910 Family [ConnectX-7]
/0/101/0      enP2p1s0f0np0   network        MT2910 Family [ConnectX-7]
/0/101/0.1    enP2p1s0f1np1   network        MT2910 Family [ConnectX-7]
/0/103/0      enP7s7          network        Realtek Semiconductor Co., Ltd.
/0/104/0      wlP9s9          network        MEDIATEK Corp.

spark@gx10-spark:~$ sudo fwupdmgr get-devices
ASUSTeK COMPUTER INC. GX10
├─ESL01TBTLCZ-27J2-TYN:
│     Device ID:          7de5ffdca08fa52d95fd4bb42aa5d07a4b35d2dd
│     Summary:            NVM Express solid state drive
│     Current version:    ERFM12.0
│     Vendor:             Phison Electronics Corporation (NVME:0x1987)
│     Serial Number:      511250702501001126
│     GUIDs:              3d29962d-a81b-5b11-bd43-aec65c7e9e60 ← NVME\VEN_1987&DEV_5027
│                         18fbf8a9-d429-57e9-b174-ea8afd7e6877 ← NVME\VEN_1987&DEV_5027&SUBSYS_19875027
│                         ed9808fe-4f78-5d97-ab19-c0e627af31bf ← ESL01TBTLCZ-27J2-TYN
│     Device Flags:       • Internal device
│                         • Updatable
│                         • System requires external power source
│                         • Needs shutdown after installation
│                         • Device is usable for the duration of the update
│   
├─Embedded Controller:
│     Device ID:          798397787f4afcf1e2bb8575cb19630f12180584
│     Summary:            UEFI System Resource Table device (Updated via capsule-on-disk)
│     Current version:    0x02000004
│     Minimum Version:    0x01000000
│     Vendor:             Asus (DMI:American Megatrends International, LLC.)
│     Update State:       Success
│     GUID:               c3cccaf0-9a3e-4ee2-992e-9f0cf9b55fa0
│     Device Flags:       • Internal device
│                         • Updatable
│                         • System requires external power source
│                         • Supported on remote server
│                         • Needs a reboot after installation
│                         • Device is usable for the duration of the update
│                         • Signed Payload
│     Device Requests:    • Message
│   
├─MT2910 Family [ConnectX-7]:
│     Device ID:          ce4c74a5188d5b9cdb1e72ed32dad2d313c1c999
│     Current version:    01
│     Vendor:             Mellanox Technologies (PCI:0x15B3, PCI:0x10DE)
│     GUIDs:              12029307-5bb1-5200-99a5-536f1be9d081 ← PCI\VEN_15B3&DEV_1021
│                         b5e95689-ad65-5e57-8778-897f04396256 ← PCI\VEN_15B3&DEV_1021&SUBSYS_15B321EC
│                         cfc0de0b-adb3-5060-ba22-e4010a78368f ← PCI\VEN_10DE&DEV_22CE
│                         59007998-a3d7-54a3-b30e-eb3b77e2f351 ← PCI\VEN_10DE&DEV_22CE&SUBSYS_15B321EC
│     Device Flags:       • Internal device
│                         • Cryptographic hash verification is available
│   
├─MT2910 Family [ConnectX-7]:
│     Device ID:          7d29f2075dcafb4488b40c73f199cf46bb76bddb
│     Current version:    01
│     Vendor:             Mellanox Technologies (PCI:0x15B3, PCI:0x10DE)
│     GUIDs:              12029307-5bb1-5200-99a5-536f1be9d081 ← PCI\VEN_15B3&DEV_1021
│                         b5e95689-ad65-5e57-8778-897f04396256 ← PCI\VEN_15B3&DEV_1021&SUBSYS_15B321EC
│                         cfc0de0b-adb3-5060-ba22-e4010a78368f ← PCI\VEN_10DE&DEV_22CE
│                         59007998-a3d7-54a3-b30e-eb3b77e2f351 ← PCI\VEN_10DE&DEV_22CE&SUBSYS_15B321EC
│     Device Flags:       • Internal device
│                         • Cryptographic hash verification is available
│   
├─MT2910 Family [ConnectX-7]:
│     Device ID:          024ec185fcba9289f4336862423686455165d68a
│     Current version:    01
│     Vendor:             Mellanox Technologies (PCI:0x15B3, PCI:0x10DE)
│     GUIDs:              12029307-5bb1-5200-99a5-536f1be9d081 ← PCI\VEN_15B3&DEV_1021
│                         b5e95689-ad65-5e57-8778-897f04396256 ← PCI\VEN_15B3&DEV_1021&SUBSYS_15B321EC
│                         cfc0de0b-adb3-5060-ba22-e4010a78368f ← PCI\VEN_10DE&DEV_22CE
│                         59007998-a3d7-54a3-b30e-eb3b77e2f351 ← PCI\VEN_10DE&DEV_22CE&SUBSYS_15B321EC
│     Device Flags:       • Internal device
│                         • Cryptographic hash verification is available
│   
├─MT2910 Family [ConnectX-7]:
│     Device ID:          fd0f3bbe941288a4198e7476ae94fd87b6e58b15
│     Current version:    01
│     Vendor:             Mellanox Technologies (PCI:0x15B3, PCI:0x10DE)
│     GUIDs:              12029307-5bb1-5200-99a5-536f1be9d081 ← PCI\VEN_15B3&DEV_1021
│                         b5e95689-ad65-5e57-8778-897f04396256 ← PCI\VEN_15B3&DEV_1021&SUBSYS_15B321EC
│                         cfc0de0b-adb3-5060-ba22-e4010a78368f ← PCI\VEN_10DE&DEV_22CE
│                         59007998-a3d7-54a3-b30e-eb3b77e2f351 ← PCI\VEN_10DE&DEV_22CE&SUBSYS_15B321EC
│     Device Flags:       • Internal device
│                         • Cryptographic hash verification is available
│   
├─SV300S37A48:
│     Device ID:          df2bc95dfb8bd5d535f85cdf9ad662d25bc8bda6
│     Summary:            SCSI device
│     Current version:    8a
│     Vendor:             KINGSTON (SCSI:KINGSTON)
│     Serial Number:      50026b726204f2e9
│     GUIDs:              074d3e05-f8d3-5fbe-8b98-b37df122f06c ← SCSI\VEN_KINGSTON&DEV_SV300S37A48
│                         167b2441-5d5d-538c-bc01-b49059831d58 ← SCSI\VEN_KINGSTON&DEV_SV300S37A48&REV_8a
│     Device Flags:       • Internal device
│   
├─UEFI Device Firmware:
│     Device ID:          1df564c6ffffdc355893f9d0ec29813e0a1141b5
│     Summary:            UEFI System Resource Table device (Updated via capsule-on-disk)
│     Current version:    0x03000005
│     Minimum Version:    0x02000000
│     Vendor:             Asus (DMI:American Megatrends International, LLC.)
│     Update State:       Success
│     GUID:               f1392323-3920-4598-a932-ef06360cf403
│     Device Flags:       • Internal device
│                         • Updatable
│                         • System requires external power source
│                         • Supported on remote server
│                         • Needs a reboot after installation
│                         • Device is usable for the duration of the update
│                         • Signed Payload
│     Device Requests:    • Message
│   
├─UEFI Device Firmware:
│     Device ID:          7fd0410f4194ca73faaa60d3392bcd91ef4ee070
│     Summary:            UEFI System Resource Table device (Updated via capsule-on-disk)
│     Current version:    0x00000507
│     Minimum Version:    0x00000309
│     Vendor:             Asus (DMI:American Megatrends International, LLC.)
│     Update State:       Success
│     GUID:               fe75bb1c-5ccc-4936-b603-cc7cf945dc30
│     Device Flags:       • Internal device
│                         • Updatable
│                         • System requires external power source
│                         • Supported on remote server
│                         • Needs a reboot after installation
│                         • Device is usable for the duration of the update
│                         • Signed Payload
│     Device Requests:    • Message
│   
├─UEFI dbx:
│     Device ID:          362301da643102b9f38477387e2193e57abaa590
│     Summary:            UEFI revocation database
│     Current version:    20230501
│     Minimum Version:    20230501
│     Vendor:             UEFI:Microsoft
│     Install Duration:   1 second
│     GUIDs:              026c46fa-db36-5397-883d-047809df980a ← UEFI\CRT_103560ADA2E78C48DDA52A2D71A00FC1D30F469E1E20332FDA01CDE9B796B049&ARCH_AA64
│                         67d35028-ca5b-5834-834a-f97380381082 ← UEFI\CRT_A1117F516A32CEFCBA3F2D1ACE10A87972FD6BBE8FE0D0B996E09E65D802A503&ARCH_AA64
│                         10ec82f4-ff64-5362-9e5d-688febf5dbb0 ← UEFI\CRT_3CD3F0309EDAE228767A976DD40D9F4AFFC4FBD5218F2E8CC3C9DD97E8AC6F9D&ARCH_AA64
│     Device Flags:       • Internal device
│                         • Updatable
│                         • Needs a reboot after installation
│                         • Device is usable for the duration of the update
│                         • Only version upgrades are allowed
│                         • Signed Payload
│   
└─Unifying Receiver:
      Device ID:          ecbb086d2f75882bb4cd0f6bbd0df5ffba00cd39
      Summary:            Miniaturised USB wireless receiver
      Current version:    RQR12.10_B0032
      Bootloader Version: BOT01.02_B0014
      Vendor:             Logitech, Inc. (HIDRAW:0x046D, USB:0x046D)
      Install Duration:   30 seconds
      GUIDs:              9d131a0c-a606-580f-8eda-80587250b8d6
                          279ed287-3607-549e-bacc-f873bb9838c4 ← HIDRAW\VEN_046D&DEV_C52B
      Device Flags:       • Updatable
                          • Supported on remote server
                          • Unsigned Payload



Running out of disk space on Orin

# Find the files taking up space
$ sudo du -sh /* 2>/dev/null | sort -hr | head -n 10
$ sudo du -sh /home/* 2>/dev/null | sort -hr | head -n 10
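A quick df first gives the headline number that the du commands above break down; a one-liner to grab the available space on the root filesystem:

```shell
# Print the available space (in 1K blocks) on the root filesystem.
avail_kb=$(df -P / | awk 'NR==2 {print $4}')
echo "root filesystem: ${avail_kb} KiB available"
```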

# Clear Docker's build cache
$ docker system df
$ docker builder prune

# Move Docker's data directory
$ sudo systemctl stop docker
$ sudo systemctl stop docker.socket
$ sudo rsync -aqxP /var/lib/docker/ /mnt/Data/docker-data
$ sudo vi /etc/docker/daemon.json 
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "data-root": "/mnt/Data/docker-data",
    "iptables": false,
    "bridge": "docker0"
}
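A malformed daemon.json will stop dockerd from starting at all, so it is worth validating the file before the restart. A minimal sketch using python3's JSON parser; the inline sample mirrors the config above, and on the real machine the path would be /etc/docker/daemon.json:

```shell
# Validate the JSON and read data-root back out (sample file written to /tmp;
# substitute /etc/docker/daemon.json when checking the live config).
cat > /tmp/daemon-sample.json <<'EOF'
{
    "data-root": "/mnt/Data/docker-data",
    "iptables": false,
    "bridge": "docker0"
}
EOF
python3 - <<'EOF'
import json
with open("/tmp/daemon-sample.json") as f:
    cfg = json.load(f)   # raises a ValueError subclass on malformed JSON
print("data-root:", cfg["data-root"])
EOF
```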
$ sudo systemctl stop containerd
$ sudo vi /etc/containerd/config.toml
root = "/mnt/Data/containerd"

$ sudo mkdir -p /mnt/Data/containerd
$ sudo rsync -avz /var/lib/containerd/ /mnt/Data/containerd/
$ sudo systemctl start containerd
$ sudo systemctl start docker
$ sudo systemctl start docker.socket

Installing Ollama and related services on Orin

Reference: https://yingrenn.blogspot.com/2026/03/ollama-open-webui-searxng-redis-caddy.html

$ > searxng/limiter.toml
$ vi searxng/settings.yml
use_default_settings: true

search:
  formats:
    - html
    - json

server:
  # For security, secret_key can be generated with: openssl rand -hex 32
  secret_key: "c199725f396362fd99ad0e3239fbb5be9d01c04083cffb7e16d50301c67288ee"
  limiter: false # disable the built-in limiter; keep limiter.toml empty
  image_proxy: true
  real_ip: true # must be true when running behind Nginx/Caddy

# Core settings for handling bot-detection errors
bot_detection:
  ip_limit:
    filter_link_local: true

valkey:
  url: redis://redis:6379/0

engines:
  - name: ahmia
    disabled: true
  - name: torch
    disabled: true

$ vi Caddyfile
{$SEARXNG_HOSTNAME} {
    encode gzip zstd
    header {
        Strict-Transport-Security "max-age=31536000;"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "SAMEORIGIN"
        Referrer-Policy "no-referrer"
    }
    #handle /searxng* {
    #    uri strip_prefix /searxng
    #    reverse_proxy searxng:8080
    #}
    #handle /litellm* {
    #    uri strip_prefix /litellm
    #    reverse_proxy litellm:4000
    #}
    handle_path /searxng* {
        reverse_proxy searxng:8080
    }
    handle_path /litellm* {
        reverse_proxy litellm:4000
    }
    handle {
        reverse_proxy open-webui:8080
    }
}
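The handle_path directives differ from the commented-out handle blocks only in that handle_path strips the matched prefix automatically, so a request to /searxng/search reaches the backend as /search. A toy shell illustration of that rewrite (the strip_prefix function here is just for demonstration, not a Caddy feature):

```shell
# Mimic what Caddy's handle_path does to the request URI:
# remove the matched prefix before the request reaches the backend.
strip_prefix() {
    local prefix="$1" uri="$2"
    printf '%s\n' "${uri#"$prefix"}"
}

strip_prefix /searxng '/searxng/search?q=test'   # -> /search?q=test
strip_prefix /litellm '/litellm/v1/models'       # -> /v1/models
```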

$ vi docker-compose-llm.yaml
version: '3.8'
services:
  caddy:
    container_name: caddy
    image: docker.io/library/caddy:2-alpine
    networks:
      - webnet
    ports:
      - "80:80"   # HTTP port
      - "443:443" # HTTPS port
      - "443:443/udp" # HTTP/3 support
    restart: unless-stopped
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data:rw
      - caddy-config:/config:rw
    environment:
      - SEARXNG_HOSTNAME=${SEARXNG_HOSTNAME:-localhost}
      - SEARXNG_TLS=${LETSENCRYPT_EMAIL:-internal}

  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine
    command: valkey-server --save 30 1 --loglevel warning
    restart: unless-stopped
    networks:
      - webnet
    volumes:
      - valkey-data2:/data

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: unless-stopped
    networks:
      - webnet
    ports:
      - "0.0.0.0:8888:8080"
    volumes:
      - ./searxng:/etc/searxng:rw
      - searxng-log:/var/cache/searxng:rw
    environment:
      - SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/searxng/
      #- SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm
    restart: unless-stopped
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml:ro
    command: --config /app/config.yaml --detailed_debug --num_workers 4
    networks:
      - webnet
    environment:
      - LITELLM_MASTER_KEY=${LITELLM_KEY}
    env_file:
      - .env

  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - open-webui:/app/backend/data
      - ${HOME_CACHE:-~/.cache}/huggingface:/root/.cache/huggingface
    environment:
      - ENABLE_WEB_SEARCH=true
      - WEB_SEARCH_ENGINE=searxng
      - SEARXNG_URL=http://searxng:8080
      - OPENAI_API_BASE_URL=http://litellm:4000/v1
      - OPENAI_API_KEY=${LITELLM_KEY:-0p3n-w3bu!}
    networks:
      - webnet
      
networks:
  webnet:
    external: true

volumes:
  caddy-data:
  caddy-config:
  valkey-data2:
  searxng-log:
  open-webui:

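The compose file and Caddyfile both lean on `${VAR:-fallback}` defaults: if the variable is unset or empty, the fallback after `:-` is used. Compose's variable substitution follows the same rule as the shell; a quick illustration:

```shell
# ${VAR:-fallback} expands to the fallback when VAR is unset or empty.
unset SEARXNG_HOSTNAME
echo "${SEARXNG_HOSTNAME:-localhost}"    # -> localhost
SEARXNG_HOSTNAME=www.abcd.com.tw
echo "${SEARXNG_HOSTNAME:-localhost}"    # -> www.abcd.com.tw
```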
$ vi .env
SEARXNG_HOSTNAME=www.abcd.com.tw
LETSENCRYPT_EMAIL=abcd@gmail.com
HOME_CACHE=/home/mic-733ao/.cache
LITELLM_KEY=abcd


$ vi sta_llm.sh
#!/bin/bash

# 1. Make sure the .env file exists (avoid a failed startup)
if [ ! -f .env ]; then
    echo "❌ Error: .env file not found; please create it first."
    exit 1
fi

# 2. Run Docker Compose
echo "🚀 Starting the LLM services..."
docker compose -f docker-compose-llm.yaml up -d

# 3. Check the startup status
if [ $? -eq 0 ]; then
    echo "✅ Services are running in the background!"
    echo "Use 'docker compose -f docker-compose-llm.yaml logs -f' to follow all logs."
    echo "Use 'docker compose -f docker-compose-llm.yaml logs -f litellm' to follow the litellm logs."
else
    echo "❌ Startup failed; please check the configuration."
fi

$ docker pull docker.io/library/caddy:2-alpine
$ docker pull docker.io/valkey/valkey:8-alpine
$ docker pull docker.io/searxng/searxng:latest
$ docker pull ghcr.io/berriai/litellm:main-latest
$ docker pull ghcr.io/open-webui/open-webui:cuda

$ docker network create webnet
$ docker compose -f docker-compose-llm.yaml up -d
$ docker compose -f docker-compose-llm.yaml up -d --force-recreate

# Open Open WebUI in a browser
# Top-right avatar / Admin Panel / Settings / Web Search
Web Search: ON
Web Search Engine: searxng
Searxng Query URL: http://searxng:8080
Searxng search language: all

# Test the searxng network
$ docker exec -it caddy ping searxng
# Open http://localhost:8888 in a browser
$ curl "http://localhost:8888/search?q=test&format=json"
$ curl "http://192.168.0.107:8888/search?q=test&format=json"
$ curl "https://www.abcd.com.tw/searxng/search?q=test&format=json"
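With format=json enabled in settings.yml, the search response is a JSON object whose results field holds the hits; a quick way to count them (sample payload inline, since the exact shape can vary between SearXNG versions):

```shell
# Count the hits in a SearXNG format=json response.
# The sample payload below stands in for the output of the curl commands above.
response='{"query": "test", "results": [{"title": "a"}, {"title": "b"}]}'
echo "$response" | python3 -c 'import sys, json; print(len(json.load(sys.stdin)["results"]))'   # -> 2
```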

$ curl -s http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer abcd" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-9B",
    "messages": [{"role": "user", "content": "請你自我介紹"}],
    "max_tokens": 64
  }'
$ curl http://192.168.0.107:4000/v1/models \
  -H "Authorization: Bearer abcd"
$ curl -s http://192.168.0.107:4000/v1/chat/completions \
  -H "Authorization: Bearer abcd" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-9B",
    "messages": [{"role": "user", "content": "請你自我介紹"}],
    "max_tokens": 64
  }'

Open-webui + Searxng + Redis + Caddy for Ollama

Reference: https://forums.developer.nvidia.com/t/playbook-1-open-webui-searxng-private-web-search-on-dgx-spark/359578
Reference: https://forums.developer.nvidia.com/t/building-local-hybrid-llms-on-dgx-spark-that-outperform-top-cloud-models/359569
Reference: https://forums.developer.nvidia.com/t/dgx-spark-rag-on-docker/363125

$ > searxng/limiter.toml
$ vi searxng/settings.yml
use_default_settings: true

search:
  formats:
    - html
    - json

server:
  # For security, secret_key can be generated with: openssl rand -hex 32
  secret_key: "c199725f396362fd99ad0e3239fbb5be9d01c04083cffb7e16d50301c67288ee"
  limiter: false
  image_proxy: true

valkey:
  url: redis://redis:6379/0

engines:
  - name: ahmia
    disabled: true
  - name: torch
    disabled: true

$ vi docker-compose-llm.yaml
version: '3.8'
services:
  caddy:
    container_name: caddy
    image: docker.io/library/caddy:2-alpine
    networks:
      - webnet
    ports:
      - "80:80"   # HTTP port
      - "443:443" # HTTPS port
      - "443:443/udp" # HTTP/3 support
    restart: unless-stopped
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data:rw
      - caddy-config:/config:rw
    environment:
      - SEARXNG_HOSTNAME=${SEARXNG_HOSTNAME:-localhost}
      - SEARXNG_TLS=${LETSENCRYPT_EMAIL:-internal}

  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine
    command: valkey-server --save 30 1 --loglevel warning
    restart: unless-stopped
    networks:
      - webnet
    volumes:
      - valkey-data2:/data

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: unless-stopped
    networks:
      - webnet
    ports:
      - "0.0.0.0:8888:8080"
    volumes:
      - ./searxng:/etc/searxng:rw
      - searxng-log:/var/cache/searxng:rw
    environment:
      - SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/searxng/
      #- SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm
    restart: unless-stopped
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml:ro
    command: --config /app/config.yaml --detailed_debug --num_workers 4
    networks:
      - webnet
    environment:
      - LITELLM_MASTER_KEY=${LITELLM_KEY}
    env_file:
      - .env

  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - open-webui:/app/backend/data
      - ${HOME_CACHE:-~/.cache}/huggingface:/root/.cache/huggingface
    environment:
      - ENABLE_WEB_SEARCH=true
      - WEB_SEARCH_ENGINE=searxng
      - SEARXNG_URL=http://searxng:8080
      - OPENAI_API_BASE_URL=http://litellm:4000/v1
      - OPENAI_API_KEY=${LITELLM_KEY:-0p3n-w3bu!}
    networks:
      - webnet
      
networks:
  webnet:
    external: true

volumes:
  caddy-data:
  caddy-config:
  valkey-data2:
  searxng-log:
  open-webui:

$ vi caddy/Caddyfile
{$SEARXNG_HOSTNAME} {
    encode gzip zstd
    header {
        Strict-Transport-Security "max-age=31536000;"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "SAMEORIGIN"
        Referrer-Policy "no-referrer"
    }
    #handle /searxng* {
    #    uri strip_prefix /searxng
    #    reverse_proxy searxng:8080
    #}
    #handle /litellm* {
    #    uri strip_prefix /litellm
    #    reverse_proxy litellm:4000
    #}
    handle_path /searxng* {
        reverse_proxy searxng:8080
    }
    handle_path /litellm* {
        reverse_proxy litellm:4000
    }
    handle {
        reverse_proxy open-webui:8080
    }
}

$ vi litellm_config.yaml
model_list:
  - model_name: Nemotron-3-Nano-30B-A3B # the name shown in the Open WebUI model list
    litellm_params:
      model: openai/Nemotron-3-Nano-30B-A3B  # the full path or name of the model loaded in vLLM
      api_base: http://vllm-Nemotron-3-Nano-30B-A3B:8000/v1 # vLLM API address
      health_check_url: http://vllm-Nemotron-3-Nano-30B-A3B:8000/health
      api_key: "not-needed" # vLLM needs no key by default, but LiteLLM requires one
      rpm: 10 # requests-per-minute limit (optional)
  - model_name: Qwen3.5-35B-A3B # the name shown in the Open WebUI model list
    litellm_params:
      model: openai/Qwen3.5-35B-A3B  # the full path or name of the model loaded in vLLM
      api_base: http://vllm-Qwen3.5-35B-A3B:8000/v1 # vLLM API address
      health_check_url: http://vllm-Qwen3.5-35B-A3B:8000/health
      api_key: "not-needed" # vLLM needs no key by default, but LiteLLM requires one
      rpm: 10 # requests-per-minute limit (optional)
  - model_name: GLM-4.7-Flash # the name shown in the Open WebUI model list
    litellm_params:
      model: openai/GLM-4.7-Flash  # the full path or name of the model loaded in vLLM
      api_base: http://vllm-GLM-4.7-Flash:8000/v1 # vLLM API address
      health_check_url: http://vllm-GLM-4.7-Flash:8000/health
      api_key: "not-needed" # vLLM needs no key by default, but LiteLLM requires one
      rpm: 10 # requests-per-minute limit (optional)
  - model_name: Qwen3.5-9B # the name shown in the Open WebUI model list
    litellm_params:
      model: openai/Qwen3.5-9B  # the full path or name of the model loaded in vLLM
      api_base: http://192.168.0.107:8080/v1 # vLLM API address
      api_key: "not-needed" # vLLM needs no key by default, but LiteLLM requires one
      rpm: 10 # requests-per-minute limit (optional)

litellm_settings:
  drop_params: True       # silently drop parameters vLLM doesn't support instead of erroring
  set_verbose: False      # set to True for verbose debug logging

#general_settings:
#  #master_key: ${LITELLM_KEY} # matches OPENAI_API_KEY in your .env (0p3n-w3bu!)
#  master_key: asdfasdf23

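Each vLLM backend referenced above exposes a /health endpoint (the one health_check_url points at); before wiring the models into LiteLLM, the backends can be probed with a small loop. The hostnames are the ones assumed in the config above:

```shell
# Probe a backend health endpoint; prints "up" or "unreachable".
check_health() {
    if curl -fsS --max-time 5 "$1" > /dev/null 2>&1; then
        echo "up"
    else
        echo "unreachable"
    fi
}

for host in vllm-Nemotron-3-Nano-30B-A3B vllm-Qwen3.5-35B-A3B vllm-GLM-4.7-Flash; do
    echo "$host: $(check_health "http://$host:8000/health")"
done
```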
$ vi .env
SEARXNG_HOSTNAME=www.fwwrcom.com.tw
LETSENCRYPT_EMAIL=ewr@gmail.com
HOME_CACHE=/home/spark/.cache
LITELLM_KEY=asdfasdf23

$ vi sta_llm.sh
#!/bin/bash

# 1. Make sure the .env file exists (avoid a failed startup)
if [ ! -f .env ]; then
    echo "❌ Error: .env file not found; please create it first."
    exit 1
fi

# 2. Run Docker Compose
echo "🚀 Starting the LLM services..."
docker compose -f docker-compose-llm.yaml up -d

# 3. Check the startup status
if [ $? -eq 0 ]; then
    echo "✅ Services are running in the background!"
    echo "Use 'docker compose -f docker-compose-llm.yaml logs -f' to follow all logs."
    echo "Use 'docker compose -f docker-compose-llm.yaml logs -f litellm' to follow the litellm logs."
else
    echo "❌ Startup failed; please check the configuration."
fi


$ docker compose -f docker-compose-llm.yaml up -d
$ docker compose -f docker-compose-llm.yaml up -d --force-recreate
# Test the searxng network
$ docker exec -it caddy ping searxng
# Open http://localhost:8888 in a browser
$ curl "http://localhost:8888/search?q=test&format=json"
$ curl "http://192.168.0.108:8888/search?q=test&format=json"
$ curl "https://www.fwwrcom.com.tw/searxng/search?q=test&format=json"

$ curl "http://192.168.0.108:8888/"
$ curl "https://www.fwwrcom.com.tw/searxng/"

$ curl http://192.168.0.108:4000/v1/models \
  -H "Authorization: Bearer asdfasdf23"
$ curl -s http://192.168.0.108:4000/v1/chat/completions \
  -H "Authorization: Bearer asdfasdf23" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-35B-A3B",
    "messages": [{"role": "user", "content": "請你自我介紹"}],
    "max_tokens": 64
  }'

$ curl -i https://www.fwwrcom.com.tw/litellm/v1/chat/completions \
  -H "Authorization: Bearer asdfasdf23" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-35B-A3B",
    "messages": [{"role": "user", "content": "請你自我介紹"}],
    "max_tokens": 64
  }'


Thursday, March 12, 2026

Preventing OOM crashes on DGX Spark

# Reference: https://forums.developer.nvidia.com/t/mitigating-oom-system-freezes-on-uma-based-single-board-computers/362769
# See also DGX Spark temperatures: https://yingrenn.blogspot.com/2026/02/dgx-spark.html

# Install the lightweight Dropbear SSH server
$ sudo apt update && sudo apt install dropbear
$ sudo vi /etc/default/dropbear
NO_START=0
DROPBEAR_PORT=2222

$ sudo systemctl enable dropbear
$ sudo systemctl start dropbear

# Standard connection (OpenSSH)
$ ssh spark@<your-ip>
# Emergency connection (Dropbear)
$ ssh spark@<your-ip> -p 2222

# Install earlyoom
$ sudo apt update
$ sudo apt install earlyoom
$ sudo vi /etc/default/earlyoom
EARLYOOM_ARGS="-m 5 -s 10 --avoid 'pipewire|wireplumber|systemd|ssh|journald' --prefer 'vllm|python|triton'"
# This tells earlyoom to intervene when free RAM falls below 5% AND free swap below 10%.
# It will preferentially kill vllm, python, or triton processes over others.
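The -m/-s percentages are relative to MemTotal and SwapTotal, so the absolute trigger point on a given machine can be computed from /proc/meminfo:

```shell
# Translate earlyoom's "-m 5" into an absolute number for this machine:
# 5% of MemTotal (values in /proc/meminfo are in KiB).
mem_total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "earlyoom -m 5 starts killing below $((mem_total_kib * 5 / 100)) KiB free"
```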

$ sudo EDITOR=vi systemctl edit earlyoom
### Editing /etc/systemd/system/earlyoom.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file

[Service]
LimitMEMLOCK=infinity
CapabilityBoundingSet=CAP_IPC_LOCK CAP_SYS_NICE CAP_KILL
AmbientCapabilities=CAP_IPC_LOCK CAP_SYS_NICE CAP_KILL
MemoryLock=infinity
OOMScoreAdjust=-1000

### Edits below this comment will be discarded


### /usr/lib/systemd/system/earlyoom.service
# [Unit]
# Description=Early OOM Daemon
# Documentation=man:earlyoom(1) https://github.com/rfjakob/earlyoom
# [Service]
# EnvironmentFile=-/etc/default/earlyoom
# ExecStart=/usr/bin/earlyoom $EARLYOOM_ARGS
# # Run as an unprivileged user with random user id
# DynamicUser=true
# # Allow killing processes and calling mlockall()
# AmbientCapabilities=CAP_KILL CAP_IPC_LOCK
# # We don't need write access anywhere
# ProtectSystem=strict
# # We don't need /home at all, make it inaccessible
# ProtectHome=true
# # earlyoom never exits on it's own, so have systemd
# # restart it should it get killed for some reason.
# Restart=always
# # set memory limits and max tasks number
# TasksMax=10
# MemoryMax=50M
# [Install]
# WantedBy=multi-user.target

$ sudo systemctl daemon-reload
$ sudo systemctl restart earlyoom
$ sudo systemctl status earlyoom

# Check the logs
$ journalctl -u earlyoom -f

$ cat /etc/systemd/system/earlyoom.service.d/override.conf 
[Service]
LimitMEMLOCK=infinity
CapabilityBoundingSet=CAP_IPC_LOCK CAP_SYS_NICE CAP_KILL
AmbientCapabilities=CAP_IPC_LOCK CAP_SYS_NICE CAP_KILL
MemoryLock=infinity
OOMScoreAdjust=-1000
$ ps aux |grep earlyoom
earlyoom   80791  0.0  0.0   2288  1688 ?        SLs  11:55   0:00 /usr/bin/earlyoom -m 5 -s 10 --avoid pipewire|wireplumber|systemd|ssh|journald --prefer vllm|python|triton
$ cat /proc/$(pgrep earlyoom)/oom_score_adj
-1000