Reference: https://github.com/ollama/ollama
Reference: https://hub.docker.com/r/ollama/ollama
Reference: https://www.53ai.com/news/OpenSourceLLM/2024072585037.html
$ docker run -d --gpus=all -p 11434:11434 --name ollama \
-v /mnt/Data/ollama/ollama_volume:/root/.ollama \
ollama/ollama
$ docker exec -it ollama ollama run deepseek-r1
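To confirm the container is serving, the Ollama REST API answers on the published port 11434. A minimal Python sketch (assuming the requests package is installed) that lists the models the server currently knows about:
import requests

# List the models known to the Ollama server started above.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for m in resp.json().get("models", []):
    print(m["name"])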
$ git clone https://github.com/ggerganov/llama.cpp.git
$ cd llama.cpp
$ cmake -B build
$ cmake --build build --config Release
$ pip install huggingface_hub
Convert a model from Hugging Face into GGUF format
$ vi download.py
from huggingface_hub import snapshot_download, login
login("hf_BqLATKBqbVzOWNBJcFMwHKzCJfu")
# 下载模型
snapshot_download(
"taide/Llama-3.1-TAIDE-LX-8B-Chat",
local_dir="taide_Llama-3.1-TAIDE-LX-8B-Chat",
local_dir_use_symlinks=False,
ignore_patterns=["*.gguf"]
)
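Run the script with python download.py; it pulls the repository into taide_Llama-3.1-TAIDE-LX-8B-Chat/. A small sketch (standard library only) to confirm the snapshot landed:
from pathlib import Path

# List the downloaded files and their sizes to confirm the snapshot is complete.
root = Path("taide_Llama-3.1-TAIDE-LX-8B-Chat")
for p in sorted(root.rglob("*")):
    if p.is_file():
        print(f"{p.stat().st_size:>12}  {p.relative_to(root)}")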
$ vi convert_hf_to_gguf_update.py
In the models list, add the following lines; note the choice of TOKENIZER_TYPE:
{"name": "taide_Llama-3.1-TAIDE-LX-8B-Chat", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/taide/Llama-3.1-TAIDE-LX-8B-Chat"},
{"name": "yentinglin_Llama-3-Taiwan-8B-Instruct", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct"},
$ python convert_hf_to_gguf_update.py hf_BqLATKBqbVzOWNBJcFMwHKzCJfu
$ python convert_hf_to_gguf.py taide_Llama-3.1-TAIDE-LX-8B-Chat --outtype f16 --outfile taide_Llama-3.1-TAIDE-LX-8B-Chat.fp16.gguf
$ llama.cpp/build/bin/llama-quantize taide_Llama-3.1-TAIDE-LX-8B-Chat.fp16.gguf Q4_K_M
$ mv ggml-model-Q4_K_M.gguf taide_Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf
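Optionally, the gguf Python package that ships with llama.cpp (pip install gguf; an extra step not listed above) can read the quantized file's header to confirm it is well formed, roughly:
from gguf import GGUFReader

# Print the tensor count and the first few metadata keys of the quantized model.
reader = GGUFReader("taide_Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf")
print("tensors:", len(reader.tensors))
for key in list(reader.fields)[:8]:
    print(key)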
$ vi Modelfile.taide-8b
FROM ./taide_Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
我是一個萬事通
"""
$ docker exec -it ollama /bin/bash
# cd /root/.ollama
# ollama create taide-8b -f ./Modelfile.taide-8b
# ollama list
# ollama show taide-8b
# ollama rm taide-8b
# ollama ps
# ollama run taide-8b
>>> /bye
# OLLAMA_HOST=127.0.0.1:11434 ollama serve
$ curl http://localhost:11434/api/generate -d '{
"model": "yentinglin-8b",
"prompt": "建議適合ai的程式語言"
}'
$ curl http://localhost:11434/api/generate -d '{
"model": "yentinglin-8b",
"prompt": "建議適合ai的程式語言",
"stream", false
}'
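The same /api/generate call can be made from Python (assuming the requests package). By default the endpoint streams one JSON object per line, so the sketch reads the response incrementally:
import json
import requests

# Stream a completion from /api/generate; each line is a JSON chunk with a "response" field.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "yentinglin-8b", "prompt": "建議適合ai的程式語言"},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break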
$ curl http://localhost:11434/api/chat -d '{
"model": "yentinglin-8b",
"messages": [
{"role": "user", "content": "建議適合ai的程式語言"}
]
}'
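The /api/chat endpoint works the same way from Python; with "stream": false the whole reply comes back as one JSON object whose message.content holds the answer (a sketch, again assuming requests):
import requests

# Non-streaming chat call: the reply is in message.content of the JSON body.
r = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "yentinglin-8b",
        "messages": [{"role": "user", "content": "建議適合ai的程式語言"}],
        "stream": False,
    },
)
print(r.json()["message"]["content"])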
$ curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "yentinglin-8b",
"messages": [
{
"role": "system",
"content": "你是一個萬事通"
},
{
"role": "user",
"content": "眼睛酸痛,怎麼辦?"
}
]
}'
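Because /v1/chat/completions follows the OpenAI API shape, the official openai Python client can also be pointed at Ollama; Ollama ignores the API key, but the client requires one, so any placeholder string works. A sketch:
from openai import OpenAI

# Talk to Ollama through its OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="yentinglin-8b",
    messages=[
        {"role": "system", "content": "你是一個萬事通"},
        {"role": "user", "content": "眼睛酸痛,怎麼辦?"},
    ],
)
print(resp.choices[0].message.content)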
$ docker logs ollama
$ python ../llama.cpp/convert_hf_to_gguf.py yentinglin_Llama-3-Taiwan-8B-Instruct --outtype f16 --outfile yentinglin_Llama-3-Taiwan-8B-Instruct.fp16.gguf
$ llama.cpp/build/bin/llama-quantize yentinglin_Llama-3-Taiwan-8B-Instruct.fp16.gguf Q4_K_M
Example prompt for testing: 建議適合ai的程式語言
$ docker run -d -p 3000:8080 --gpus all \
--add-host=host.docker.internal:host-gateway \
-v /mnt/Data/ollama/open-webui_volume:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:cuda
In Firefox, open http://localhost:3000
If the error "This address is restricted" appears, adjust the Firefox configuration:
In the address bar, enter about:config and click "Accept the Risk and Continue"
In the search field, enter network.security.ports.banned.override, select "String", and click +
Enter 3000 as the value and click the checkmark
Reload http://localhost:3000
Chrome setting:
chrome://flags/#unsafely-treat-insecure-origin-as-secure
Enter the address http://localhost:3000
Install nginx
Reference: https://docs.openwebui.com/tutorials/https-nginx/
Reference: https://yingrenn.blogspot.com/2020/07/ssl-nginx.html
$ vi nginx.conf
server {
    listen 443 ssl;
    server_name www.domain.com.tw;
    ssl_certificate /etc/nginx/conf/Certs/server.pem;
    ssl_certificate_key /etc/nginx/conf/Certs/server.key;
    ssl_trusted_certificate /etc/nginx/conf/Certs/caChain.crt;
    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_session_timeout 5m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:HIGH:!aNULL:!MD5:!RC4:!DHE;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_pass http://host.docker.internal:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Add WebSocket support (necessary for version 0.5.0 and up)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # (Optional) Disable proxy buffering for better streaming response from models
        proxy_buffering off;
    }
}

server {
    listen 80;
    server_name www.domain.com.tw;
    return 301 https://$host$request_uri;
}
$ docker run -itd --name nginx \
-p 80:80 -p 443:443 \
--add-host=host.docker.internal:host-gateway \
-v /mnt/Data/ollama/nginx/conf.d/nginx.conf:/etc/nginx/conf.d/nginx.conf \
-v /mnt/Data/ollama/nginx/conf:/etc/nginx/conf \
-m 100m library/nginx:latest
Browse to https://www.domain.com.tw
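A quick check of the reverse proxy from Python (hostname taken from the config above; add verify=False, or point verify at caChain.crt, if the certificate is not in the system trust store): port 80 should answer with the 301 redirect, and port 443 should proxy through to Open WebUI.
import requests

# HTTP should redirect to HTTPS ...
r = requests.get("http://www.domain.com.tw", allow_redirects=False)
print(r.status_code, r.headers.get("Location"))

# ... and HTTPS should serve the Open WebUI front end via the proxy.
r = requests.get("https://www.domain.com.tw")
print(r.status_code)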