AivisSpeech-Engine
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
Stars: 76
README:
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
AivisSpeech Engine is a Japanese speech synthesis engine based on VOICEVOX ENGINE.
It is built into AivisSpeech, a Japanese speech synthesis application, and makes it easy to generate emotionally expressive speech.
- For users
- System requirements
- Supported speech synthesis models
- Installation
- Using the speech synthesis API
- Compatibility with the VOICEVOX API
- FAQ / Q&A
- Development policy
- Setting up the development environment
- Development
- License
If you are looking for how to use AivisSpeech, please see the AivisSpeech official website.
This page mainly provides information for developers.
The following documentation is for users.
AivisSpeech Engine supports PCs running Windows, macOS, or Linux.
Starting AivisSpeech Engine requires at least 1.5GB of free memory (RAM).
- Windows: Windows 10 (22H2 or later), Windows 11
- macOS: macOS 13 Ventura or later
- Linux: Ubuntu 20.04 or later
[!TIP] The desktop app, AivisSpeech, supports only Windows and macOS.
The speech synthesis API server, AivisSpeech Engine, can also be used on Ubuntu / Debian-based Linux.
[!NOTE] Operation on Macs with Intel CPUs is not actively tested.
Intel Macs are no longer manufactured, and even preparing test and build environments for them is becoming difficult. We recommend using an Apple Silicon Mac whenever possible.
[!WARNING] On Windows 10, operation has only been verified on version 22H2.
On older Windows 10 versions that are no longer supported, AivisSpeech Engine has been reported to crash and fail to start.
For security reasons as well, we strongly recommend that Windows 10 users update to at least version 22H2 before using it.
AivisSpeech Engine supports speech synthesis model files in the AIVMX (Aivis Voice Model for ONNX) format (extension .aivmx).
AIVM (Aivis Voice Model) / AIVMX (Aivis Voice Model for ONNX) is an open file format for AI speech synthesis models. It packs a trained model, hyperparameters, style vectors, and speaker metadata (name, overview, license, icon, voice samples, and so on) into a single file.
For details on the AIVM specification and AIVM / AIVMX files, please refer to the AIVM Specification defined by Aivis Project.
[!NOTE]
"AIVM" is also the collective name for the format specifications and metadata specifications of both AIVM and AIVMX.
Specifically, an AIVM file is a model file in "Safetensors format with AIVM metadata added", and an AIVMX file is a model file in "ONNX format with AIVM metadata added".
"AIVM metadata" refers to the various kinds of metadata associated with a trained model, as defined in the AIVM specification.
[!IMPORTANT]
AivisSpeech Engine is also the reference implementation of the AIVM specification, but it is deliberately designed to support only AIVMX files.
This removes the dependency on PyTorch, which reduces the install size and enables fast CPU inference with ONNX Runtime.
[!TIP]
With AIVM Generator, you can generate AIVM / AIVMX files from existing speech synthesis models and edit the metadata of existing AIVM / AIVMX files!
AIVMX files with the following model architectures can be used.
Style-Bert-VITS2
Style-Bert-VITS2 (JP-Extra)
[!NOTE] The AIVM metadata specification allows multilingual speakers to be defined, but AivisSpeech Engine, like VOICEVOX ENGINE, supports only Japanese speech synthesis.
Therefore, even if a speech synthesis model supports English or Chinese, it cannot synthesize speech in any language other than Japanese.
Place AIVMX files in the following folder for your OS.
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Models
- macOS: ~/Library/Application Support/AivisSpeech-Engine/Models
- Linux: ~/.local/share/AivisSpeech-Engine/Models
The actual folder path is shown as Models directory: in the log immediately after AivisSpeech Engine starts.
[!TIP]
When using AivisSpeech, you can easily add speech synthesis models from the AivisSpeech UI!
For end users, we generally recommend adding speech synthesis models this way.
[!IMPORTANT] For the development version (that is, when running without a PyInstaller build), the folder is under AivisSpeech-Engine-Dev instead of AivisSpeech-Engine.
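The folder rules above can be summarized in a small helper. This is a hypothetical sketch and is not part of AivisSpeech Engine. The function name `models_dir` and its OS labels are made up for illustration, and the `Models directory:` log line remains the authoritative answer.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def models_dir(os_name: str, home: str, dev: bool = False) -> PurePath:
    """Return the model folder for an OS, following the list above (hypothetical helper).

    os_name: "windows", "macos" or "linux" (labels invented for this sketch)
    home:    the user's home directory, e.g. C:\\Users\\(username) on Windows
    dev:     True when running from source (not a PyInstaller build)
    """
    app = "AivisSpeech-Engine-Dev" if dev else "AivisSpeech-Engine"
    if os_name == "windows":
        return PureWindowsPath(home, "AppData", "Roaming", app, "Models")
    if os_name == "macos":
        return PurePosixPath(home, "Library", "Application Support", app, "Models")
    if os_name == "linux":
        return PurePosixPath(home, ".local", "share", app, "Models")
    raise ValueError(f"unsupported OS: {os_name!r}")
```

Pure path classes are used so the same code computes Windows paths on Linux and vice versa, without touching the file system.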
AivisSpeech Engine offers convenient command-line options such as the following:
- --host 0.0.0.0 allows other devices on the same network to access AivisSpeech Engine.
- --cors_policy_mode all allows CORS requests from all domains.
- --load_all_models preloads all installed speech synthesis models when AivisSpeech Engine starts.
- --help displays a list of all available options with descriptions.
Many other options are available. Run with --help for details.
[!TIP]
Running with the --use_gpu option speeds up speech synthesis using DirectML on Windows and NVIDIA GPUs (CUDA) on Linux.
On Windows, DirectML inference also works on PCs that have only a GPU integrated into the CPU (iGPU). However, in most cases it is much slower than CPU inference, so we do not recommend it.
See the FAQ for details.
[!NOTE] By default, AivisSpeech Engine runs on port 10101.
If this conflicts with other applications, use the --port option to change it to any port number.
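When the engine is launched from another program, the options above have to be combined into an argument list. The hypothetical helper below builds such a list for `subprocess`. It uses only flags named in this README. The executable path is supplied by the caller, and the function name is invented for illustration.

```python
from typing import Optional


def engine_command(run_path: str, *, host: Optional[str] = None, port: Optional[int] = None,
                   cors_all: bool = False, load_all_models: bool = False,
                   use_gpu: bool = False) -> list[str]:
    """Build an argv list for the AivisSpeech Engine executable (hypothetical helper)."""
    cmd = [run_path]
    if host is not None:
        cmd += ["--host", host]            # e.g. "0.0.0.0" to accept LAN connections
    if port is not None:
        cmd += ["--port", str(port)]       # the engine defaults to 10101 when omitted
    if cors_all:
        cmd += ["--cors_policy_mode", "all"]
    if load_all_models:
        cmd.append("--load_all_models")
    if use_gpu:
        cmd.append("--use_gpu")            # DirectML on Windows, CUDA on Linux
    return cmd
```

Pass the result to `subprocess.Popen(...)` to start the engine. Run `--help` to confirm which flags your engine version accepts.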
[!WARNING] Unlike VOICEVOX ENGINE, AivisSpeech Engine does not implement some options.
On Windows / macOS, you can install AivisSpeech Engine on its own. However, it is easier to start the AivisSpeech Engine that comes bundled with AivisSpeech by itself.
The AivisSpeech Engine executable (run.exe / run) bundled with AivisSpeech is located at the following paths.
- Windows: C:\Program Files\AivisSpeech\AivisSpeech-Engine\run.exe
  - If AivisSpeech was installed with user privileges, the path is C:\Users\(username)\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exe.
- macOS: /Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run
  - If AivisSpeech was installed with user privileges, the path is ~/Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run.
[!NOTE] On first launch, the engine automatically downloads the default model (about 250MB) and the BERT model needed for inference (about 1.3GB). As a result, startup may take up to several minutes.
Please wait until startup completes.
To add speech synthesis models to AivisSpeech Engine, see Model file location.
You can also add them in AivisSpeech from Settings → Manage speech synthesis models.
When running on Linux with an NVIDIA GPU, the CUDA / cuDNN versions supported by ONNX Runtime must match the CUDA / cuDNN versions in the host environment, so the requirements are fairly strict.
Specifically, the ONNX Runtime used by AivisSpeech Engine requires CUDA 12.x / cuDNN 9.x or later.
Docker works regardless of the host OS environment, so we recommend installing with Docker.
When running the Docker container, always mount ~/.local/share/AivisSpeech-Engine to /home/user/.local/share/AivisSpeech-Engine-Dev inside the container.
This keeps installed speech synthesis models and the BERT model cache (about 1.3GB) even after the container is stopped and restarted.
To add speech synthesis models to AivisSpeech Engine in a Docker environment, place the model files (.aivmx) under ~/.local/share/AivisSpeech-Engine/Models on the host.
[!IMPORTANT] Always mount to /home/user/.local/share/AivisSpeech-Engine-Dev.
AivisSpeech Engine in the Docker image is not built with PyInstaller, so the -Dev suffix is added to the data folder name, which becomes AivisSpeech-Engine-Dev.
docker pull ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
docker run --rm -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
docker pull ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
docker run --rm --gpus all -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
Running the following one-liner in Bash writes a synthesized WAV file to audio.wav.
[!IMPORTANT]
This assumes that AivisSpeech Engine is already running. It also assumes that the speech synthesis model (.aivmx) for the style ID is stored in the directory shown as Models directory: in the log.
# STYLE_ID is the style ID to synthesize with; obtain it separately from the /speakers API
STYLE_ID=888753760 && \
echo -n "こんにちは、音声合成の世界へようこそ！" > text.txt && \
curl -s -X POST "127.0.0.1:10101/audio_query?speaker=$STYLE_ID" --get --data-urlencode text@text.txt > query.json && \
curl -s -H "Content-Type: application/json" -X POST -d @query.json "127.0.0.1:10101/synthesis?speaker=$STYLE_ID" > audio.wav && \
rm text.txt query.json
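The same two-step flow can be written in Python with only the standard library: first POST /audio_query, then POST the resulting JSON to /synthesis. This is a minimal sketch that assumes the engine is listening on the default port 10101. The function names are hypothetical, and error handling is omitted.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:10101"  # default AivisSpeech Engine port


def audio_query_url(text: str, style_id: int, base_url: str = BASE_URL) -> str:
    # Equivalent of curl --get --data-urlencode: the text travels as a query parameter
    params = urllib.parse.urlencode({"speaker": style_id, "text": text})
    return f"{base_url}/audio_query?{params}"


def synthesis_url(style_id: int, base_url: str = BASE_URL) -> str:
    return f"{base_url}/synthesis?speaker={style_id}"


def synthesize(text: str, style_id: int, base_url: str = BASE_URL) -> bytes:
    """Return WAV bytes for `text` (requires a running engine)."""
    # Step 1: POST /audio_query (empty body) and parse the AudioQuery JSON
    req = urllib.request.Request(audio_query_url(text, style_id, base_url), method="POST")
    with urllib.request.urlopen(req) as resp:
        query = json.load(resp)
    # Step 2: POST the AudioQuery to /synthesis; the response body is the WAV file
    req = urllib.request.Request(
        synthesis_url(style_id, base_url),
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Writing the bytes returned by `synthesize("こんにちは、音声合成の世界へようこそ！", 888753760)` to `audio.wav` reproduces the Bash one-liner, provided a model for that style ID is installed.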
[!TIP] For detailed API request and response specifications, see the API documentation and Compatibility with the VOICEVOX API. The API documentation is updated as changes land in the latest development version.
To view the API documentation (Swagger UI) of a running AivisSpeech Engine, open http://127.0.0.1:10101/docs while AivisSpeech Engine or the AivisSpeech editor is running.
AivisSpeech Engine is broadly compatible with the VOICEVOX ENGINE HTTP API.
Software that supports the VOICEVOX ENGINE HTTP API should work with AivisSpeech Engine simply by changing the API URL to http://127.0.0.1:10101.
[!IMPORTANT]
However, if the API client edits the AudioQuery obtained from the /audio_query API before passing it to the /synthesis API, synthesis may fail because of specification differences (described below). For this reason, the AivisSpeech editor can use both AivisSpeech Engine and VOICEVOX ENGINE (with the multi-engine feature), but the VOICEVOX editor cannot use AivisSpeech Engine.
Using AivisSpeech Engine from the VOICEVOX editor severely degrades synthesis quality because of limitations in the editor's implementation. Parameters specific to AivisSpeech Engine cannot be used, and calls to unsupported features may cause errors.
For better synthesis results, we strongly recommend using the AivisSpeech editor.
[!NOTE]
The API should be broadly compatible for typical use cases. However, it forces speech synthesis systems with fundamentally different model architectures into a single API specification, so there may be incompatible APIs beyond those listed below.
If you report one in an Issue, we will fix it wherever compatibility can be improved.
The API specification changes from VOICEVOX ENGINE are as follows.
In the AIVM manifest inside an AIVMX file, the local IDs of speaker styles are numbered sequentially from 0 for each speaker.
For speech synthesis models with the Style-Bert-VITS2 architecture, this value matches the model hyperparameter data.style2id.
The VOICEVOX ENGINE API, on the other hand, for historical reasons passes only a "style ID" (style_id) to the speech synthesis API and does not specify a "speaker UUID" (speaker_uuid).
In VOICEVOX ENGINE the bundled speakers and styles are fixed, so the developers could keep "style IDs" unique themselves.
AivisSpeech Engine, by contrast, lets users add speech synthesis models freely.
Therefore, a VOICEVOX API-compatible "style ID" must stay unique no matter which speech synthesis models are added.
Otherwise, adding a new speech synthesis model could produce a style ID that duplicates the style ID of a speaker style in an existing model.
AivisSpeech Engine therefore combines the speaker UUID and the style ID from the AIVM manifest to generate a globally unique, VOICEVOX API-compatible "style ID".
The generation method is as follows.
- Convert the speaker UUID into an MD5 hash value
- Combine the lower 27 bits of this hash value with the 5-bit local style ID (0 to 31)
- Convert the result to a 32-bit signed integer
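The three steps above can be sketched in Python. This is an illustration under stated assumptions, not the engine's code. The sketch hashes the UUID string as UTF-8 and reads the whole MD5 digest as one big-endian integer before taking its lower 27 bits. It also places the speaker bits above the 5 style bits. Any of these choices may differ from the engine, so use the IDs returned by /speakers in practice.

```python
import hashlib


def global_style_id(speaker_uuid: str, local_style_id: int) -> int:
    """Hypothetical sketch of the style ID derivation described above."""
    if not 0 <= local_style_id <= 31:
        raise ValueError("local_style_id must fit in 5 bits (0-31)")
    # 1. MD5-hash the speaker UUID (assumption: the UUID string, UTF-8 encoded)
    digest = hashlib.md5(speaker_uuid.encode("utf-8")).hexdigest()
    # 2. Keep the lower 27 bits of the hash (assumption: digest read as one big-endian integer)
    speaker_bits = int(digest, 16) & ((1 << 27) - 1)
    #    and put the 5-bit local style ID below them (assumed bit layout)
    unsigned = (speaker_bits << 5) | local_style_id
    # 3. Reinterpret the 32-bit pattern as a signed (two's complement) integer
    return unsigned - (1 << 32) if unsigned & (1 << 31) else unsigned
```

Because only 27 bits of the hash survive, two different UUIDs can map to the same speaker bits, which is the collision risk described in the warning that follows.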
[!WARNING]
As a result, VOICEVOX API-compatible software that does not expect a 32-bit signed integer in the "style ID" may behave unexpectedly.
[!WARNING]
Global uniqueness of speaker UUIDs is sacrificed to fit within the 32-bit signed integer range. As a result, there is an extremely small chance that style IDs of different speakers overlap (collide).
There is currently no workaround when style IDs collide, but in practice this should not cause problems in most cases.
[!TIP]
The VOICEVOX API-compatible "style IDs" that AivisSpeech Engine generates automatically can be obtained from the /speakers API.
This API returns a list of the speakers installed in AivisSpeech Engine.
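A client usually looks up style IDs once from /speakers. The sketch below assumes the VOICEVOX-compatible response shape: a list of speakers, each with `name` and a `styles` list whose entries carry `name` and `id`. `fetch_speakers` and `style_ids` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any


def fetch_speakers(base_url: str = "http://127.0.0.1:10101") -> list[dict[str, Any]]:
    # GET /speakers returns the installed speakers as a JSON array (requires a running engine)
    with urllib.request.urlopen(f"{base_url}/speakers") as resp:
        return json.load(resp)


def style_ids(speakers: list[dict[str, Any]]) -> dict[str, int]:
    """Map "speaker name / style name" to the VOICEVOX-compatible style ID."""
    result: dict[str, int] = {}
    for speaker in speakers:
        for style in speaker.get("styles", []):
            result[f"{speaker['name']} / {style['name']}"] = style["id"]
    return result
```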
The AudioQuery type is a query for synthesizing speech from text or a phoneme sequence.
The main changes from VOICEVOX ENGINE's AudioQuery type are as follows.
- The intonationScale field has a different meaning.
  - In VOICEVOX ENGINE it represented the "overall intonation", but in AivisSpeech Engine it represents the "overall style strength".
  - It sets how strongly the speaker style's voice characteristics are applied, in the range 0.0 to 2.0 (default: 1.0).
  - The larger the value, the closer the intonation gets to the selected style.
  - For example, with a cheerful style, larger values produce a happier, brighter way of speaking.
  - However, depending on the speaker and style, raising the value too far can make the voice sound unnatural.
  - It cannot be applied to the Normal style, which is the average of all styles (it is ignored regardless of the value).
  - The Style-Bert-VITS2 "style strength" parameter corresponds to AivisSpeech Engine's intonationScale as follows:
    - intonationScale 0.0 to 1.0 corresponds to the range 0.0 to 1.0 in Style-Bert-VITS2.
    - intonationScale 1.0 to 2.0 corresponds to the range 1.0 to 10.0 in Style-Bert-VITS2.
- A new tempoDynamicsScale field has been added.
  - This parameter is specific to AivisSpeech Engine. It sets how strongly the speaking tempo speeds up and slows down, in the range 0.0 to 2.0 (default: 1.0).
  - The larger the value, the faster and more lively the delivery.
  - The Style-Bert-VITS2 "tempo dynamics" parameter corresponds to AivisSpeech Engine's tempoDynamicsScale as follows:
    - tempoDynamicsScale 0.0 to 1.0 corresponds to the range 0.0 to 0.2 in Style-Bert-VITS2.
    - tempoDynamicsScale 1.0 to 2.0 corresponds to the range 0.2 to 1.0 in Style-Bert-VITS2.
- The pitchScale field behaves differently.
  - Unlike in VOICEVOX ENGINE, changing this value from 0.0 may degrade audio quality.
- The pauseLength and pauseLengthScale fields are not supported.
  - They exist as fields for compatibility but are always ignored.
- The kana field behaves differently.
  - In VOICEVOX ENGINE it was a read-only field holding text in AquesTalk-style notation. AivisSpeech Engine uses it instead to specify the plain text to be read aloud.
  - If null or an empty string is given, a hiragana string generated automatically from the accent phrases is read aloud instead, which may produce unnatural intonation.
  - For more natural synthesis results, we recommend specifying the plain text to be read aloud whenever possible.
See model.py for details of the changes.
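The two range correspondences above can be written as two-segment piecewise-linear functions. The README states only which ranges correspond, so this sketch assumes linear interpolation inside each segment. The function names are hypothetical, and the engine's actual conversion lives in its own source.

```python
def _two_segment_linear(scale: float, knee: float, top: float) -> float:
    """Map 0.0-1.0 linearly onto 0.0-knee, and 1.0-2.0 linearly onto knee-top (assumed)."""
    if not 0.0 <= scale <= 2.0:
        raise ValueError("scale must be between 0.0 and 2.0")
    if scale <= 1.0:
        return scale * knee
    return knee + (scale - 1.0) * (top - knee)


def sbv2_style_strength(intonation_scale: float) -> float:
    # intonationScale 0.0-1.0 -> 0.0-1.0, 1.0-2.0 -> 1.0-10.0
    return _two_segment_linear(intonation_scale, knee=1.0, top=10.0)


def sbv2_tempo_dynamics(tempo_dynamics_scale: float) -> float:
    # tempoDynamicsScale 0.0-1.0 -> 0.0-0.2, 1.0-2.0 -> 0.2-1.0
    return _two_segment_linear(tempo_dynamics_scale, knee=0.2, top=1.0)
```

Under this assumption, the default value 1.0 sits exactly at the knee of each mapping. The upper half of the intonationScale range covers nine times as much style strength as the lower half.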
The Mora type is a data structure representing a mora of the text being read aloud.
[!TIP]
A mora is the smallest unit of sound as actually pronounced (for example, a single kana such as 「か」 or 「ん」).
The Mora type is never used on its own in API requests or responses. It is always used indirectly through AudioQuery.accent_phrases[n].moras or AudioQuery.accent_phrases[n].pause_mora.
The main changes from VOICEVOX ENGINE's Mora type are as follows.
- Symbols are also treated as moras.
  - In VOICEVOX ENGINE, symbols such as exclamation marks and punctuation were treated as pause_mora, but AivisSpeech Engine treats them as ordinary moras.
  - For a symbol mora, text holds the symbol as-is and vowel is set to "pau".
- The consonant / vowel fields are read-only.
  - The reading used during synthesis always comes from the text field.
  - Changing the values of these fields does not affect the synthesis result.
- The consonant_length / vowel_length / pitch fields are not supported.
  - AivisSpeech Engine's implementation cannot compute these values, so it always returns 0.0 as a dummy value.
  - They exist as fields for compatibility but are always ignored.
See tts_pipeline/model.py for details of the changes.
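Symbols now arrive as ordinary moras. A client that rebuilds the reading from an AudioQuery should therefore take each mora's `text`, and it can recognize symbol moras by `vowel == "pau"`. The helper below is a hypothetical sketch that works on an AudioQuery already parsed into a dict. It only reads fields described in this section.

```python
from typing import Any


def reading_and_symbols(query: dict[str, Any]) -> tuple[str, list[str]]:
    """Return the concatenated mora text of an AudioQuery and the list of symbol moras."""
    reading: list[str] = []
    symbols: list[str] = []
    for phrase in query.get("accent_phrases", []):
        moras = list(phrase.get("moras", []))
        # pause_mora may be null; include it only when present
        if phrase.get("pause_mora"):
            moras.append(phrase["pause_mora"])
        for mora in moras:
            reading.append(mora["text"])          # text is what the engine reads
            if mora.get("vowel") == "pau":        # symbol moras carry vowel "pau"
                symbols.append(mora["text"])
    return "".join(reading), symbols
```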
The Preset type holds preset information that the editor uses to set the initial values of a speech synthesis query.
Its changes broadly follow the specification changes to the intonationScale / tempoDynamicsScale / pitchScale / pauseLength / pauseLengthScale fields described for the AudioQuery type.
See preset/model.py for details of the changes.
[!WARNING]
The singing voice synthesis APIs and the cancellable speech synthesis API are not supported.
They exist as endpoints for compatibility but always return 501 Not Implemented.
See app/routers/character.py / app/routers/tts_pipeline.py for details.
- GET /singers
- GET /singer_info
- POST /cancellable_synthesis
- POST /sing_frame_audio_query
- POST /sing_frame_volume
- POST /frame_synthesis
[!WARNING]
The /synthesis_morphing API, which provides the morphing feature, is not supported.
Speech timing differs from speaker to speaker, so morphing cannot be implemented properly (it runs, but the result is unlistenable). The endpoint therefore always returns 400 Bad Request.
The /morphable_targets API, which reports whether morphing is available for each speaker, treats morphing as prohibited for all speakers.
See app/routers/morphing.py for details.
- POST /synthesis_morphing
- POST /morphable_targets
[!WARNING]
The following parameters exist for compatibility but are always ignored.
See app/routers/character.py / app/routers/tts_pipeline.py for details.
- core_version parameter
  - Specifies the VOICEVOX CORE version.
  - AivisSpeech Engine has no component equivalent to VOICEVOX CORE, so this parameter is always ignored.
- enable_interrogative_upspeak parameter
  - Automatically adjusts the end of a sentence when the text is a question.
  - AivisSpeech Engine always reads text with natural intonation that reflects symbols in the text such as "？", "！", and "…".
  - Therefore, simply adding "？" to the end of the text, as in どうですか…？, is enough to read it with question intonation.
[!TIP]
See also the AivisSpeech editor's FAQ / Q&A.
Q. Raising the "style strength" (intonationScale) value makes the speech sound strange.
This is the current behavior of the Style-Bert-VITS2 model architecture supported by AivisSpeech Engine.
Depending on the speaker and style, raising intonationScale too far can make the speech sound strange or turn it into a flat, unnatural monotone.
The highest intonationScale value at which speech is still produced correctly differs by speaker and style. Please adjust it to a suitable value.
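AivisSpeech Engine is designed to get the correct reading and accent on the first try as often as possible, but incorrect readings or accents are sometimes unavoidable.
Words not registered in the built-in dictionary, such as rarely used proper nouns and personal names (especially unconventional "kira-kira" names), are often read incorrectly.
You can change how these words are read by registering them in the dictionary. Try registering words from the AivisSpeech editor or through the API.
Note that for compound words and English words, dictionary entries may not take effect regardless of the word's priority. This is the current specification.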
Q. Sending a long text to the speech synthesis API all at once makes the audio unnatural or causes memory leaks.
AivisSpeech Engine is designed to synthesize speech in relatively short units, such as a single sentence or one coherent unit of meaning.
Therefore, sending a long text of more than 1,000 characters to the /synthesis API at once may cause problems such as the following:
- Memory usage rises sharply and the PC slows down
- A memory leak occurs and AivisSpeech Engine crashes
- The intonation becomes unnatural, producing a flat, monotone voice
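To synthesize a long text, we recommend splitting it at positions like the following and sending each part to the speech synthesis API separately.
There is no hard limit, but 500 characters or fewer per synthesis request is preferable.
- At punctuation (「、」「。」)
- At breaks in meaning (such as paragraph breaks)
- At breaks in dialogue (parts enclosed in 「」)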
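[!TIP]
Splitting at breaks in meaning tends to produce speech with more natural intonation.
This is because emotional expression and intonation that match the content are applied across the entire text sent to the API in one request.
Splitting the text appropriately resets the emotional expression and intonation carried over from earlier sentences and gives a more natural reading.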
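The following is a minimal sketch of that advice. It splits at line breaks (paragraphs), then packs sentences ending in 。, ！ or ？ into chunks of at most 500 characters. The function name, the set of sentence-ending characters, and the hard cut applied to a single sentence longer than the limit are assumptions of this sketch, not engine behavior. It also does not split dialogue in 「」.

```python
import re

# Zero-width split *after* a sentence-ending mark, so the mark stays with its sentence
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_for_synthesis(text: str, limit: int = 500) -> list[str]:
    """Split text into chunks of at most `limit` characters at paragraph and sentence breaks."""
    chunks: list[str] = []
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        current = ""
        for sentence in _SENTENCE_END.split(paragraph):
            # Last resort: hard-cut a single sentence that exceeds the limit
            while len(sentence) > limit:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(sentence[:limit])
                sentence = sentence[limit:]
            if len(current) + len(sentence) > limit:
                chunks.append(current)
                current = sentence
            else:
                current += sentence
        if current:
            chunks.append(current)  # a paragraph never shares a chunk with the next one
    return chunks
```

Send each chunk as its own /audio_query and /synthesis request. Lowering `limit` makes the intonation reset more often.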
Internet access is required only the first time AivisSpeech is launched, to download model data.
From the second launch onward, you can use it with the PC offline.
This is done from the settings screen of the running AivisSpeech Engine.
While AivisSpeech Engine is running, open http://127.0.0.1:[AivisSpeech Engine port number]/setting in a browser to open the AivisSpeech Engine settings screen.
The default port number of AivisSpeech Engine is 10101.
Q. I switched to GPU mode (--use_gpu), but speech generation is slower than in CPU mode.
GPU mode also works on PCs that have only a GPU integrated into the CPU (iGPU). However, in most cases it is much slower than CPU mode, so we do not recommend it.
Integrated GPUs have lower performance than discrete GPUs (dGPUs) and struggle with heavy workloads such as AI speech synthesis.
Recent CPUs, on the other hand, have improved greatly in performance and can generate speech fast enough on their own.
For this reason, we recommend CPU mode on PCs without a dGPU.
On PCs with 12th-generation or later Intel CPUs (hybrid P-core / E-core design), Windows power settings can change speech generation performance significantly.
This is because the default "Balanced" mode tends to assign speech generation tasks to the power-efficient E-cores.
Changing the settings as follows makes full use of both P-cores and E-cores and speeds up speech generation.
- Open Windows 11 Settings
- Go to System → Power
- Change "Power mode" to "Best performance"
※ The "Power plan" in Control Panel also has a "High performance" setting, but it changes different settings.
For 12th-generation or later Intel CPUs, we recommend changing "Power mode" from the Windows 11 Settings screen.
AivisSpeech aims to be free AI speech synthesis software that places no restrictions on how it is used.
Depending on the license of the speech synthesis model used in your work, at least the software itself requires no credit. Individuals and companies can use it freely, for commercial or non-commercial purposes.
…That said, we would also like more people to know about AivisSpeech.
If you like, we would be glad if you credited AivisSpeech somewhere in your work. (The credit format is up to you.)
Logs are stored in the following folders.
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Logs
- Mac: ~/Library/Application Support/AivisSpeech-Engine/Logs
- Linux: ~/.local/share/AivisSpeech-Engine/Logs
If you find a bug, please report it in one of the following ways.
- GitHub Issue (recommended): If you have a GitHub account, reporting through GitHub Issues lets us respond sooner.
- Twitter (X): Report it by replying to or sending a DM to the Aivis Project official account, or by tweeting with the hashtag #AivisSpeech.
- Contact form: Report it through the Aivis Project contact form.
Including as much of the following information as possible helps us respond faster.
- Description of the bug
- Steps to reproduce (please attach videos or screenshots if you have them)
- OS type and AivisSpeech version
- What you tried in order to fix it
- Whether antivirus software or similar is installed (if it seems relevant)
- Any error messages shown
- Error logs
VOICEVOX is very large software and is still under active development.
For this reason, AivisSpeech Engine is developed on top of the latest VOICEVOX ENGINE according to the following policy.
- Keep modifications to the minimum necessary so that following the latest VOICEVOX releases stays easy
  - Rebrand from VOICEVOX ENGINE to AivisSpeech Engine only where necessary
  - Deliberately do not rebrand the voicevox_engine directory, because renaming it would produce a huge diff of import statement changes
- Do not refactor
  - Conflicts with VOICEVOX ENGINE are easy to foresee, and we are not familiar with the entire codebase
- Do not delete code, even for features AivisSpeech does not use (such as singing voice synthesis)
  - This also avoids conflicts
  - Disable unused code by commenting it out rather than deleting it
  - To keep the diff with VOICEVOX ENGINE minimal, use """ """ instead of # when large blocks need to be commented out
  - However, this does not apply to configuration files and build tooling such as the Dockerfile and GitHub Actions
    - These parts already contain large AivisSpeech Engine modifications, so commenting out would be very inefficient
- Do not update the documentation, because maintaining it and keeping it in sync is difficult
  - As a result, none of the documents have been updated, and they do not reflect changes made in AivisSpeech Engine
- Do not add test code, because keeping tests working through AivisSpeech Engine modifications is difficult
  - Existing test code is only passively maintained, by fixing or commenting out some parts so that the tests pass
  - Because of AivisSpeech Engine modifications, test result snapshots differ from those of VOICEVOX ENGINE
  - Tests broken by AivisSpeech Engine modifications are commented out rather than fixed
  - No test code is added for code newly developed for AivisSpeech Engine, given the maintenance cost
The steps differ significantly from the original VOICEVOX ENGINE.
Python 3.11 must be installed beforehand.
# Install Poetry and pre-commit
pip install poetry poetry-plugin-export pre-commit
# Enable pre-commit
pre-commit install
# Install all dependencies
poetry install
The steps differ significantly from the original VOICEVOX ENGINE.
# Start AivisSpeech Engine in the development environment
poetry run task serve
# Show AivisSpeech Engine help
poetry run task serve --help
# Automatically fix code formatting
poetry run task format
# Check code formatting
poetry run task lint
# Check for typos with typos
poetry run task typos
# Run the tests
poetry run task test
# Update the test snapshots
poetry run task update-snapshots
# Update license information
poetry run task update-licenses
# Build AivisSpeech Engine
poetry run task build
VOICEVOX ENGINE, on which this project is based, is dual-licensed. Of those two licenses, this project inherits only LGPL-3.0, as a standalone license.
The documentation below and under docs/ is inherited unmodified from the upstream VOICEVOX ENGINE documentation. There is no guarantee that it also applies to AivisSpeech Engine.
This is the VOICEVOX engine.
It is essentially an HTTP server, so you can perform text-to-speech synthesis by sending requests to it.
(The editor is VOICEVOX, the core is VOICEVOX CORE, and details of the overall architecture can be found here.)
Guides for each purpose:
- User guide: for those who want to synthesize speech
- Contributor guide: for those who want to contribute
- Developer guide: for those who want to use the code
Download the engine for your environment from here.
Please refer to the API documentation.
While the VOICEVOX engine or editor is running, you can also view the documentation of the running engine at http://127.0.0.1:50021/docs.
For future plans, "VOICEVOX 音声合成エンジンとの連携" (Integrating with the VOICEVOX speech synthesis engine) may also be helpful.
docker pull voicevox/voicevox_engine:cpu-latest
docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-latest
docker pull voicevox/voicevox_engine:nvidia-latest
docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-latest
When using the GPU version, errors may occur depending on the environment. In that case, adding --runtime=nvidia to docker run may solve the problem.
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
The generated audio has a somewhat unusual sampling rate of 24000Hz, so some audio players may not be able to play it.
The value passed as speaker is a style_id obtained from the /speakers endpoint. It is named speaker for compatibility.
You can adjust the speech by editing the parameters of the synthesis query returned by /audio_query.
For example, let's set the speaking speed to 1.5x.
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
# Use sed to change the speedScale value to 1.5
sed -i -r 's/"speedScale":[0-9.]+/"speedScale":1.5/' query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio_fast.wav
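The sed substitution above edits the raw text of query.json. Editing the parsed JSON instead does not depend on how the file happens to be formatted. The sketch below is a hypothetical alternative (the function names are illustrative, not part of the engine), assuming query.json holds the AudioQuery returned by /audio_query.

```python
import json
from typing import Any


def set_speed_scale(query: dict[str, Any], speed: float) -> dict[str, Any]:
    """Return a copy of an AudioQuery dict with speedScale replaced."""
    edited = dict(query)
    edited["speedScale"] = speed
    return edited


def edit_query_file(src: str, dst: str, speed: float) -> None:
    # Read query.json, change speedScale, and write the result for POST /synthesis
    with open(src, encoding="utf-8") as f:
        query = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(set_speed_scale(query, speed), f, ensure_ascii=False)
```

`edit_query_file("query.json", "query.json", 1.5)` has the same effect as the sed command.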
"AquesTalk-style notation" specifies readings using only katakana and symbols. It differs in part from the original AquesTalk notation.
AquesTalk-style notation follows these rules:
- All kana are written in katakana.
- Accent phrases are separated by / or 、. A silent interval is inserted only when they are separated by 、.
- Putting _ before a kana devoices that kana.
- The accent position is marked with '. Every accent phrase must have exactly one accent position.
- Putting ？ (full-width) at the end of an accent phrase pronounces it as a question.
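Before sending hand-edited notation to the engine, a client can run a few of the rules above as a lightweight check. This hypothetical sketch checks only the katakana rule and the one-accent-mark-per-phrase rule. It is not a full parser, and the engine remains the authority on what it accepts.

```python
import re

# Katakana Unicode block (U+30A0-U+30FF), which also contains the long-vowel mark ー
_KATAKANA = re.compile(r"[\u30A0-\u30FF]+")


def check_aquestalk(kana: str) -> list[str]:
    """Return problems found by basic checks; an empty list means none were found."""
    problems: list[str] = []
    # Accent phrases are separated by "/" or "、"
    for phrase in filter(None, re.split(r"[/、]", kana)):
        body = phrase.removesuffix("？")  # a trailing full-width ？ marks a question
        if body.count("'") != 1:
            problems.append(f"{phrase}: needs exactly one accent mark (')")
        letters = body.replace("'", "").replace("_", "")  # _ marks a devoiced kana
        if not _KATAKANA.fullmatch(letters):
            problems.append(f"{phrase}: must be written in katakana")
    return problems
```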
The /audio_query response contains the reading chosen by the engine, written in AquesTalk-style notation.
Modifying it lets you control the kana reading and accents of the speech.
# Write the text to be read aloud to text.txt in UTF-8
echo -n "ディープラーニングは万能薬ではありません" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
cat query.json | grep -o -E "\"kana\":\".*\""
# Result... "kana":"ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
# We want it read as "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン", so
# add is_kana=true to get the intonation and save it to newphrases.json
echo -n "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" > kana.txt
curl -s \
-X POST \
"127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \
--get --data-urlencode text@kana.txt \
> newphrases.json
# Replace the contents of "accent_phrases" in query.json with the contents of newphrases.json
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @newquery.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
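The sed command above swaps accent_phrases by matching the text from the first `[{` to the last `}]` in query.json. The same replacement can be done on parsed JSON. Here is a hypothetical sketch, assuming newphrases.json holds the array returned by /accent_phrases.

```python
import json
from typing import Any


def replace_accent_phrases(query: dict[str, Any], new_phrases: list[Any]) -> dict[str, Any]:
    """Return a copy of an AudioQuery whose accent_phrases come from /accent_phrases."""
    edited = dict(query)
    edited["accent_phrases"] = new_phrases
    return edited


def build_new_query(query_path: str, phrases_path: str, out_path: str) -> None:
    # query.json + newphrases.json -> newquery.json, as in the shell example
    with open(query_path, encoding="utf-8") as f:
        query = json.load(f)
    with open(phrases_path, encoding="utf-8") as f:
        phrases = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(replace_accent_phrases(query, phrases), f, ensure_ascii=False)
```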
Through the API, you can view the user dictionary and add, edit, and delete words.
Sending a GET request to /user_dict returns the list of words in the user dictionary.
curl -s -X GET "127.0.0.1:50021/user_dict"
Sending a POST request to /user_dict_word adds a word to the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
For the accent nucleus position, the following page should be helpful.
The number in the "〇型" (type N) labels is the accent nucleus position.
https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html
On success, the response is a string containing the UUID assigned to the word.
surface="test"
pronunciation="テスト"
accent_type="1"
curl -s -X POST "127.0.0.1:50021/user_dict_word" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
Sending a PUT request to /user_dict_word/{word_uuid} modifies a word in the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
word_uuid is returned when the word is added, and it can also be found by viewing the user dictionary.
On success, the response is 204 No Content.
surface="test2"
pronunciation="テストツー"
accent_type="2"
# Replace word_uuid with the value for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
Sending a DELETE request to /user_dict_word/{word_uuid} deletes a word from the user dictionary.
word_uuid is returned when the word is added, and it can also be found by viewing the user dictionary.
On success, the response is 204 No Content.
# Replace word_uuid with the value for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
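The three dictionary requests above can also be prepared with Python's standard library. This sketch only builds `urllib.request.Request` objects using the URL parameters listed above. The helper names are hypothetical. Send each request with `urllib.request.urlopen(req)` against a running engine; for AivisSpeech Engine, use port 10101 instead of 50021.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:50021"  # VOICEVOX default; AivisSpeech Engine uses 10101


def _word_params(surface: str, pronunciation: str, accent_type: int) -> str:
    return urllib.parse.urlencode(
        {"surface": surface, "pronunciation": pronunciation, "accent_type": accent_type}
    )


def add_word_request(surface: str, pronunciation: str, accent_type: int,
                     base_url: str = BASE_URL) -> urllib.request.Request:
    # POST /user_dict_word: on success the body contains the new word's UUID
    url = f"{base_url}/user_dict_word?{_word_params(surface, pronunciation, accent_type)}"
    return urllib.request.Request(url, method="POST")


def update_word_request(word_uuid: str, surface: str, pronunciation: str, accent_type: int,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    # PUT /user_dict_word/{word_uuid}: 204 No Content on success
    url = f"{base_url}/user_dict_word/{word_uuid}?{_word_params(surface, pronunciation, accent_type)}"
    return urllib.request.Request(url, method="PUT")


def delete_word_request(word_uuid: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # DELETE /user_dict_word/{word_uuid}: 204 No Content on success
    return urllib.request.Request(f"{base_url}/user_dict_word/{word_uuid}", method="DELETE")
```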
You can import and export the user dictionary in the "User dictionary export & import" section of the engine's settings page.
The user dictionary can also be imported and exported through the API.
Use POST /import_user_dict to import and GET /user_dict to export.
See the API documentation for details such as arguments.
Editing presets.yaml in the user directory lets you use presets for characters, speaking speed, and so on.
echo -n "プリセットをうまく活用すれば、サードパーティ間で同じ設定を使うことができます" >text.txt
# Get preset information
curl -s -X GET "127.0.0.1:50021/presets" > presets.json
preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g')
style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g')
# Get the query for speech synthesis
curl -s \
-X POST \
"127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesize speech
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=$style_id" \
> audio.wav
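The sed commands above extract `id` and `style_id` with a greedy regular expression, so which preset they match depends on the exact text layout of the response. Parsing the JSON makes the choice of preset explicit. Here is a hypothetical sketch, assuming /presets returns a JSON array whose entries carry the `id` and `style_id` fields used above.

```python
import json


def preset_ids(presets_json: str, index: int = 0) -> tuple[int, int]:
    """Return (preset id, style_id) for the preset at `index` in a /presets response."""
    presets = json.loads(presets_json)
    preset = presets[index]  # raises IndexError if no such preset exists
    return preset["id"], preset["style_id"]
```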
- speaker_uuid can be looked up with /speakers
- id values must not be duplicated
- Changes made to the file after the engine starts are reflected in the engine
/synthesis_morphing synthesizes speech separately with two styles and generates morphed speech from the two results.
echo -n "モーフィングを利用することで、２種類の声を混ぜることができます。" > text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=8" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesis result with the original style
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=8" \
> audio.wav
export MORPH_RATE=0.5
# Note: this takes time, because it synthesizes speech for two styles and then resynthesizes it with WORLD
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
export MORPH_RATE=0.9
# If query, base_speaker, and target_speaker are the same, a cache is used, so generation is relatively fast
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
This code retrieves portrait.png from the additional speaker information.
(It uses jq to parse the JSON.)
curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \
| jq -r ".portrait" \
| base64 -d \
> portrait.png
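The jq and base64 pipeline above can also be written with Python's standard library. This hypothetical sketch assumes, as the pipeline does, that the /speaker_info response has a Base64-encoded `portrait` field.

```python
import base64
import json


def portrait_png(speaker_info_json: str) -> bytes:
    """Decode the Base64 "portrait" field of a /speaker_info response into PNG bytes."""
    info = json.loads(speaker_info_json)
    return base64.b64decode(info["portrait"])
```

Writing the returned bytes to a file opened with `open("portrait.png", "wb")` produces the same file as the shell pipeline.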
With /cancellable_synthesis, computing resources are released immediately if the connection is closed.
(With /synthesis, the speech synthesis computation runs to completion even if the connection is closed.)
This API is experimental and is enabled only when the --enable_cancellable_synthesis argument is given at engine startup.
The parameters required for synthesis are the same as for /synthesis.
echo -n '{
"notes": [
{ "key": null, "frame_length": 15, "lyric": "" },
{ "key": 60, "frame_length": 45, "lyric": "ド" },
{ "key": 62, "frame_length": 45, "lyric": "レ" },
{ "key": 64, "frame_length": 45, "lyric": "ミ" },
{ "key": null, "frame_length": 15, "lyric": "" }
]
}' > score.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @score.json \
"127.0.0.1:50021/sing_frame_audio_query?speaker=6000" \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/frame_synthesis?speaker=3001" \
> audio.wav
The key of each note in the score is a MIDI note number.
lyric is the lyric. Any string can be specified, but some engines return an error for anything other than a single mora of hiragana or katakana.
The frame rate defaults to 93.75Hz and can be obtained from frame_rate in the engine manifest.
The first note must be silent.
The speaker accepted by /sing_frame_audio_query is the style_id of a style from /singers whose type is sing or singing_teacher.
The speaker accepted by /frame_synthesis is the style_id of a style from /singers whose type is frame_decode.
The argument is named speaker for consistency with the other APIs.
You can also specify different styles for /sing_frame_audio_query and /frame_synthesis.
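frame_length is measured in frames, so note durations in seconds have to be converted using the frame rate. This hypothetical sketch assumes the default of 93.75Hz (read frame_rate from the engine manifest to be sure). It builds a score in the same shape as score.json above and enforces the rule that the first note is silent. The helper names are illustrative.

```python
from typing import Any, Optional

FRAME_RATE = 93.75  # default frame rate in Hz; the manifest's frame_rate is authoritative


def to_frames(seconds: float, frame_rate: float = FRAME_RATE) -> int:
    # e.g. 0.48 s * 93.75 Hz = 45 frames
    return round(seconds * frame_rate)


def make_score(notes: list[tuple[Optional[int], float, str]],
               frame_rate: float = FRAME_RATE) -> dict[str, Any]:
    """notes: (MIDI key, or None for a rest; duration in seconds; lyric)."""
    if not notes or notes[0][0] is not None:
        raise ValueError("the first note must be a rest (key=None)")
    return {"notes": [
        {"key": key, "frame_length": to_frames(sec, frame_rate), "lyric": lyric}
        for key, sec, lyric in notes
    ]}
```

With these conversions, 0.16 s of silence plus three 0.48 s notes reproduce the 15 / 45 / 45 / 45 / 15 frame lengths of the example above. Pass `json.dumps(make_score(...))` as the request body.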
For security, VOICEVOX accepts requests only from the origins localhost, 127.0.0.1, and app://, or from requests without an Origin.
As a result, some third-party apps may not receive responses.
As a workaround, the engine provides a UI for configuring this.
- Open http://127.0.0.1:50021/setting.
- Change or add settings to suit the app you use.
- Press the Save button to confirm the changes.
- The engine must be restarted for the settings to take effect. Restart it as needed.
Specifying the runtime argument --disable_mutable_api or the environment variable VV_DISABLE_MUTABLE_API=1 disables the APIs that change engine settings, dictionaries, and so on.
All requests and responses are encoded in UTF-8.
You can pass arguments when starting the engine. For details, check the help with the -h argument.
$ python run.py -h
usage: run.py [-h] [--host HOST] [--port PORT] [--use_gpu] [--voicevox_dir VOICEVOX_DIR] [--voicelib_dir VOICELIB_DIR] [--runtime_dir RUNTIME_DIR] [--enable_mock] [--enable_cancellable_synthesis]
[--init_processes INIT_PROCESSES] [--load_all_models] [--cpu_num_threads CPU_NUM_THREADS] [--output_log_utf8] [--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}]
[--allow_origin [ALLOW_ORIGIN ...]] [--setting_file SETTING_FILE] [--preset_file PRESET_FILE] [--disable_mutable_api]
The VOICEVOX engine.
options:
-h, --help show this help message and exit
--host HOST Host address to accept connections on.
--port PORT Port number to accept connections on.
--use_gpu Use the GPU for speech synthesis.
--voicevox_dir VOICEVOX_DIR
Directory path of VOICEVOX.
--voicelib_dir VOICELIB_DIR
Directory path of VOICEVOX CORE.
--runtime_dir RUNTIME_DIR
Directory path of the libraries used by VOICEVOX CORE.
--enable_mock Perform speech synthesis with a mock instead of VOICEVOX CORE.
--enable_cancellable_synthesis
Allow speech synthesis to be cancelled midway.
--init_processes INIT_PROCESSES
Number of processes spawned when the cancellable_synthesis feature is initialized.
--load_all_models Load all speech synthesis models at startup.
--cpu_num_threads CPU_NUM_THREADS
Number of threads used for speech synthesis. If not specified, the value of the environment variable VV_CPU_NUM_THREADS is used instead. If VV_CPU_NUM_THREADS is neither an empty string nor a number, the engine exits with an error.
--output_log_utf8 Write log output in UTF-8. If not specified, the value of the environment variable VV_OUTPUT_LOG_UTF8 is used instead. If VV_OUTPUT_LOG_UTF8 is 1, UTF-8 is used; if it is 0, empty, or unset, the encoding is determined automatically by the environment.
--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}
CORS permission mode; all or localapps can be specified. all allows everything. localapps restricts the cross-origin resource sharing policy to app://. and localhost-related origins. Other origins can be added with the allow_origin option. The default is localapps. This option takes precedence over the settings file specified by --setting_file.
--allow_origin [ALLOW_ORIGIN ...]
Origins to allow. Separate multiple origins with spaces. This option takes precedence over the settings file specified by --setting_file.
--setting_file SETTING_FILE
Specify a settings file.
--preset_file PRESET_FILE
Specify a preset file. If not specified, the environment variable VV_PRESET_FILE and then presets.yaml in the user directory are searched, in that order.
--disable_mutable_api
Disable APIs that modify the engine's static data, such as dictionary registration and settings changes. If not specified, the value of the environment variable VV_DISABLE_MUTABLE_API is used instead. If VV_DISABLE_MUTABLE_API is 1, the APIs are disabled; if it is 0, empty, or unset, it is ignored.
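The --output_log_utf8 and --disable_mutable_api descriptions share one fallback rule: the flag wins; otherwise the environment variable is read, where 1 enables the behaviour and 0, an empty string, or no value leaves the default. A minimal sketch of that rule, using a hypothetical helper name flag_or_env; the help text does not say what happens for other values, so rejecting them here is an assumption.

```python
import os
from typing import Mapping, Optional


def flag_or_env(flag: bool, env_name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the CLI flag is set or the environment variable is "1".

    "0", an empty string, or an unset variable all mean "not enabled".
    For VV_OUTPUT_LOG_UTF8, "not enabled" means the encoding is chosen automatically.
    Other values raise ValueError (an assumption; the help text does not cover them).
    """
    if flag:
        return True
    env = os.environ if environ is None else environ
    value = env.get(env_name, "")
    if value == "1":
        return True
    if value in ("", "0"):
        return False
    raise ValueError(f"{env_name} must be 1, 0, or empty, got {value!r}")
```

For example, flag_or_env(args.disable_mutable_api, "VV_DISABLE_MUTABLE_API") would give the effective setting, where args is a hypothetical argparse result.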
Delete all files in the engine directory and replace them with the new ones.
VOICEVOX ENGINE welcomes your contributions!
For details, see CONTRIBUTING.md.
Development discussion and casual chat also take place on the unofficial VOICEVOX Discord server. Feel free to join.
When creating a pull request that resolves an Issue, we recommend that you either announce on the Issue that you have started working on it or open a Draft pull request first, so that you do not end up working on the same Issue as someone else.
The engine is developed with Python 3.11.9.
Installation requires a C/C++ compiler and CMake for your OS.
# Install the runtime environment
python -m pip install -r requirements.txt
# Install the development, test, and build environments
python -m pip install -r requirements-dev.txt -r requirements-build.txt
Check the details of the command-line arguments with the following command.
python run.py --help
# Start the server with the product version of VOICEVOX
VOICEVOX_DIR="C:/path/to/voicevox" # Path to the product version VOICEVOX directory
python run.py --voicevox_dir=$VOICEVOX_DIR
# Start the server with a mock
python run.py --enable_mock
# Switch log output to UTF-8
python run.py --output_log_utf8
# or: VV_OUTPUT_LOG_UTF8=1 python run.py
If the number of CPU threads is not specified, half the number of logical cores is used. (On most CPUs, this is half of the total processing capacity.)
If you want to adjust how much processing capacity the engine uses, for example when running on IaaS or a dedicated server, specify the number of CPU threads.
- Specify it with a runtime argument
python run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4
- Specify it with an environment variable
export VV_CPU_NUM_THREADS=4
python run.py --voicevox_dir=$VOICEVOX_DIR
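The precedence described above (argument first, then VV_CPU_NUM_THREADS, then half the logical cores) can be sketched as follows. The function name resolve_cpu_num_threads is hypothetical, and the floor of one thread on single-core machines is an added assumption, not documented engine behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_cpu_num_threads(
    arg: Optional[int],
    environ: Optional[Mapping[str, str]] = None,
    logical_cores: Optional[int] = None,
) -> int:
    """Resolve the synthesis thread count.

    Order: --cpu_num_threads, then VV_CPU_NUM_THREADS, then half the logical cores.
    A non-empty, non-numeric VV_CPU_NUM_THREADS is an error.
    """
    if arg is not None:
        return arg
    env = os.environ if environ is None else environ
    value = env.get("VV_CPU_NUM_THREADS", "")
    if value != "":
        try:
            return int(value)
        except ValueError:
            raise ValueError(f"VV_CPU_NUM_THREADS must be a number, got {value!r}") from None
    cores = logical_cores if logical_cores is not None else (os.cpu_count() or 1)
    # Assumption: never go below one thread on single-core machines.
    return max(1, cores // 2)
```

On a 16-logical-core machine with nothing set, this yields 8 threads, matching the "half the logical cores" default.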
Cores from VOICEVOX Core 0.5.4 onward can be used.
The libtorch version of the core is not supported on Mac.
If you pass the directory of the product version of VOICEVOX, or of a compiled engine, with the --voicevox_dir argument, the core of that version is used.
python run.py --voicevox_dir="/path/to/voicevox"
On Mac, DYLD_LIBRARY_PATH must also be set.
DYLD_LIBRARY_PATH="/path/to/voicevox" python run.py --voicevox_dir="/path/to/voicevox"
Pass the directory where you extracted the VOICEVOX Core zip file with the --voicelib_dir argument.
Also pass the directory of libtorch or onnxruntime (shared library) matching the core version with the --runtime_dir argument.
However, if libtorch or onnxruntime is on the system search path, the --runtime_dir argument is not needed.
The --voicelib_dir and --runtime_dir arguments can each be given multiple times.
To select a core version at an API endpoint, specify the core_version parameter. (If it is omitted, the latest core is used.)
python run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx"
On Mac, set DYLD_LIBRARY_PATH instead of using the --runtime_dir argument.
DYLD_LIBRARY_PATH="/path/to/onnx" python run.py --voicelib_dir="/path/to/voicevox_core"
以äžã®ãã£ã¬ã¯ããªã«ããé³å£°ã©ã€ãã©ãªã¯èªåã§èªã¿èŸŒãŸããŸãã
- ãã«ãç:
<user_data_dir>/voicevox-engine/core_libraries/
- Python ç:
<user_data_dir>/voicevox-engine-dev/core_libraries/
<user_data_dir>
㯠OS ã«ãã£ãŠç°ãªããŸãã
- Windows:
C:\Users\<username>\AppData\Local\
- macOS:
/Users/<username>/Library/Application\ Support/
- Linux:
/home/<username>/.local/share/
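The lookup above can be expressed as a small function. This sketch simply encodes the per-OS defaults listed; the function names are hypothetical, and the engine itself may resolve <user_data_dir> differently. The macOS path contains a literal space ("Application Support"); the backslash above is only shell escaping.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def user_data_dir(platform: str, username: str) -> PurePath:
    """Return the per-OS <user_data_dir> listed above (platform uses sys.platform values)."""
    if platform == "win32":
        return PureWindowsPath("C:/Users", username, "AppData", "Local")
    if platform == "darwin":
        return PurePosixPath("/Users", username, "Library", "Application Support")
    if platform.startswith("linux"):
        return PurePosixPath("/home", username, ".local", "share")
    raise ValueError(f"unsupported platform: {platform}")


def core_libraries_dir(platform: str, username: str, frozen: bool) -> PurePath:
    """Return the auto-loaded core library directory.

    frozen=True means the build (packaged) version; False means running from Python source.
    """
    app_name = "voicevox-engine" if frozen else "voicevox-engine-dev"
    return user_data_dir(platform, username) / app_name / "core_libraries"
```

In a real process you might pass sys.platform and getattr(sys, "frozen", False), since PyInstaller sets sys.frozen in packaged builds.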
You can build locally: packaging uses pyinstaller and containerization uses a Dockerfile.
For detailed steps, see the "Build" section of the contributor guide.
If you use GitHub, you can build with GitHub Actions in your forked repository.
Turn Actions ON and launch build-engine-package.yml via workflow_dispatch to build.
The artifacts are uploaded to Releases.
For the GitHub Actions settings required for building, see the "GitHub Actions" section of the contributor guide.
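Instead of using the Run workflow button in the Actions tab, the same workflow_dispatch event can be sent through GitHub's REST endpoint POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches. The sketch below only builds the request; owner, repo, token, and ref are placeholders you supply, and whether build-engine-package.yml expects extra inputs is not covered here, so none are sent.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str, ref: str,
                           workflow: str = "build-engine-package.yml") -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for a forked repository."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")  # branch or tag to run the workflow on
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it with urllib.request.urlopen(request) requires a token allowed to run workflows on the fork.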
Tests with pytest and static analysis with various linters are available.
For detailed steps, see the "Tests" and "Static Analysis" sections of the contributor guide.
Dependencies are managed with poetry. The dependency libraries that can be added are also subject to licensing restrictions.
For details, see the "Packages" section of the contributor guide.
The VOICEVOX editor can run multiple engines at the same time. With this feature, you can run your own speech synthesis engine, or an existing one, inside the VOICEVOX editor.
The multi-engine feature works by launching the Web APIs of multiple VOICEVOX API-compliant engines on separate ports and handling them uniformly. The editor launches each engine through its executable binary and manages settings and state separately for each, tied to an EngineID.
To add support, build an executable binary that launches a VOICEVOX API-compliant engine. The easiest way is to fork the VOICEVOX ENGINE repository and modify parts of it.
Three things need to be modified: engine information, character information, and speech synthesis.
Engine information is managed in the manifest file (engine_manifest.json) at the repository root.
A manifest file in this format is required for VOICEVOX API-compliant engines.
Review the information in the manifest file and change it as appropriate.
Depending on the synthesis method, an engine may not be able to provide the same features as VOICEVOX, such as morphing.
In that case, adjust the information in supported_features in the manifest file accordingly.
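The supported_features section tells the editor which features an engine lacks. As a sketch, assuming each entry under supported_features is an object with a boolean value field (check the engine_manifest.json in the repository for the actual schema), this lists the features an engine reports as unsupported. The feature names in the example are illustrative.

```python
import json
from typing import List


def unsupported_features(manifest: dict) -> List[str]:
    """Return the names of supported_features entries whose value is False.

    Assumes entries shaped like {"type": "bool", "value": <bool>, ...};
    an entry without a value field is treated as unsupported.
    """
    features = manifest.get("supported_features", {})
    return sorted(name for name, spec in features.items() if not spec.get("value", False))


if __name__ == "__main__":
    manifest = json.loads("""
    {
      "supported_features": {
        "adjust_mora_pitch": {"type": "bool", "value": true},
        "synthesis_morphing": {"type": "bool", "value": false}
      }
    }
    """)
    print(unsupported_features(manifest))
```

Running a check like this on your edited manifest is a quick way to confirm that the features you turned off are the ones you intended.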
Character information is managed in the files in the resources/character_info directory.
Dummy icons and similar files are provided, so replace them as appropriate.
Speech synthesis is performed in voicevox_engine/tts_pipeline/tts_engine.py.
In the VOICEVOX API, the engine first creates an initial AudioQuery (a query for speech synthesis) and returns it to the user. The user then edits the query as needed, and finally the engine synthesizes speech according to the query.
Query creation is handled by the /audio_query endpoint and synthesis by the /synthesis endpoint. Supporting at least these two is enough for an engine to be VOICEVOX API compliant.
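A client-side sketch of that flow using only the Python standard library. It assumes an engine at 127.0.0.1:50021, that /audio_query takes text and speaker as query parameters, and that /synthesis takes speaker as a query parameter with the (possibly edited) AudioQuery JSON as the body. The speaker ID and the speedScale edit are illustrative.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:50021"  # assumed default host and port


def audio_query_request(text: str, speaker: int) -> urllib.request.Request:
    """Step 1: ask the engine to create an initial AudioQuery for the text."""
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return urllib.request.Request(f"{BASE_URL}/audio_query?{params}", data=b"", method="POST")


def synthesis_request(query: dict, speaker: int) -> urllib.request.Request:
    """Step 3: send the (possibly edited) AudioQuery back for synthesis."""
    params = urllib.parse.urlencode({"speaker": speaker})
    return urllib.request.Request(
        f"{BASE_URL}/synthesis?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, speaker: int, speed: float = 1.0) -> bytes:
    """Run query -> edit -> synthesis and return WAV bytes (requires a running engine)."""
    with urllib.request.urlopen(audio_query_request(text, speaker)) as resp:
        query = json.load(resp)
    query["speedScale"] = speed  # step 2: the user edits the query as needed
    with urllib.request.urlopen(synthesis_request(query, speaker)) as resp:
        return resp.read()
```

An engine that answers these two requests in the same shape is what the editor expects from a VOICEVOX API-compliant engine.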
Distributing the engine as a VVPP file is recommended.
VVPP stands for "VOICEVOX Plugin Package"; it is a Zip file of a directory containing the built engine and related files.
If you give it the .vvpp extension, it can be installed into the VOICEVOX editor by double-clicking.
The editor extracts the received VVPP file to local disk as a Zip archive, then locates files according to the engine_manifest.json directly under the root.
If the VOICEVOX editor does not load the package properly, check the editor's error log.
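Because the editor expects engine_manifest.json directly under the archive root, a common packaging mistake is zipping the engine directory itself rather than its contents. A minimal sketch that zips the contents and checks the manifest location; the function names are hypothetical, and the check covers only the manifest, not the rest of the package.

```python
import zipfile
from pathlib import Path


def pack_vvpp(engine_dir, out_path) -> Path:
    """Zip the contents of engine_dir (not the directory itself) into out_path.

    out_path should be outside engine_dir so the archive does not include itself.
    """
    engine_dir = Path(engine_dir)
    out = Path(out_path)
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(engine_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to engine_dir so the manifest lands at the root.
                zf.write(path, path.relative_to(engine_dir).as_posix())
    return out


def has_root_manifest(vvpp_path) -> bool:
    """Return True if engine_manifest.json sits directly under the archive root."""
    with zipfile.ZipFile(vvpp_path) as zf:
        return "engine_manifest.json" in zf.namelist()
```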
An xxx.vvpp file can also be split into sequentially numbered xxx.0.vvppp files for distribution.
This is useful when the file is too large to distribute easily.
The vvpp and vvppp files required for installation are listed in the vvpp.txt file.
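A sketch of producing the numbered parts, assuming the parts are plain byte-wise chunks of the original .vvpp that are concatenated back in numeric order. The chunk size and function names are hypothetical; confirm how the editor reassembles parts before relying on this.

```python
from pathlib import Path
from typing import Iterable, List


def split_vvpp(vvpp_path, chunk_size: int) -> List[Path]:
    """Split xxx.vvpp into xxx.0.vvppp, xxx.1.vvppp, ... of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    src = Path(vvpp_path)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            part = src.with_name(f"{src.stem}.{index}.vvppp")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_vvppp(parts: Iterable, out_path) -> Path:
    """Concatenate numbered parts back into one .vvpp, ordered by their index."""
    ordered = sorted(parts, key=lambda p: int(Path(p).name.split(".")[-2]))
    out = Path(out_path)
    with out.open("wb") as f:
        for part in ordered:
            f.write(Path(part).read_bytes())
    return out
```

The resulting part names are what you would list in vvpp.txt alongside any unsplit .vvpp files.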
voicevox-client @voicevox-client: API wrappers for VOICEVOX ENGINE in various languages
Dual-licensed under LGPL v3 and a separate license that does not require publishing source code.
If you want to obtain the separate license, please ask Hiho.
X account: @hiho_karuta