
AivisSpeech-Engine
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
Stars: 97

AivisSpeech-Engine is an open-source Japanese text-to-speech (speech synthesis) engine based on VOICEVOX ENGINE. It runs as an HTTP API server that is largely compatible with the VOICEVOX ENGINE API, and it is the engine built into the AivisSpeech desktop application. It generates emotionally expressive Japanese speech from AIVMX (Aivis Voice Model for ONNX) model files for the Style-Bert-VITS2 architectures. It does not perform speech recognition or transcription, and it synthesizes Japanese only.
README:
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
AivisSpeech Engine is a Japanese speech synthesis engine based on VOICEVOX ENGINE.
It is built into AivisSpeech, a Japanese speech synthesis application, and makes it easy to generate emotionally expressive speech.
- For users
- System requirements
- Supported speech synthesis models
- Installation
- Using the speech synthesis API
- Compatibility with the VOICEVOX API
- FAQ / Q&A
- Development policy
- Setting up the development environment
- Development
- License
If you are looking for how to use AivisSpeech, see the AivisSpeech official site.
This page mainly provides information for developers.
The following is documentation for users.
PCs running Windows, macOS, or Linux are supported.
Starting AivisSpeech Engine requires at least 1.5 GB of free memory (RAM).
- Windows: Windows 10 (22H2 or later) / Windows 11
- macOS: macOS 13 Ventura or later
- Linux: Ubuntu 20.04 or later
[!TIP] AivisSpeech, the desktop application, supports only Windows and macOS.
AivisSpeech Engine, the speech synthesis API server, can also be used on Ubuntu / Debian-based Linux.
[!NOTE] Operation on Intel Macs is not actively tested.
Intel Macs are no longer manufactured, and it is becoming difficult even to prepare test and build environments for them. We recommend using an Apple Silicon Mac whenever possible.
[!WARNING] On Windows 10, operation has been verified only on version 22H2.
On older, out-of-support versions of Windows 10 and on Windows 10 LTSC (Long Term Servicing Channel) editions, there are reports of AivisSpeech Engine crashing and failing to start.
For security reasons as well, we strongly recommend that Windows 10 users update to at least version 22H2 before use.
AivisSpeech Engine supports speech synthesis model files in the AIVMX (Aivis Voice Model for ONNX) format (extension .aivmx).
AIVM (Aivis Voice Model) / AIVMX (Aivis Voice Model for ONNX) is an open file format for AI speech synthesis models that packs the trained model, hyperparameters, style vectors, and speaker metadata (name, description, license, icon, voice samples, and so on) into a single file.
For details on the AIVM specification and AIVM / AIVMX files, see the AIVM specification defined by the Aivis Project.
[!NOTE]
"AIVM" is also the collective name for the format and metadata specifications of both AIVM and AIVMX.
Specifically, an AIVM file is a model file in "Safetensors format with AIVM metadata added", and an AIVMX file is a model file in "ONNX format with AIVM metadata added".
"AIVM metadata" refers to the various metadata tied to a trained model, as defined in the AIVM specification.
[!IMPORTANT]
AivisSpeech Engine is also a reference implementation of the AIVM specification, but it is deliberately designed to support AIVMX files only.
This removes the dependency on PyTorch, reduces the install size, and enables fast CPU inference with ONNX Runtime.
[!TIP]
With AIVM Generator, you can generate AIVM / AIVMX files from existing speech synthesis models and edit the metadata of existing AIVM / AIVMX files.
AIVMX files for the following model architectures can be used.
Style-Bert-VITS2
Style-Bert-VITS2 (JP-Extra)
[!NOTE] The AIVM metadata specification allows multilingual speakers to be defined, but AivisSpeech Engine, like VOICEVOX ENGINE, supports Japanese speech synthesis only.
Therefore, even a model that supports English or Chinese cannot synthesize speech in any language other than Japanese.
Place AIVMX files in the following folder for your OS.
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Models
- macOS: ~/Library/Application Support/AivisSpeech-Engine/Models
- Linux: ~/.local/share/AivisSpeech-Engine/Models
The actual folder path is shown as Models directory: in the log right after AivisSpeech Engine starts.
[!TIP]
When using AivisSpeech, you can easily add speech synthesis models from the AivisSpeech UI.
End users are generally advised to add models this way.
[!IMPORTANT] For the development version (running without being built with PyInstaller), the folder is under AivisSpeech-Engine-Dev rather than AivisSpeech-Engine.
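The folder rules above can be sketched in a few lines of Python. This is only an illustration: the function name `default_models_dir` and its parameters are hypothetical, and the path printed as Models directory: in the engine log remains the authoritative value.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_models_dir(system, home, dev=False):
    """Return the documented Models folder for an OS name and home directory.

    `system` follows platform.system() naming ("Windows", "Darwin", "Linux").
    `dev=True` selects the -Dev folder used when the engine is not a PyInstaller build.
    """
    app_dir = "AivisSpeech-Engine-Dev" if dev else "AivisSpeech-Engine"
    if system == "Windows":
        return PureWindowsPath(home) / "AppData" / "Roaming" / app_dir / "Models"
    if system == "Darwin":
        return PurePosixPath(home) / "Library" / "Application Support" / app_dir / "Models"
    # Everything else is treated like Linux in this sketch.
    return PurePosixPath(home) / ".local" / "share" / app_dir / "Models"
```

For the current machine, `default_models_dir(platform.system(), str(Path.home()))` gives the expected location, assuming a default home-directory layout.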
AivisSpeech Engine provides useful command-line options such as the following:
- --host 0.0.0.0 makes AivisSpeech Engine accessible from other devices on the same network.
- --cors_policy_mode all allows CORS requests from all domains.
- --load_all_models preloads all installed speech synthesis models when AivisSpeech Engine starts.
- --help shows the list and description of all available options.
Many other options are available. See the --help option for details.
[!TIP]
Running with the --use_gpu option uses DirectML on Windows and an NVIDIA GPU (CUDA) on Linux for faster speech synthesis.
On Windows, DirectML inference also works on PCs that have only a CPU-integrated GPU (iGPU), but in most cases it is considerably slower than CPU inference, so it is not recommended.
See the FAQ for details.
[!NOTE] By default, AivisSpeech Engine runs on port 10101.
If this conflicts with another application, you can change it to any port number with the --port option.
[!WARNING] Unlike VOICEVOX ENGINE, some options are not implemented in AivisSpeech Engine.
On Windows / macOS, AivisSpeech Engine can be installed on its own, but it is easier to launch the AivisSpeech Engine that ships with AivisSpeech by itself.
The paths of the AivisSpeech Engine executable (run.exe / run) bundled with AivisSpeech are as follows.
- Windows: C:\Program Files\AivisSpeech\AivisSpeech-Engine\run.exe
  - If installed with user privileges: C:\Users\(username)\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exe
- macOS: /Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run
  - If installed with user privileges: ~/Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run
[!NOTE] On first launch, the default model (about 250 MB) and the BERT model required for inference (about 650 MB) are downloaded automatically, so startup can take up to several minutes.
Please wait until startup completes.
To add speech synthesis models to AivisSpeech Engine, see "Where to place model files".
You can also add them from "Settings" → "Manage speech synthesis models" in AivisSpeech.
When running on Linux with an NVIDIA GPU, the CUDA / cuDNN versions supported by ONNX Runtime must match the CUDA / cuDNN versions of the host environment, so the operating requirements are strict.
Specifically, the ONNX Runtime used by AivisSpeech Engine requires CUDA 12.x / cuDNN 9.x or later.
Docker works regardless of the host OS environment, so we recommend installing with Docker.
When running the Docker container, always mount ~/.local/share/AivisSpeech-Engine to /home/user/.local/share/AivisSpeech-Engine-Dev inside the container.
This preserves installed speech synthesis models and the BERT model cache (about 650 MB) even after the container is stopped and restarted.
To add speech synthesis models to AivisSpeech Engine running in Docker, place the model files (.aivmx) under ~/.local/share/AivisSpeech-Engine/Models on the host.
[!IMPORTANT] Be sure to mount to /home/user/.local/share/AivisSpeech-Engine-Dev.
Because the AivisSpeech Engine in the Docker image is not built with PyInstaller, the -Dev suffix is appended to the data folder name, making it AivisSpeech-Engine-Dev.
docker pull ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
docker run --rm -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
docker pull ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
docker run --rm --gpus all -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
Running the following one-liner in Bash writes the synthesized speech to audio.wav as a WAV file.
[!IMPORTANT]
This assumes that AivisSpeech Engine is already running and that the speech synthesis model (.aivmx) corresponding to the style ID is stored in the directory shown as Models directory: in the log.
# STYLE_ID is the style ID to synthesize with; obtain it separately from the /speakers API
STYLE_ID=888753760 && \
echo -n "こんにちは、音声合成の世界へようこそ！" > text.txt && \
curl -s -X POST "127.0.0.1:10101/audio_query?speaker=$STYLE_ID" --get --data-urlencode text@text.txt > query.json && \
curl -s -H "Content-Type: application/json" -X POST -d @query.json "127.0.0.1:10101/synthesis?speaker=$STYLE_ID" > audio.wav && \
rm text.txt query.json
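The same two-step flow (POST /audio_query, then POST /synthesis) can be sketched in Python with only the standard library. The helper names `build_url` and `synthesize` are illustrative assumptions, not part of the engine; the endpoints, the `speaker` and `text` query parameters, and the default port are taken from the one-liner above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:10101"  # default AivisSpeech Engine address


def build_url(path, **params):
    """Build an endpoint URL with URL-encoded query parameters."""
    return f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"


def synthesize(text, style_id, out_path="audio.wav"):
    """Request an AudioQuery for `text`, then synthesize it to a WAV file."""
    # Step 1: POST /audio_query, passing text and speaker in the query string.
    request = urllib.request.Request(
        build_url("/audio_query", text=text, speaker=style_id), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        query = json.load(response)

    # Step 2: POST the AudioQuery JSON to /synthesis and save the WAV bytes.
    request = urllib.request.Request(
        build_url("/synthesis", speaker=style_id),
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as wav:
        wav.write(response.read())
    return out_path
```

With the engine running and a matching model installed, `synthesize("こんにちは、音声合成の世界へようこそ！", 888753760)` should write audio.wav, mirroring the Bash version.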
[!TIP] For detailed API request and response specifications, see the API documentation and "Compatibility with the VOICEVOX API". The API documentation is updated continually to reflect changes in the latest development version.
The API documentation (Swagger UI) of a running AivisSpeech Engine can be viewed at http://127.0.0.1:10101/docs while AivisSpeech Engine or the AivisSpeech editor is running.
AivisSpeech Engine is largely compatible with the VOICEVOX ENGINE HTTP API.
Software that supports the VOICEVOX ENGINE HTTP API should be able to support AivisSpeech Engine simply by changing the API URL to http://127.0.0.1:10101.
[!IMPORTANT]
However, if the API client edits the AudioQuery obtained from the /audio_query API before passing it to the /synthesis API, speech may not be synthesized correctly because of specification differences (described later). For this reason, the AivisSpeech editor can use both AivisSpeech Engine and VOICEVOX ENGINE (when using the multi-engine feature), but AivisSpeech Engine cannot be used from the VOICEVOX editor.
Using AivisSpeech Engine with the VOICEVOX editor significantly degrades synthesis quality because of limitations in the editor's implementation. Parameters unique to AivisSpeech Engine become unusable, and calls to unsupported features may cause errors.
For better synthesis results, we strongly recommend using the AivisSpeech editor.
[!NOTE]
General API use cases should be largely compatible, but because a speech synthesis system with a fundamentally different model architecture is being forced into the same API specification, there may be incompatible APIs other than those listed below.
If you report them in an Issue, we will fix those whose compatibility can be improved.
The API specification differs from VOICEVOX ENGINE as follows.
The local IDs of speaker styles in the AIVM manifest contained in an AIVMX file are managed as sequential numbers starting from 0 for each speaker.
For speech synthesis models of the Style-Bert-VITS2 architecture, this value matches the value in the model hyperparameter data.style2id.
In the VOICEVOX ENGINE API, on the other hand, perhaps for historical reasons, only the "style ID" (style_id) is passed to the speech synthesis API, without specifying the "speaker UUID" (speaker_uuid).
In VOICEVOX ENGINE, the bundled speakers and styles are fixed, so the developers could manage "style IDs" uniquely.
In AivisSpeech Engine, however, users can freely add speech synthesis models.
The VOICEVOX API-compatible "style ID" must therefore be unique no matter which models are added.
Otherwise, adding a new model could produce a style ID that duplicates one belonging to a speaker style in an existing model.
AivisSpeech Engine therefore combines the speaker UUID and the style ID from the AIVM manifest to generate a globally unique, VOICEVOX API-compatible "style ID".
The generation procedure is as follows.
- Convert the speaker UUID to an MD5 hash value
- Combine the lower 27 bits of this hash value with the 5-bit local style ID (0 to 31)
- Convert the result to a 32-bit signed integer
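The three steps above can be sketched as follows. This is a minimal illustration, not the engine's code: it assumes the UUID is hashed as its UTF-8 string form and that the local style ID occupies the lowest 5 bits, and the function name `to_global_style_id` is hypothetical. Values produced here are not guaranteed to match the IDs returned by a real engine; use the /speakers API for those.

```python
import hashlib


def to_global_style_id(speaker_uuid: str, local_style_id: int) -> int:
    """Combine a speaker UUID and a local style ID into one signed 32-bit integer."""
    if not 0 <= local_style_id <= 31:
        raise ValueError("local_style_id must fit in 5 bits (0 to 31)")

    # Step 1: MD5 hash of the UUID (assumption: the UUID string encoded as UTF-8).
    digest = hashlib.md5(speaker_uuid.encode("utf-8")).digest()
    uuid_hash = int.from_bytes(digest, "big")

    # Step 2: lower 27 bits of the hash, followed by the 5-bit local style ID.
    combined = ((uuid_hash & 0x7FFFFFF) << 5) | local_style_id

    # Step 3: reinterpret the unsigned 32-bit value as a signed 32-bit integer.
    if combined >= 0x80000000:
        combined -= 0x100000000
    return combined
```

Because the top bit of the 27-bit hash fragment becomes the sign bit, roughly half of all speakers would end up with negative style IDs under this layout, which is why clients must accept the full signed 32-bit range.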
[!WARNING]
As a result, VOICEVOX API-compatible software that does not expect a 32-bit signed integer in the "style ID" may behave unexpectedly.
[!WARNING]
Because the global uniqueness of the speaker UUID is sacrificed to fit within the 32-bit signed integer range, there is an extremely small chance that style IDs of different speakers will collide.
There is currently no workaround when style IDs collide, but in practice this is not expected to be a problem in almost any case.
[!TIP]
The VOICEVOX API-compatible "style ID" generated automatically by AivisSpeech Engine can be obtained from the /speakers API.
This API returns the list of speaker information installed in AivisSpeech Engine.
The AudioQuery type is the query used to perform speech synthesis from specified text or phoneme sequences.
The main changes from the AudioQuery type in VOICEVOX ENGINE are as follows.
- The meaning of the intonationScale field is different.
  - In VOICEVOX ENGINE it represented "overall intonation"; in AivisSpeech Engine it represents "overall style strength".
  - It specifies the strength of the speaker style's voice color in the range 0.0 to 2.0 (default: 1.0).
  - The larger the value, the closer the intonation is to the selected style.
  - For example, with a "happy" style, a larger value gives a brighter, happier-sounding delivery.
  - However, depending on the speaker and style, raising the value too far can make the voice unnatural.
  - It cannot be applied to the Normal style, which is the average of all styles (the value is ignored regardless).
  - The "style strength" parameter of Style-Bert-VITS2 corresponds to AivisSpeech Engine's intonationScale as follows.
    - intonationScale 0.0 to 1.0 corresponds to 0.0 to 1.0 in Style-Bert-VITS2.
    - intonationScale 1.0 to 2.0 corresponds to 1.0 to 10.0 in Style-Bert-VITS2.
- The tempoDynamicsScale field has been added.
  - This parameter is specific to AivisSpeech Engine. It specifies how strongly the speaking tempo varies, in the range 0.0 to 2.0 (default: 1.0).
  - The larger the value, the faster and more lifelike the delivery.
  - The "tempo dynamics" parameter of Style-Bert-VITS2 corresponds to AivisSpeech Engine's tempoDynamicsScale as follows.
    - tempoDynamicsScale 0.0 to 1.0 corresponds to 0.0 to 0.2 in Style-Bert-VITS2.
    - tempoDynamicsScale 1.0 to 2.0 corresponds to 0.2 to 1.0 in Style-Bert-VITS2.
- The specification of the pitchScale field is different.
  - Unlike VOICEVOX ENGINE, changing this value from 0.0 may degrade audio quality.
- The pauseLength and pauseLengthScale fields are not supported.
  - They exist as fields for compatibility but are always ignored.
- The specification of the kana field is different.
  - In VOICEVOX ENGINE it was a read-only field containing AquesTalk-like notation text; in AivisSpeech Engine it is used as the field that specifies the ordinary text to be read.
  - If null or an empty string is specified, a hiragana string generated automatically from the accent phrases becomes the text to be read, but the intonation may be unnatural.
  - For more natural results, we recommend specifying the ordinary text to be read whenever possible.
For details of the changes, see model.py.
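The range correspondences for intonationScale and tempoDynamicsScale can be illustrated with a small piecewise mapping. The README states only which ranges correspond; the linear interpolation inside each segment and the function names below are assumptions made for this sketch.

```python
def _piecewise(value, mid_target, max_target):
    """Map 0.0-1.0 onto 0.0-mid_target and 1.0-2.0 onto mid_target-max_target.

    Linear interpolation within each segment is an assumption of this sketch.
    """
    if not 0.0 <= value <= 2.0:
        raise ValueError("value must be within 0.0 to 2.0")
    if value <= 1.0:
        return value * mid_target
    return mid_target + (value - 1.0) * (max_target - mid_target)


def sbv2_style_strength(intonation_scale):
    """intonationScale 0-1 -> 0-1, 1-2 -> 1-10 (Style-Bert-VITS2 style strength)."""
    return _piecewise(intonation_scale, 1.0, 10.0)


def sbv2_tempo_dynamics(tempo_dynamics_scale):
    """tempoDynamicsScale 0-1 -> 0-0.2, 1-2 -> 0.2-1.0 (Style-Bert-VITS2 tempo dynamics)."""
    return _piecewise(tempo_dynamics_scale, 0.2, 1.0)
```

Under this reading, the default value 1.0 sits at the knee of both curves, and the upper half of the intonationScale range covers a much wider span of the underlying parameter than the lower half, which is consistent with the warning that high values can sound unnatural.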
The Mora type is the data structure that represents a mora of the text to be read.
[!TIP]
A mora is the smallest unit of sound as actually pronounced (such as 「あ」, 「か」, or 「を」).
The Mora type is never used on its own in API requests or responses; it is always used indirectly through AudioQuery.accent_phrases[n].moras or AudioQuery.accent_phrases[n].pause_mora.
The main changes from the Mora type in VOICEVOX ENGINE are as follows.
- Symbols are also treated as moras.
  - In VOICEVOX ENGINE, symbols such as exclamation marks and punctuation were treated as pause_mora; in AivisSpeech Engine they are treated as ordinary moras.
  - For a symbol mora, text holds the symbol as-is and vowel is set to "pau".
- The consonant / vowel fields are read-only.
  - The reading used for synthesis always comes from the value of the text field.
  - Changing the values of these fields does not affect the synthesis result.
- The consonant_length / vowel_length / pitch fields are not supported.
  - AivisSpeech Engine's implementation cannot calculate these values, so 0.0 is always returned as a dummy value.
  - They exist as fields for compatibility but are always ignored.
For details of the changes, see tts_pipeline/model.py.
The Preset type is preset information used by the editor to determine the initial values of a synthesis query.
The changes largely correspond to the specification changes of the intonationScale / tempoDynamicsScale / pitchScale / pauseLength / pauseLengthScale fields described for the AudioQuery type.
For details of the changes, see preset/model.py.
[!WARNING]
The singing synthesis APIs and the cancellable speech synthesis API are not supported.
The endpoints exist for compatibility but always return 501 Not Implemented.
See app/routers/character.py / app/routers/tts_pipeline.py for details.
- GET /singers
- GET /singer_info
- POST /cancellable_synthesis
- POST /sing_frame_audio_query
- POST /sing_frame_volume
- POST /frame_synthesis
[!WARNING]
The /synthesis_morphing API, which provides morphing, is not supported.
Because utterance timing differs from speaker to speaker, it cannot be implemented (it would run, but the result is unlistenable), so it always returns 400 Bad Request.
The /morphable_targets API, which reports whether morphing is available for each speaker, treats morphing as prohibited for all speakers.
See app/routers/morphing.py for details.
- POST /synthesis_morphing
- POST /morphable_targets
[!WARNING]
The following parameters exist for compatibility but are always ignored.
See app/routers/character.py / app/routers/tts_pipeline.py for details.
- core_version parameter
  - A parameter that specifies the VOICEVOX CORE version.
  - AivisSpeech Engine has no component corresponding to VOICEVOX CORE, so it is always ignored.
- enable_interrogative_upspeak parameter
  - A parameter that controls whether the sentence ending is adjusted automatically when interrogative text is given.
  - AivisSpeech Engine always reads text with natural intonation that reflects symbols contained in the text, such as 「！」「？」「…」「〜」.
  - Therefore, simply appending 「？」 to the end of the text, as in どうですか…？, produces interrogative intonation.
[!TIP]
See also the FAQ / Q&A of the AivisSpeech editor.
Q. Raising the "style strength" (intonationScale) value makes the speech sound wrong.
This is the current behavior of the Style-Bert-VITS2 model architecture supported by AivisSpeech Engine.
Depending on the speaker and style, raising intonationScale too far can make the speech sound wrong, or flat and unnatural.
The upper limit of intonationScale that still produces proper speech differs by speaker and style. Adjust it to a suitable value as needed.
AivisSpeech Engine is designed to produce the correct reading and accent on the first try wherever possible, but incorrect readings and accents still occur in some cases.
Words not registered in the built-in dictionary, such as rarely used proper nouns and personal names (especially unconventional given names), are often read incorrectly.
The readings of such words can be changed by registering them in the dictionary. Try registering words from the AivisSpeech editor or through the API.
Note that for compound words and English words, the registered dictionary entry may not be applied regardless of the word's priority. This is the current behavior.
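Q. Sending long text to the speech synthesis API in one request makes the speech unnatural or causes a memory leak.
AivisSpeech Engine is designed on the assumption that speech is synthesized in relatively short units, such as a single sentence or one coherent unit of meaning.
Therefore, sending long text of more than 1000 characters to the /synthesis API in one request can cause problems such as the following.
- Memory usage rises sharply and the PC slows down
- A memory leak occurs and AivisSpeech Engine crashes
- The intonation becomes unnatural and the voice sounds flat
When synthesizing long text, we recommend splitting it at positions such as the following and sending each part to the speech synthesis API separately.
There is no hard limit, but 500 characters or fewer per synthesis request is desirable.
- At punctuation marks (「。」「、」)
- At breaks in meaning (paragraph boundaries and so on)
- At dialogue boundaries (text enclosed in 「」)
[!TIP]
Splitting at breaks in meaning tends to produce speech with more natural intonation.
This is because the emotional expression and intonation matching the content are applied across the whole text sent to the speech synthesis API in one request.
Splitting the text appropriately resets the emotional expression and intonation for each sentence and gives a more natural reading.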
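One way to follow this advice is to split the text into sentences and pack them into chunks of at most 500 characters before calling the API. The sketch below is an assumption-laden illustration: the function name `split_for_synthesis` is hypothetical, it splits only at 「。」「！」「？」 and line breaks, and it does not attempt to detect breaks in meaning or dialogue boundaries.

```python
import re

MAX_CHARS = 500  # the README suggests 500 characters or fewer per request


def split_for_synthesis(text, max_chars=MAX_CHARS):
    """Split text at sentence ends and line breaks into chunks of at most max_chars."""
    # A sentence is a run of ordinary characters plus any trailing 。！？ marks.
    sentences = re.findall(r"[^。！？\n]+[。！？]*", text)
    chunks, current = [], ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent through /audio_query and /synthesis in turn. Using a smaller `max_chars` (for example, one sentence per chunk) trades more requests for the per-sentence intonation reset described in the tip above.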
Internet access is required only the first time AivisSpeech is launched, in order to download model data.
From the second launch onward, it can be used even when the PC is offline.
This can be done on the settings screen of a running AivisSpeech Engine.
While AivisSpeech Engine is running, open http://127.0.0.1:[AivisSpeech Engine port number]/setting in a browser to display the AivisSpeech Engine settings screen.
The default AivisSpeech Engine port number is 10101.
Q. I switched to GPU mode (--use_gpu), but speech generation is slower than in CPU mode.
GPU mode can be used even on PCs that have only a CPU-integrated GPU (iGPU), but in most cases it is considerably slower than CPU mode, so it is not recommended.
This is because a CPU-integrated GPU has lower performance than a discrete GPU (dGPU) and is poorly suited to heavy workloads such as AI speech synthesis.
Meanwhile, recent CPUs have improved greatly and can generate speech fast enough on their own.
We therefore recommend CPU mode on PCs without a dGPU.
If you use a PC with a 12th-generation or later Intel CPU (hybrid P-core / E-core design), speech generation performance can vary greatly depending on the Windows power settings.
This is because, in the default "Balanced" mode, speech generation tasks tend to be assigned to the power-saving E-cores.
Changing the setting with the following steps makes full use of both P-cores and E-cores and speeds up speech generation.
- Open Settings in Windows 11
- Go to System → Power
- Change "Power mode" to "Best performance"
* "Power plan" in Control Panel also has a "High performance" setting, but its contents are different.
For 12th-generation or later Intel CPUs, we recommend changing "Power mode" from the Windows 11 Settings screen.
AivisSpeech aims to be free AI speech synthesis software whose uses are not restricted.
(Subject to the license of the speech synthesis model used in your work,) the software itself at least requires no credit and may be used freely by individuals and corporations, for commercial and non-commercial purposes alike.
That said, we would also like more people to learn about AivisSpeech.
If you do not mind, we would appreciate a credit for AivisSpeech somewhere in your work. (The credit format is up to you.)
Logs are saved in the following folders.
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Logs
- Mac: ~/Library/Application Support/AivisSpeech-Engine/Logs
- Linux: ~/.local/share/AivisSpeech-Engine/Logs
If you find a bug, please report it in one of the following ways.
- GitHub Issue (recommended): if you have a GitHub account, reporting through a GitHub Issue allows the earliest response.
- Twitter (X): you can report by reply or DM to the official Aivis Project account, or by a tweet with the hashtag #AivisSpeech.
- Contact form: you can also report through the Aivis Project contact form.
Including as much of the following information as possible allows a faster response.
- Description of the bug
- Steps to reproduce (attach videos or photos if available)
- OS type and AivisSpeech version
- What you tried in order to resolve it
- Presence of antivirus software and the like (if it seems relevant)
- Error messages displayed
- Error logs
VOICEVOX is a very large piece of software and is still under active development.
For this reason, AivisSpeech Engine is developed on top of the latest version of VOICEVOX ENGINE under the following policy.
- Keep modifications to the necessary minimum so that following the latest VOICEVOX remains easy
  - Rebrand from VOICEVOX ENGINE to AivisSpeech Engine only where necessary
  - Renaming the voicevox_engine directory would produce a huge diff in import statements, so it is deliberately not rebranded
- Do not refactor
  - Conflicts with VOICEVOX ENGINE are easy to foresee, and we are not familiar with the entire codebase
- Do not delete code, even for features that AivisSpeech does not use (such as singing synthesis)
  - This is also to avoid conflicts
  - Disable unused code by commenting it out rather than deleting it
    - To minimize the diff from VOICEVOX ENGINE, use """ """ instead of # when a large amount must be commented out
  - However, this does not apply to configuration files and build tooling such as the Dockerfile and GitHub Actions
    - These are already heavily modified in AivisSpeech Engine, and commenting out would make the code very cluttered
- Do not update documentation, because maintaining and following it is difficult
  - As a result, the documents are not updated at all and do not reflect changes made in AivisSpeech Engine
- Do not add test code, because maintaining tests alongside AivisSpeech Engine's modifications is difficult
  - Maintain only the existing tests passively, fixing or commenting out parts so that they pass
    - Because of AivisSpeech Engine's modifications, test snapshots differ from those of VOICEVOX ENGINE
    - Tests broken by AivisSpeech Engine's modifications are commented out rather than fixed
  - Code newly developed for AivisSpeech Engine gets no test code, in view of maintenance cost
The procedure differs significantly from the original VOICEVOX ENGINE.
Python 3.11 must be installed beforehand.
# Install Poetry and pre-commit
pip install poetry poetry-plugin-export pre-commit
# Enable pre-commit
pre-commit install
# Install all dependencies
poetry install
The procedure differs significantly from the original VOICEVOX ENGINE.
# Start AivisSpeech Engine in the development environment
poetry run task serve
# Show AivisSpeech Engine help
poetry run task serve --help
# Auto-fix code formatting
poetry run task format
# Check code formatting
poetry run task lint
# Check for typos with typos
poetry run task typos
# Run tests
poetry run task test
# Update test snapshots
poetry run task update-snapshots
# Update license information
poetry run task update-licenses
# Build AivisSpeech Engine
poetry run task build
Of the dual license of the underlying VOICEVOX ENGINE, only LGPL-3.0 is inherited.
The documents below and those under docs/ are carried over unchanged from the original VOICEVOX ENGINE documentation. There is no guarantee that their contents also apply to AivisSpeech Engine.
AivisSpeech Engine is deeply supported by many wonderful open-source software projects and their contributions.
We sincerely thank everyone who has developed open-source software, and the whole community, for their contributions and support.
- @litagin02
- @Stardust-minus
- @tuna2134
- @googlefan256
- @WariHima
- VOICEVOX ENGINE Contributors
- Everyone in AI声づくり技術研究会
This is the VOICEVOX engine.
It is actually an HTTP server, so you can perform text-to-speech synthesis by sending requests.
(The editor is VOICEVOX, the core is VOICEVOX CORE, and the overall architecture is described in detail here.)
Guides by purpose are as follows.
- User guide: for those who want to synthesize speech
- Contributor guide: for those who want to contribute
- Developer guide: for those who want to use the code
Download the appropriate engine from here.
See the API documentation.
With the VOICEVOX engine or editor running, you can also view the documentation of the running engine at http://127.0.0.1:50021/docs.
For future plans and the like, "Integration with the VOICEVOX speech synthesis engine" may also be helpful.
docker pull voicevox/voicevox_engine:cpu-latest
docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-latest
docker pull voicevox/voicevox_engine:nvidia-latest
docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-latest
When using the GPU version, errors may occur depending on the environment. In that case, adding --runtime=nvidia to docker run may resolve them.
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
The generated audio has a somewhat unusual sampling rate of 24000 Hz, so some audio players may not be able to play it.
The value specified for speaker is the style_id obtained from the /speakers endpoint. It is named speaker for compatibility.
You can adjust the audio by editing the parameters of the synthesis query obtained from /audio_query.
For example, let us make the speech rate 1.5 times faster.
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
# Use sed to change the value of speedScale to 1.5
sed -i -r 's/"speedScale":[0-9.]+/"speedScale":1.5/' query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio_fast.wav
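Editing the query with a JSON parser instead of sed avoids depending on the exact whitespace of the response. The following sketch assumes query.json was saved as in the commands above; the function name `with_speed_scale` is illustrative.

```python
import json


def with_speed_scale(query, speed_scale):
    """Return a copy of an AudioQuery dict with speedScale replaced."""
    updated = dict(query)  # shallow copy so the original query is left untouched
    updated["speedScale"] = speed_scale
    return updated


def rewrite_query_file(path, speed_scale):
    """Load an AudioQuery JSON file, change speedScale, and write it back."""
    with open(path, encoding="utf-8") as f:
        query = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps kana in the query human-readable.
        json.dump(with_speed_scale(query, speed_scale), f, ensure_ascii=False)
```

Calling `rewrite_query_file("query.json", 1.5)` plays the same role as the sed line, after which the file can be posted to /synthesis as before.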
"AquesTalk-like notation" is a notation that specifies readings using only katakana and symbols. It differs in part from the original AquesTalk notation.
AquesTalk-like notation follows these rules:
- All kana are written in katakana
- Accent phrases are separated by / or 、. A silent interval is inserted only when 、 is used.
- Placing _ before a kana devoices that kana
- The accent position is specified with '. Every accent phrase must have exactly one accent position.
- Placing ？ (full-width) at the end of an accent phrase produces interrogative pronunciation
The /audio_query response describes the reading determined by the engine in AquesTalk-like notation.
Modifying it lets you control the reading and accent of the speech.
# Write the text to be read to text.txt in UTF-8
echo -n "ディープラーニングは万能薬ではありません" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
cat query.json | grep -o -E "\"kana\":\".*\""
# Result... "kana":"ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
# We want it read as "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン", so
# get the intonation with is_kana=true and save it to newphrases.json
echo -n "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" > kana.txt
curl -s \
-X POST \
"127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \
--get --data-urlencode text@kana.txt \
> newphrases.json
# Replace the contents of "accent_phrases" in query.json with the contents of newphrases.json
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @newquery.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
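Before sending a hand-edited reading to /accent_phrases, it can help to check it against the notation rules listed above. The sketch below covers only those listed rules and is a simplification: the engine's own parser may reject strings this check accepts (for example, kana sequences that are not valid moras), and the function names are hypothetical.

```python
import re

# Katakana (including the long-vowel mark), "_" and "'", with an optional
# full-width question mark at the end of the phrase.
_PHRASE_PATTERN = re.compile(r"^[ァ-ヴー_']+？?$")


def split_accent_phrases(kana):
    """Split AquesTalk-like notation into accent phrases at "/" or "、"."""
    return re.split(r"[/、]", kana)


def follows_listed_rules(kana):
    """Check the listed rules: katakana only, exactly one accent mark per phrase."""
    for phrase in split_accent_phrases(kana):
        if phrase.count("'") != 1:
            return False
        if not _PHRASE_PATTERN.match(phrase):
            return False
    return True
```

For the example above, `follows_listed_rules("ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン")` passes, while a phrase with no accent mark or with hiragana fails.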
The API lets you view the user dictionary and add, edit, and delete words.
Sending a GET request to /user_dict returns the list of user dictionary entries.
curl -s -X GET "127.0.0.1:50021/user_dict"
Sending a POST request to /user_dict_word adds a word to the user dictionary.
The following URL parameters are required.
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (accent nucleus position, integer)
For the accent nucleus position, the following text may be helpful.
The number in each "type N" label is the accent nucleus position.
https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html
On success, the return value is the UUID string assigned to the word.
surface="test"
pronunciation="テスト"
accent_type="1"
curl -s -X POST "127.0.0.1:50021/user_dict_word" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
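The same word registration can be sketched in Python. The helper names are illustrative assumptions; the endpoint and the three URL parameters come from the description above, and the port 50021 matches the VOICEVOX examples in this section.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:50021"


def add_word_request(surface, pronunciation, accent_type):
    """Build the POST /user_dict_word request with URL-encoded parameters."""
    params = urlencode(
        {"surface": surface, "pronunciation": pronunciation, "accent_type": accent_type}
    )
    return Request(f"{BASE_URL}/user_dict_word?{params}", method="POST")


def add_word(surface, pronunciation, accent_type):
    """Register a word and return the UUID string assigned by the engine."""
    with urlopen(add_word_request(surface, pronunciation, accent_type)) as response:
        return json.load(response)
```

Keeping the returned UUID is useful because the update and delete endpoints described next both need it.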
Sending a PUT request to /user_dict_word/{word_uuid} modifies a word in the user dictionary.
The following URL parameters are required.
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (accent nucleus position, integer)
The word_uuid is returned when the word is added and can also be found by viewing the user dictionary.
On success, the response is 204 No Content.
surface="test2"
pronunciation="テストツー"
accent_type="2"
# Replace word_uuid as appropriate for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
Sending a DELETE request to /user_dict_word/{word_uuid} deletes a word from the user dictionary.
The word_uuid is returned when the word is added and can also be found by viewing the user dictionary.
On success, the response is 204 No Content.
# Replace word_uuid as appropriate for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
The user dictionary can be imported and exported in the "Export & import user dictionary" section of the engine settings page.
It can also be imported and exported through the API.
Use POST /import_user_dict to import and GET /user_dict to export.
See the API documentation for details such as arguments.
Editing presets.yaml in the user directory lets you use presets for character, speech rate, and so on.
echo -n "プリセットをうまく活用すれば、サードパーティ間で同じ設定を使うことができます" >text.txt
# Get preset information
curl -s -X GET "127.0.0.1:50021/presets" > presets.json
preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g')
style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g')
# Get the synthesis query
curl -s \
-X POST \
"127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesize speech
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=$style_id" \
> audio.wav
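The sed expressions above pick id and style_id out of presets.json by pattern matching, which becomes fragile once more than one preset exists. A JSON-based selection can be sketched as follows; it assumes /presets returns a JSON array of objects that each contain "id" and "style_id" (the two fields the sed commands extract), and the function name `select_preset` is hypothetical.

```python
import json


def select_preset(presets, preset_id=None):
    """Return (id, style_id) of the requested preset, or of the first one."""
    if not presets:
        raise LookupError("no presets are defined")
    if preset_id is None:
        chosen = presets[0]
    else:
        # Find the preset whose id matches; None if there is no such preset.
        chosen = next((p for p in presets if p["id"] == preset_id), None)
        if chosen is None:
            raise LookupError(f"preset {preset_id} not found")
    return chosen["id"], chosen["style_id"]


def load_presets(path="presets.json"):
    """Load the array saved from GET /presets."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The returned pair maps directly onto the `preset_id` parameter of /audio_query_from_preset and the `speaker` parameter of /synthesis in the commands above.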
- speaker_uuid can be checked with /speakers
- id must not be duplicated
- If the file is rewritten after the engine starts, the change is reflected in the engine
/synthesis_morphing generates morphed speech based on speech synthesized in each of two styles.
echo -n "モーフィングを利用することで、２種類の声を混ぜることができます。" > text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=8" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesis result in the original style
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=8" \
> audio.wav
export MORPH_RATE=0.5
# Note: this takes time because it involves synthesis for two styles plus speech analysis with WORLD
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
export MORPH_RATE=0.9
# When query, base_speaker, and target_speaker are the same, the cache is used, so generation is relatively fast
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
The following code retrieves portrait.png from the additional information.
(jq is used to parse the JSON.)
curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \
| jq -r ".portrait" \
| base64 -d \
> portrait.png
With /cancellable_synthesis, computing resources are released immediately when the connection is closed.
(With /synthesis, the synthesis computation runs to the end even if the connection is closed.)
This API is an experimental feature and is not enabled unless --enable_cancellable_synthesis is specified as an argument when the engine starts.
The parameters required for synthesis are the same as for /synthesis.
echo -n '{
"notes": [
{ "key": null, "frame_length": 15, "lyric": "" },
{ "key": 60, "frame_length": 45, "lyric": "ド" },
{ "key": 62, "frame_length": 45, "lyric": "レ" },
{ "key": 64, "frame_length": 45, "lyric": "ミ" },
{ "key": null, "frame_length": 15, "lyric": "" }
]
}' > score.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @score.json \
"127.0.0.1:50021/sing_frame_audio_query?speaker=6000" \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/frame_synthesis?speaker=3001" \
> audio.wav
The key in the score is a MIDI note number.
lyric is the lyric and can be any string, but depending on the engine, strings other than a single hiragana or katakana mora may cause an error.
The frame rate defaults to 93.75 Hz and can be obtained from frame_rate in the engine manifest.
The first note must be silent.
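Since note lengths are given in frames, it is often easier to write a score in seconds and convert. The sketch below assumes the default 93.75 Hz frame rate stated above; the helper names are hypothetical, and the trailing rest simply mirrors the example score (only the leading silent note is stated as required).

```python
import json

FRAME_RATE = 93.75  # default; the engine manifest's frame_rate is authoritative


def seconds_to_frames(seconds, frame_rate=FRAME_RATE):
    """Convert a duration in seconds to the nearest whole number of frames."""
    return round(seconds * frame_rate)


def build_score(melody, rest_frames=15):
    """Build a score dict from (midi_key, lyric, seconds) tuples.

    A silent note (key None, empty lyric) is placed first, as required,
    and another is appended at the end to mirror the example score.
    """
    rest = {"key": None, "frame_length": rest_frames, "lyric": ""}
    notes = [dict(rest)]
    for key, lyric, seconds in melody:
        notes.append(
            {"key": key, "frame_length": seconds_to_frames(seconds), "lyric": lyric}
        )
    notes.append(dict(rest))
    return {"notes": notes}
```

For instance, `json.dumps(build_score([(60, "ド", 0.48), (62, "レ", 0.48), (64, "ミ", 0.48)]), ensure_ascii=False)` yields a score equivalent to score.json above, because 0.48 seconds at 93.75 Hz is 45 frames.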
The speaker that can be passed to /sing_frame_audio_query is the style_id of a style returned by /singers whose type is sing or singing_teacher.
The speaker that can be passed to /frame_synthesis is the style_id of a style returned by /singers whose type is frame_decode.
The argument is named speaker for consistency with the other APIs.
Different styles can be specified for /sing_frame_audio_query and /frame_synthesis.
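Choosing valid speaker values for the two endpoints amounts to filtering the /singers response by style type. The sketch below assumes each singer entry carries a styles list whose items have id and type fields. That response shape, the sample data, and the helper name style_ids_by_type are assumptions for illustration.

```python
def style_ids_by_type(singers, wanted_types):
    """Collect style IDs whose type is one of wanted_types."""
    return [
        style["id"]
        for singer in singers
        for style in singer.get("styles", [])
        if style.get("type") in wanted_types
    ]


# Stand-in for a parsed /singers response (shape assumed, values hypothetical).
sample_singers = [
    {
        "name": "example singer",
        "styles": [
            {"name": "teacher", "id": 6000, "type": "singing_teacher"},
            {"name": "decoder", "id": 3001, "type": "frame_decode"},
        ],
    }
]

query_speakers = style_ids_by_type(sample_singers, {"sing", "singing_teacher"})
synthesis_speakers = style_ids_by_type(sample_singers, {"frame_decode"})
print(query_speakers, synthesis_speakers)
```

The first list holds candidates for /sing_frame_audio_query and the second holds candidates for /frame_synthesis.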
For security, VOICEVOX only accepts requests whose Origin is localhost, 127.0.0.1, app://, or a browser extension URI, or that carry no Origin at all.
As a result, some third-party apps may be unable to receive responses.
As a workaround, the engine provides a UI for changing this setting.
- Open http://127.0.0.1:50021/setting.
- Change or add settings to suit the app you want to use.
- Press the save button to confirm the changes.
- The engine must be restarted for the settings to take effect. Restart it as needed.
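To make the Origin rule described above concrete, here is an illustrative check in Python. It is not the engine's implementation. The function name is_origin_allowed, the extra_allowed parameter, and the specific extension schemes (chrome-extension, moz-extension) are assumptions.

```python
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}
# "app" comes from the text above; the extension schemes are assumed examples.
LOCAL_SCHEMES = {"app", "chrome-extension", "moz-extension"}


def is_origin_allowed(origin, extra_allowed=()):
    """Return True if a request with this Origin header value would be accepted."""
    if origin is None:  # no Origin header at all
        return True
    if origin in extra_allowed:  # origins added through the settings UI
        return True
    parts = urlsplit(origin)
    if parts.scheme in LOCAL_SCHEMES:
        return True
    return parts.scheme in {"http", "https"} and parts.hostname in LOCAL_HOSTS


for candidate in (None, "http://localhost:3000", "app://.", "https://example.com"):
    print(candidate, is_origin_allowed(candidate))
```

An origin rejected by default, such as https://example.com here, is the kind of entry a third-party app would need added on the settings page.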
Passing the runtime argument --disable_mutable_api, or setting the environment variable VV_DISABLE_MUTABLE_API=1, disables the APIs that modify engine settings, dictionaries, and similar data.
All requests and responses are encoded in UTF-8.
Arguments can be specified when starting the engine. For details, check the help with the -h argument.
$ python run.py -h
usage: run.py [-h] [--host HOST] [--port PORT] [--use_gpu] [--voicevox_dir VOICEVOX_DIR] [--voicelib_dir VOICELIB_DIR] [--runtime_dir RUNTIME_DIR] [--enable_mock] [--enable_cancellable_synthesis]
[--init_processes INIT_PROCESSES] [--load_all_models] [--cpu_num_threads CPU_NUM_THREADS] [--output_log_utf8] [--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}]
[--allow_origin [ALLOW_ORIGIN ...]] [--setting_file SETTING_FILE] [--preset_file PRESET_FILE] [--disable_mutable_api]
The VOICEVOX engine.
options:
-h, --help show this help message and exit
--host HOST Host address on which to accept connections.
--port PORT Port number on which to accept connections.
--use_gpu Use the GPU for speech synthesis.
--voicevox_dir VOICEVOX_DIR
Path to the VOICEVOX directory.
--voicelib_dir VOICELIB_DIR
Path to the VOICEVOX CORE directory.
--runtime_dir RUNTIME_DIR
Path to the directory of the libraries used by VOICEVOX CORE.
--enable_mock Synthesize speech with a mock instead of VOICEVOX CORE.
--enable_cancellable_synthesis
Allow speech synthesis to be cancelled partway through.
--init_processes INIT_PROCESSES
Number of processes created when the cancellable_synthesis feature is initialized.
--load_all_models Load all speech synthesis models at startup.
--cpu_num_threads CPU_NUM_THREADS
Number of threads used for speech synthesis. If not specified, the value of the environment variable VV_CPU_NUM_THREADS is used instead. If VV_CPU_NUM_THREADS is neither an empty string nor a number, the engine exits with an error.
--output_log_utf8 Output logs in UTF-8. If not specified, the value of the environment variable VV_OUTPUT_LOG_UTF8 is used instead. If VV_OUTPUT_LOG_UTF8 is 1, UTF-8 is used; if it is 0, an empty string, or unset, the encoding is determined automatically from the environment.
--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}
CORS permission mode. Either all or localapps can be specified. all allows everything. localapps restricts the cross-origin resource sharing policy to app://. and localhost-related origins. Other origins can be added with the allow_origin option. The default is localapps. This option takes precedence over the settings file specified with --setting_file.
--allow_origin [ALLOW_ORIGIN ...]
Origins to allow. Multiple origins can be given, separated by spaces. This option takes precedence over the settings file specified with --setting_file.
--setting_file SETTING_FILE
Settings file to use.
--preset_file PRESET_FILE
Preset file to use. If not specified, the environment variable VV_PRESET_FILE and then presets.yaml in the user directory are searched, in that order.
--disable_mutable_api
Disable APIs that modify the engine's static data, such as dictionary registration and settings changes. If not specified, the value of the environment variable VV_DISABLE_MUTABLE_API is used instead. If VV_DISABLE_MUTABLE_API is 1, the APIs are disabled; if it is 0, an empty string, or unset, it is ignored.
To update, delete all files in the engine directory and replace them with the new version.
VOICEVOX ENGINE welcomes your contributions!
See CONTRIBUTING.md for details.
Development discussion and casual chat also take place on the unofficial VOICEVOX Discord server. Feel free to join.
When creating a pull request that resolves an Issue, we recommend either announcing on the Issue that you have started working on it or opening a Draft pull request first, so that nobody else ends up working on the same Issue.
The engine is developed with Python 3.11.9.
Installation requires a C/C++ compiler and CMake for your OS.
# Install the runtime environment
python -m pip install -r requirements.txt
# Install the development, test, and build environments
python -m pip install -r requirements-dev.txt -r requirements-build.txt
See the following command for details on the command-line arguments.
python run.py --help
# Start the server using the product version of VOICEVOX
VOICEVOX_DIR="C:/path/to/VOICEVOX/vv-engine" # path to the ENGINE inside the product-version VOICEVOX directory
python run.py --voicevox_dir=$VOICEVOX_DIR
# Start the server with the mock
python run.py --enable_mock
# Switch log output to UTF-8
python run.py --output_log_utf8
# Alternatively: VV_OUTPUT_LOG_UTF8=1 python run.py
When the number of CPU threads is not specified, half of the logical cores are used. (On most CPUs this is about half of the total processing power.)
To adjust how much processing power the engine uses, for example when running on IaaS or on a dedicated server, specify the number of CPU threads.
- Specify it with a runtime argument
python run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4
- Specify it with an environment variable
export VV_CPU_NUM_THREADS=4
python run.py --voicevox_dir=$VOICEVOX_DIR
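The precedence between the runtime argument, the environment variable, and the default can be sketched as follows. This illustrates the documented behavior and is not the engine's code. The function name resolve_cpu_num_threads and rounding the half-of-cores default down (with a minimum of 1) are assumptions.

```python
import os


def resolve_cpu_num_threads(cli_value, env, logical_cores=None):
    """Pick the thread count: CLI argument, then VV_CPU_NUM_THREADS, then half the cores."""
    if cli_value is not None:
        return cli_value
    raw = env.get("VV_CPU_NUM_THREADS", "")
    if raw != "":
        # A non-empty, non-numeric value is an error, as the help text states.
        if not raw.isdecimal():
            raise ValueError(f"VV_CPU_NUM_THREADS is not a number: {raw!r}")
        return int(raw)
    cores = logical_cores if logical_cores is not None else (os.cpu_count() or 1)
    # Assumption: "half of the logical cores" rounds down, but never below 1.
    return max(1, cores // 2)


print(resolve_cpu_num_threads(4, {}))
print(resolve_cpu_num_threads(None, {"VV_CPU_NUM_THREADS": "2"}))
print(resolve_cpu_num_threads(None, {}, logical_cores=8))
```

Passing os.environ as env reproduces the environment-variable route of the second list item above.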
Cores from VOICEVOX Core 0.5.4 onward can be used.
The libtorch version of the core is not supported on Mac.
When the directory of the product version of VOICEVOX or of a compiled engine is passed with the --voicevox_dir argument, the core of that version is used.
python run.py --voicevox_dir="/path/to/VOICEVOX/vv-engine"
On Mac, DYLD_LIBRARY_PATH must be specified.
DYLD_LIBRARY_PATH="/path/to/voicevox" python run.py --voicevox_dir="/path/to/VOICEVOX/vv-engine"
Pass the directory where the VOICEVOX Core zip file was extracted with the --voicelib_dir argument.
In addition, pass the directory of the libtorch or onnxruntime shared library that matches the core version with the --runtime_dir argument.
If libtorch or onnxruntime is already on the system search path, the --runtime_dir argument is not needed.
The --voicelib_dir and --runtime_dir arguments can each be given multiple times.
To select a core version at an API endpoint, pass the core_version argument. (If it is omitted, the latest core is used.)
python run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx"
On Mac, DYLD_LIBRARY_PATH must be specified instead of the --runtime_dir argument.
DYLD_LIBRARY_PATH="/path/to/onnx" python run.py --voicelib_dir="/path/to/voicevox_core"
Voice libraries in the following directories are loaded automatically.
- Built version: <user_data_dir>/voicevox-engine/core_libraries/
- Python version: <user_data_dir>/voicevox-engine-dev/core_libraries/
<user_data_dir> depends on the OS.
- Windows: C:\Users\<username>\AppData\Local\
- macOS: /Users/<username>/Library/Application\ Support/
- Linux: /home/<username>/.local/share/
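The directory rules above can be combined into one lookup. The sketch below only reproduces the paths listed in this section. The function name core_libraries_dir and the use of platform.system() style names ("Windows", "Darwin", "Linux") as keys are choices made for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def core_libraries_dir(system, username, built=True):
    """Return the auto-loaded core_libraries directory for an OS and user."""
    app = "voicevox-engine" if built else "voicevox-engine-dev"
    if system == "Windows":
        return PureWindowsPath("C:/Users", username, "AppData", "Local", app, "core_libraries")
    if system == "Darwin":  # macOS
        return PurePosixPath("/Users", username, "Library", "Application Support", app, "core_libraries")
    if system == "Linux":
        return PurePosixPath("/home", username, ".local", "share", app, "core_libraries")
    raise ValueError(f"unsupported system: {system!r}")


for name in ("Windows", "Darwin", "Linux"):
    print(core_libraries_dir(name, "alice", built=False))
```

Pure path classes are used so that the sketch prints the same result on any OS.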
The engine can be built locally, either packaged with pyinstaller or containerized with the Dockerfile.
See Contributor Guide#Build for the detailed procedure.
If you use GitHub, the engine can be built with GitHub Actions in a forked repository.
Turn Actions ON and launch build-engine-package.yml via workflow_dispatch to build.
The artifacts are uploaded to Release.
See Contributor Guide#GitHub Actions for the GitHub Actions settings required for the build.
Tests with pytest and static analysis with various linters are available.
See Contributor Guide#Test and Contributor Guide#Static analysis for the detailed procedures.
Dependencies are managed with poetry. In addition, the dependency libraries that may be introduced are subject to licensing constraints.
See Contributor Guide#Package for details.
The VOICEVOX editor can launch multiple engines at the same time. With this feature, your own speech synthesis engine or an existing one can run inside the VOICEVOX editor.
The multi-engine feature works by launching the Web APIs of several VOICEVOX API-compliant engines on separate ports and handling them uniformly. The editor launches each engine through its executable binary and manages its settings and state individually, keyed by EngineID.
To support this, create an executable binary that launches a VOICEVOX API-compliant engine. The easiest way is to fork the VOICEVOX ENGINE repository and modify some of its functionality.
Three things need to be modified: the engine information, the character information, and the speech synthesis.
Engine information is managed in the manifest file (engine_manifest.json) directly under the root.
A manifest file in this format is required for VOICEVOX API-compliant engines.
Review the information in the manifest file and change it as appropriate.
Depending on the synthesis method, it may not be possible to offer the same features as VOICEVOX, for example morphing.
In that case, change the information under supported_features in the manifest file as appropriate.
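Turning a feature off in the manifest could look like the sketch below. It assumes supported_features maps feature names to plain booleans and that the morphing feature is keyed synthesis_morphing. Both are assumptions, so check the actual engine_manifest.json before relying on them. The helper name disable_feature is hypothetical.

```python
import json


def disable_feature(manifest_text, feature):
    """Return manifest JSON with supported_features[feature] set to false."""
    manifest = json.loads(manifest_text)
    # Assumption: supported_features is an object of feature-name -> bool.
    features = manifest.setdefault("supported_features", {})
    features[feature] = False
    return json.dumps(manifest, ensure_ascii=False, indent=2)


# Stand-in manifest fragment; a real manifest contains many more fields.
sample_manifest = '{"supported_features": {"synthesis_morphing": true}}'
print(disable_feature(sample_manifest, "synthesis_morphing"))
```

Reading the file as UTF-8, applying the function, and writing the result back is enough for a one-off edit.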
Character information is managed in the files under the resources/character_info directory.
Dummy icons and other placeholders are provided, so change them as appropriate.
Speech synthesis is performed in voicevox_engine/tts_pipeline/tts_engine.py.
In the VOICEVOX API, synthesis works as follows: the engine creates the initial values of the synthesis query AudioQuery and returns it to the user, the user edits the query as needed, and the engine then synthesizes speech according to the query.
Query creation is handled by the /audio_query endpoint and synthesis by the /synthesis endpoint. Supporting at least these two endpoints makes an engine VOICEVOX API-compliant.
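From the client side, the two-step flow described above can be sketched with the standard library. The code only builds the requests and sends nothing. The helper names are hypothetical, passing text and speaker as query parameters to /audio_query is an assumption, and speedScale is an assumed AudioQuery field used purely as sample data.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENGINE_URL = "http://127.0.0.1:50021"


def audio_query_request(text, speaker):
    """Step 1: ask the engine for the initial AudioQuery for this text."""
    params = urlencode({"text": text, "speaker": speaker})
    return Request(f"{ENGINE_URL}/audio_query?{params}", method="POST")


def synthesis_request(query, speaker):
    """Step 2: send the (possibly edited) query back to be synthesized."""
    params = urlencode({"speaker": speaker})
    return Request(
        f"{ENGINE_URL}/synthesis?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


first = audio_query_request("hello", 8)
second = synthesis_request({"speedScale": 1.0}, 8)  # stand-in query body
print(first.get_method(), first.full_url)
print(second.get_method(), second.full_url)
```

With a running engine, each request can be sent with urllib.request.urlopen. The JSON body of the first response is the query to edit and pass to the second step, whose response is the WAV data.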
Distributing the engine as a VVPP file is recommended.
VVPP stands for "VOICEVOX plugin package". It is a Zip file of a directory containing the built engine and related files.
With the .vvpp extension, it can be installed into the VOICEVOX editor by double-clicking.
The editor extracts the received VVPP file onto the local disk and then locates files according to the engine_manifest.json directly under the root.
If the VOICEVOX editor fails to load it, check the editor's error log.
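Since a VVPP file is a Zip archive with engine_manifest.json at its root, packaging can be sketched with the standard zipfile module. The function name build_vvpp and the choice of deflate compression are assumptions. The sketch does not check that the manifest contents are valid.

```python
import zipfile
from pathlib import Path


def build_vvpp(engine_dir, output_path):
    """Zip the contents of engine_dir into a .vvpp file, keeping the manifest at the root."""
    engine_dir = Path(engine_dir)
    output_path = Path(output_path)
    if not (engine_dir / "engine_manifest.json").is_file():
        raise FileNotFoundError("engine_manifest.json must sit directly under the engine directory")
    # Write the archive outside engine_dir so that it does not include itself.
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(engine_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to engine_dir so the manifest lands at the archive root.
                archive.write(path, path.relative_to(engine_dir).as_posix())
    return output_path
```

For example, build_vvpp("dist/my_engine", "my_engine.vvpp") would package a hypothetical build directory. Listing the archive with zipfile.ZipFile(...).namelist() is a quick way to confirm that engine_manifest.json is at the top level.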
xxx.vvpp can also be split and distributed as sequentially numbered xxx.0.vvppp files.
This is useful when the file is too large to distribute easily.
The vvpp and vvppp files required for installation are listed in the vvpp.txt file.
voicevox-client @voicevox-client  API wrappers for VOICEVOX ENGINE in various programming languages
The engine is dual-licensed under LGPL v3 and a separate license that does not require publishing the source code.
To obtain the separate license, contact Hiho.
X account: @hiho_karuta