
AivisSpeech-Engine
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
Stars: 92

AivisSpeech-Engine is an open-source Japanese text-to-speech (speech synthesis) engine based on VOICEVOX ENGINE. It runs as an HTTP API server that is largely compatible with the VOICEVOX ENGINE API and loads Style-Bert-VITS2 voice models in the AIVMX (ONNX) format. It is the engine bundled with the AivisSpeech desktop application, and it can also be run on its own (including via Docker) to add Japanese speech output to other applications. It does not perform speech recognition, and it synthesizes Japanese only.
README:
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
AivisSpeech Engine is a Japanese speech synthesis engine based on VOICEVOX ENGINE.
It is built into AivisSpeech, a Japanese speech synthesis application, and makes it easy to generate emotionally expressive speech.
- For users
  - System requirements
  - Supported speech synthesis models
  - Installation
  - Using the speech synthesis API
  - Compatibility with the VOICEVOX API
  - Frequently asked questions / Q&A
- Development policy
- Setting up the development environment
- Development
- License
If you are looking for how to use AivisSpeech, see the official AivisSpeech website.
This page mainly provides information for developers.
The following documentation is for users.
PCs running Windows, macOS, or Linux are supported.
Starting AivisSpeech Engine requires at least 1.5 GB of free memory (RAM).
- Windows: Windows 10 (22H2 or later), Windows 11
- macOS: macOS 13 Ventura or later
- Linux: Ubuntu 20.04 or later
[!TIP] AivisSpeech, the desktop application, supports only Windows and macOS.
AivisSpeech Engine, the speech synthesis API server, can also be used on Ubuntu / Debian-based Linux.
[!NOTE] Operation on Intel-based Macs is not actively tested.
Intel-based Macs are no longer manufactured, and even preparing test and build environments for them is becoming difficult. We recommend using an Apple Silicon Mac whenever possible.
[!WARNING] On Windows 10, operation has been verified only on version 22H2.
On older, out-of-support versions of Windows 10, there are reports of AivisSpeech Engine crashing or failing to start.
For security reasons as well, we strongly recommend that Windows 10 users update to at least version 22H2 before use.
AivisSpeech Engine supports speech synthesis model files in the AIVMX (Aivis Voice Model for ONNX) format (extension .aivmx).
AIVM (Aivis Voice Model) / AIVMX (Aivis Voice Model for ONNX) is an open file format for AI speech synthesis models that packs the trained model, hyperparameters, style vectors, and speaker metadata (name, description, license, icon, voice samples, and so on) into a single file.
For details on the AIVM specification and AIVM / AIVMX files, see the AIVM specification defined by the Aivis Project.
[!NOTE]
"AIVM" is also the collective name for the format and metadata specifications of both AIVM and AIVMX.
Specifically, an AIVM file is a model file in Safetensors format with AIVM metadata added, and an AIVMX file is a model file in ONNX format with AIVM metadata added.
"AIVM metadata" refers to the various metadata attached to a trained model as defined in the AIVM specification.
[!IMPORTANT]
AivisSpeech Engine is also a reference implementation of the AIVM specification, but it is deliberately designed to support AIVMX files only.
This removes the dependency on PyTorch, which reduces the installation size, and enables fast CPU inference with ONNX Runtime.
[!TIP]
With AIVM Generator, you can generate AIVM / AIVMX files from existing speech synthesis models and edit the metadata of existing AIVM / AIVMX files.
AIVMX files for the following model architectures can be used:
- Style-Bert-VITS2
- Style-Bert-VITS2 (JP-Extra)
[!NOTE] The AIVM metadata specification allows multilingual speakers to be defined, but AivisSpeech Engine, like VOICEVOX ENGINE, supports Japanese speech synthesis only.
As a result, even with a model that supports English or Chinese, speech in languages other than Japanese cannot be synthesized.
Place AIVMX files in the following folder, depending on your OS.
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Models
- macOS: ~/Library/Application Support/AivisSpeech-Engine/Models
- Linux: ~/.local/share/AivisSpeech-Engine/Models
The actual folder path is printed as Models directory: in the log right after AivisSpeech Engine starts.
[!TIP]
When using AivisSpeech, you can easily add speech synthesis models from the AivisSpeech UI.
End users are generally encouraged to add models this way.
[!IMPORTANT] For the development version (when running without a PyInstaller build), the folder is under AivisSpeech-Engine-Dev rather than AivisSpeech-Engine.
AivisSpeech Engine provides useful command-line options such as the following:
- Specifying --host 0.0.0.0 makes AivisSpeech Engine accessible from other devices on the same network.
- Specifying --cors_policy_mode all allows CORS requests from all domains.
- Specifying --load_all_models preloads all installed speech synthesis models when AivisSpeech Engine starts.
- Specifying --help prints a list and description of all available options.
Many other options are available. See the --help output for details.
[!TIP]
Running with the --use_gpu option enables faster speech synthesis using DirectML on Windows and an NVIDIA GPU (CUDA) on Linux.
On Windows, DirectML inference also works on PCs that only have a CPU-integrated GPU (iGPU), but in most cases it is considerably slower than CPU inference and is not recommended.
See the frequently asked questions for details.
[!NOTE] By default, AivisSpeech Engine listens on port 10101.
If this conflicts with another application, you can change it to any port number with the --port option.
[!WARNING] Unlike VOICEVOX ENGINE, some options are not implemented in AivisSpeech Engine.
On Windows / macOS, AivisSpeech Engine can be installed on its own, but it is easier to launch, by itself, the AivisSpeech Engine that ships with AivisSpeech.
The path of the AivisSpeech Engine executable (run.exe / run) bundled with AivisSpeech is as follows.
- Windows: C:\Program Files\AivisSpeech\AivisSpeech-Engine\run.exe
  - If installed with user privileges: C:\Users\(username)\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exe
- macOS: /Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run
  - If installed with user privileges: ~/Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run
[!NOTE] On first launch, the default model (about 250 MB) and the BERT model required for inference (about 650 MB) are downloaded automatically, so startup can take up to several minutes.
Please wait until startup completes.
To add speech synthesis models to AivisSpeech Engine, see "Where to place model files".
Models can also be added from "Settings" → "Manage speech synthesis models" inside AivisSpeech.
When running in a Linux + NVIDIA GPU environment, the CUDA / cuDNN versions supported by ONNX Runtime must match the CUDA / cuDNN versions of the host environment, so the requirements are strict.
Specifically, the ONNX Runtime used by AivisSpeech Engine requires CUDA 12.x / cuDNN 9.x or later.
Docker works regardless of the host OS environment, so we recommend installing with Docker.
When running the Docker container, always mount ~/.local/share/AivisSpeech-Engine to /home/user/.local/share/AivisSpeech-Engine-Dev inside the container.
This keeps installed speech synthesis models and the BERT model cache (about 650 MB) available after the container is stopped or restarted.
To add a speech synthesis model to AivisSpeech Engine running in Docker, place the model file (.aivmx) under ~/.local/share/AivisSpeech-Engine/Models on the host.
[!IMPORTANT] Be sure to mount to /home/user/.local/share/AivisSpeech-Engine-Dev.
Because AivisSpeech Engine in the Docker image is not built with PyInstaller, the data folder name gets the -Dev suffix and becomes AivisSpeech-Engine-Dev.
docker pull ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
docker run --rm -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
docker pull ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
docker run --rm --gpus all -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
Running the following one-liner in Bash writes the synthesized speech to audio.wav as a WAV file.
[!IMPORTANT]
This assumes that AivisSpeech Engine is already running and that the speech synthesis model (.aivmx) corresponding to the style ID is stored in the directory shown as Models directory: in the log.
# STYLE_ID is the style ID to synthesize with; obtain it separately from the /speakers API
STYLE_ID=888753760 && \
echo -n "こんにちは、音声合成の世界へようこそ！" > text.txt && \
curl -s -X POST "127.0.0.1:10101/audio_query?speaker=$STYLE_ID" --get --data-urlencode text@text.txt > query.json && \
curl -s -H "Content-Type: application/json" -X POST -d @query.json "127.0.0.1:10101/synthesis?speaker=$STYLE_ID" > audio.wav && \
rm text.txt query.json
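The same two-step flow (/audio_query, then /synthesis) can be sketched in Python using only the standard library. This is a minimal illustration, not the project's client: the helper names `build_url` and `synthesize` are hypothetical, and the base URL assumes the default port 10101 described above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the engine is running locally on its default port.
BASE_URL = "http://127.0.0.1:10101"


def build_url(path: str, **params: object) -> str:
    """Build an API URL with URL-encoded query parameters."""
    return f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"


def synthesize(text: str, style_id: int) -> bytes:
    """Call /audio_query and then /synthesis, returning the WAV bytes."""
    # Step 1: POST /audio_query with the text and style ID in the query string.
    request = urllib.request.Request(
        build_url("/audio_query", text=text, speaker=style_id), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        query = json.load(response)

    # Step 2: POST the returned AudioQuery JSON to /synthesis.
    request = urllib.request.Request(
        build_url("/synthesis", speaker=style_id),
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the engine running, `open("audio.wav", "wb").write(synthesize("こんにちは", 888753760))` would save the result; the style ID here is the sample value from the one-liner and must exist in your installation.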
[!TIP] For the detailed API request and response specifications, see the API documentation and "Compatibility with the VOICEVOX API". The API documentation is updated continuously to reflect changes in the latest development version.
The API documentation (Swagger UI) of a running AivisSpeech Engine is available at http://127.0.0.1:10101/docs while AivisSpeech Engine or the AivisSpeech editor is running.
AivisSpeech Engine is largely compatible with the HTTP API of VOICEVOX ENGINE.
Software that supports the VOICEVOX ENGINE HTTP API should be able to work with AivisSpeech Engine simply by replacing the API URL with http://127.0.0.1:10101.
[!IMPORTANT]
However, if the API client edits the contents of the AudioQuery obtained from the /audio_query API before passing it to the /synthesis API, speech may not be synthesized correctly because of specification differences (described below). For this reason, the AivisSpeech editor can use both AivisSpeech Engine and VOICEVOX ENGINE (when using the multi-engine feature), but the VOICEVOX editor cannot use AivisSpeech Engine.
Using AivisSpeech Engine from the VOICEVOX editor significantly degrades synthesis quality because of limitations in the editor's implementation. Parameters unique to AivisSpeech Engine cannot be used, and calls to unsupported features may raise errors.
For better synthesis results, we strongly recommend using the AivisSpeech editor.
[!NOTE]
The API should be largely compatible for common use cases, but because speech synthesis systems with fundamentally different model architectures are forced into the same API specification, there may be incompatible APIs beyond those listed here.
If you report them in an Issue, we will fix those where compatibility can be improved.
The changes to the API specification relative to VOICEVOX ENGINE are as follows.
The local IDs of speaker styles in the AIVM manifest contained in an AIVMX file are managed as sequential numbers starting from 0 for each speaker.
For speech synthesis models with the Style-Bert-VITS2 architecture, this value matches the value of the model hyperparameter data.style2id.
In the VOICEVOX ENGINE API, on the other hand, for historical reasons only the "style ID" (style_id) is passed to the speech synthesis API, without specifying the "speaker UUID" (speaker_uuid).
In VOICEVOX ENGINE the bundled speakers and styles are fixed, so the developers could manage "style IDs" uniquely.
AivisSpeech Engine, by contrast, lets users add speech synthesis models freely.
The VOICEVOX API-compatible "style ID" therefore has to be unique no matter which models are added.
If it were not unique, the styles of a newly added model could collide with the style IDs of speaker styles in existing models.
AivisSpeech Engine therefore combines the speaker UUID and the style ID in the AIVM manifest to generate a globally unique, VOICEVOX API-compatible "style ID".
The generation procedure is as follows.
- Convert the speaker UUID to an MD5 hash value
- Combine the lower 27 bits of this hash value with the 5 bits (0 to 31) of the local style ID
- Convert the result to a 32-bit signed integer
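The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the UUID string is hashed as UTF-8 text, the digest is read as a big-endian integer, and the 27 bits are its low-order bits. The function name `to_global_style_id` is hypothetical, and the engine's actual implementation may differ in these details.

```python
import hashlib


def to_global_style_id(speaker_uuid: str, local_style_id: int) -> int:
    """Combine a speaker UUID and a local style ID into a signed 32-bit ID."""
    if not 0 <= local_style_id <= 31:
        raise ValueError("local_style_id must fit in 5 bits (0 to 31)")

    # Step 1: MD5 hash of the speaker UUID (assumption: hashed as UTF-8 text).
    digest = hashlib.md5(speaker_uuid.encode("utf-8")).digest()
    hash_int = int.from_bytes(digest, byteorder="big")

    # Step 2: 27 bits of the hash followed by the 5-bit local style ID.
    combined = ((hash_int & 0x7FFFFFF) << 5) | local_style_id

    # Step 3: reinterpret the unsigned 32-bit value as a signed integer.
    if combined >= 0x80000000:
        combined -= 0x100000000
    return combined
```

Because only 27 bits of the hash survive, two different speakers can in principle map to the same prefix, which is why collisions are possible though unlikely.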
[!WARNING]
Because of this, VOICEVOX API-compatible software that does not expect a 32-bit signed integer in the "style ID" may behave unexpectedly.
[!WARNING]
Global uniqueness of the speaker UUID is sacrificed to fit the value into a 32-bit signed integer, so there is an extremely small probability that the style IDs of different speakers will be duplicated (collide).
There is currently no workaround when style IDs collide, but in practice this is not expected to be a problem in most cases.
[!TIP]
The VOICEVOX API-compatible "style IDs" generated automatically by AivisSpeech Engine can be obtained from the /speakers API.
This API returns the list of speaker information installed in AivisSpeech Engine.
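As a rough illustration of reading style IDs out of a /speakers response, the sketch below flattens the returned list into a name-to-ID mapping. It assumes the VOICEVOX-style response shape (a list of speakers, each with `name` and a `styles` list of `name` / `id` entries); the helper name `list_style_ids` and the sample data are hypothetical.

```python
def list_style_ids(speakers: list[dict]) -> dict[str, int]:
    """Map "speaker name / style name" to the style ID used by the API."""
    style_ids: dict[str, int] = {}
    for speaker in speakers:
        for style in speaker["styles"]:
            style_ids[f'{speaker["name"]} / {style["name"]}'] = style["id"]
    return style_ids


# Hypothetical sample shaped like a /speakers response.
sample_speakers = [
    {
        "name": "Speaker A",
        "speaker_uuid": "00000000-0000-0000-0000-000000000000",
        "styles": [{"name": "Normal", "id": 100}, {"name": "Happy", "id": 101}],
    }
]
print(list_style_ids(sample_speakers))
```

The resulting IDs are the values to pass as the `speaker` query parameter of /audio_query and /synthesis.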
The AudioQuery type is a query for synthesizing speech from specified text or phoneme sequences.
The main changes from the AudioQuery type of VOICEVOX ENGINE are as follows.
- The meaning of the intonationScale field is different.
  - In VOICEVOX ENGINE this parameter represents the overall intonation, whereas in AivisSpeech Engine it represents the overall strength of the style.
  - It specifies the intensity of the speaker style's voice color in the range 0.0 to 2.0 (default: 1.0).
  - The larger the value, the closer the intonation is to the selected style.
  - For example, with a "happy" style, a larger value gives a brighter, happier-sounding delivery.
  - However, depending on the speaker and style, raising the value too far can make the voice sound unnatural.
  - It cannot be applied to the Normal style, which is the average of all styles (it is ignored regardless of the value).
  - The Style-Bert-VITS2 "style strength" parameter corresponds to AivisSpeech Engine's intonationScale as follows:
    - An intonationScale of 0.0 to 1.0 corresponds to the range 0.0 to 1.0 in Style-Bert-VITS2.
    - An intonationScale of 1.0 to 2.0 corresponds to the range 1.0 to 10.0 in Style-Bert-VITS2.
- The tempoDynamicsScale field has been added.
  - This parameter is unique to AivisSpeech Engine. It specifies how strongly the speaking tempo varies, in the range 0.0 to 2.0 (default: 1.0).
  - The larger the value, the faster and more lifelike the delivery.
  - The Style-Bert-VITS2 "tempo dynamics" parameter corresponds to AivisSpeech Engine's tempoDynamicsScale as follows:
    - A tempoDynamicsScale of 0.0 to 1.0 corresponds to the range 0.0 to 0.2 in Style-Bert-VITS2.
    - A tempoDynamicsScale of 1.0 to 2.0 corresponds to the range 0.2 to 1.0 in Style-Bert-VITS2.
- The pitchScale field behaves differently.
  - Unlike VOICEVOX ENGINE, changing this value from 0.0 may degrade audio quality.
- The pauseLength and pauseLengthScale fields are not supported.
  - They exist as fields for compatibility but are always ignored.
- The kana field behaves differently.
  - In VOICEVOX ENGINE it is a read-only field containing text in AquesTalk-style notation, whereas AivisSpeech Engine uses it as the field that specifies the ordinary text to be read aloud.
  - If null or an empty string is specified, a hiragana string generated automatically from the accent phrases is used as the text, which may result in unnatural intonation.
  - For more natural synthesis results, we recommend specifying the ordinary text whenever possible.
For details of the changes, see model.py.
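The two range correspondences described for intonationScale and tempoDynamicsScale can be sketched as piecewise-linear functions. This is a minimal illustration that assumes linear interpolation inside each range; the function names are hypothetical, and the engine's actual conversion may differ.

```python
def intonation_scale_to_style_strength(intonation_scale: float) -> float:
    """Map intonationScale (0.0-2.0) to a Style-Bert-VITS2 style strength (0.0-10.0)."""
    if not 0.0 <= intonation_scale <= 2.0:
        raise ValueError("intonationScale must be between 0.0 and 2.0")
    if intonation_scale <= 1.0:
        return intonation_scale  # 0.0-1.0 maps to 0.0-1.0 unchanged
    return 1.0 + (intonation_scale - 1.0) * 9.0  # 1.0-2.0 maps to 1.0-10.0


def tempo_dynamics_scale_to_sbv2(tempo_dynamics_scale: float) -> float:
    """Map tempoDynamicsScale (0.0-2.0) to the Style-Bert-VITS2 range (0.0-1.0)."""
    if not 0.0 <= tempo_dynamics_scale <= 2.0:
        raise ValueError("tempoDynamicsScale must be between 0.0 and 2.0")
    if tempo_dynamics_scale <= 1.0:
        return tempo_dynamics_scale * 0.2  # 0.0-1.0 maps to 0.0-0.2
    return 0.2 + (tempo_dynamics_scale - 1.0) * 0.8  # 1.0-2.0 maps to 0.2-1.0
```

Both defaults (1.0) land on the boundary between the two segments, so the upper half of each scale covers a much wider range of the underlying parameter than the lower half.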
The Mora type is a data structure that represents a mora of the text to be read aloud.
[!TIP]
A mora is the smallest unit of sound as actually pronounced (such as "あ", "か", or "を").
The Mora type is never used on its own in API requests or responses. It is always used indirectly through AudioQuery.accent_phrases[n].moras or AudioQuery.accent_phrases[n].pause_mora.
The main changes from the Mora type of VOICEVOX ENGINE are as follows.
- Symbols are treated as moras.
  - In VOICEVOX ENGINE, symbols such as exclamation marks and punctuation are treated as pause_mora, whereas in AivisSpeech Engine they are treated as ordinary moras.
  - For a symbol mora, text holds the symbol as-is and vowel is set to "pau".
- The consonant / vowel fields are read-only.
  - The reading of the text during synthesis always uses the value of the text field.
  - Changing the values of these fields does not affect the synthesis result.
- The consonant_length / vowel_length / pitch fields are not supported.
  - Because of how AivisSpeech Engine is implemented, these values cannot be calculated, so a dummy value of 0.0 is always returned.
  - They exist as fields for compatibility but are always ignored.
For details of the changes, see tts_pipeline/model.py.
The Preset type holds preset information that the editor uses to determine the initial values of a speech synthesis query.
The changes largely mirror the specification changes to the intonationScale / tempoDynamicsScale / pitchScale / pauseLength / pauseLengthScale fields described for the AudioQuery type.
For details of the changes, see preset/model.py.
[!WARNING]
The singing voice synthesis APIs and the cancellable speech synthesis API are not supported.
The endpoints exist for compatibility but always return 501 Not Implemented.
For details, see app/routers/character.py / app/routers/tts_pipeline.py.
- GET /singers
- GET /singer_info
- POST /cancellable_synthesis
- POST /sing_frame_audio_query
- POST /sing_frame_volume
- POST /frame_synthesis
[!WARNING]
The /synthesis_morphing API, which provides the morphing feature, is not supported.
Because utterance timing differs from speaker to speaker, morphing cannot be implemented in a usable way (it runs, but the result is unlistenable), so the API always returns 400 Bad Request.
The /morphable_targets API, which reports whether morphing is available for each speaker, treats morphing as prohibited for all speakers.
For details, see app/routers/morphing.py.
- POST /synthesis_morphing
- POST /morphable_targets
[!WARNING]
The following parameters exist for compatibility but are always ignored.
For details, see app/routers/character.py / app/routers/tts_pipeline.py.
- core_version parameter
  - This parameter specifies the version of VOICEVOX CORE.
  - AivisSpeech Engine has no component corresponding to VOICEVOX CORE, so it is always ignored.
- enable_interrogative_upspeak parameter
  - This parameter controls whether the end of a phrase is adjusted automatically when interrogative text is given.
  - AivisSpeech Engine always reads text with natural intonation that reflects symbols in the text such as "！", "？", "…", and "〜".
  - Therefore, simply appending "？" to the end of the text, as in どうですか…？, is enough to have it read with interrogative intonation.
[!TIP]
See also the frequently asked questions / Q&A for the AivisSpeech editor.
Q. Raising the "style strength" (intonationScale) value makes the speech sound wrong.
This is the current behavior of the Style-Bert-VITS2 model architecture supported by AivisSpeech Engine.
Depending on the speaker and style, raising intonationScale too far can make the speech break down or sound flat and unnatural.
The upper limit of intonationScale at which speech is still produced properly differs by speaker and style. Adjust it to a suitable value as needed.
AivisSpeech Engine tries to produce the correct reading and accent on the first attempt, but incorrect readings or accents are sometimes unavoidable.
Words that are not registered in the built-in dictionary, such as rarely used proper nouns and personal names (especially unconventional given names), are often read incorrectly.
The reading of such words can be changed by registering them in the dictionary. Try registering the word from the AivisSpeech editor or through the API.
Note that for compound words and English words, the registered entry is sometimes not applied regardless of the word's priority. This is the current behavior.
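Q. Sending a long text to the speech synthesis API in one request makes the audio unnatural or causes a memory leak.
AivisSpeech Engine is designed on the assumption that speech is synthesized in relatively short units, such as a single sentence or a group of related sentences.
Sending a long text of more than 1000 characters to the /synthesis API in one request may therefore cause problems such as the following.
- Memory usage rises sharply and the PC slows down
- A memory leak occurs and AivisSpeech Engine crashes
- The intonation becomes unnatural and the voice sounds flat
When synthesizing a long text, we recommend splitting it at positions such as the following before sending it to the speech synthesis API.
There is no hard limit, but 500 characters or fewer per synthesis request is desirable.
- At punctuation marks ("。" or "、")
- At breaks in meaning (such as paragraph boundaries)
- At dialogue boundaries (parts enclosed in 「」)
[!TIP]
Splitting at breaks in meaning tends to produce audio with more natural intonation.
This is because the emotional expression and intonation corresponding to the content of the text are applied across the entire text sent to the speech synthesis API in a single request.
Splitting the text appropriately resets the emotional expression and intonation for each sentence and results in more natural reading.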
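One way to follow this advice on the client side is to split the text at sentence-ending punctuation and line breaks, then group sentences into chunks of at most 500 characters. The sketch below is an illustration only: the function name `split_for_synthesis` is hypothetical, and it uses simple punctuation rules rather than any real analysis of meaning.

```python
import re


def split_for_synthesis(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after sentence-ending punctuation and at line breaks.
    sentences = [s for s in re.split(r"(?<=[。！？!?])|\n+", text) if s and s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Last resort: cut a single over-long sentence into fixed-size pieces.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to /audio_query and /synthesis separately and the resulting audio concatenated. Passing a smaller `max_chars` gives shorter chunks closer to one sentence each.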
Internet access is required only the first time AivisSpeech is launched, in order to download model data.
From the second launch onward, it can be used even when the PC is offline.
This can be done from the settings screen of the running AivisSpeech Engine.
While AivisSpeech Engine is running, open http://127.0.0.1:[AivisSpeech Engine port number]/setting in a browser to display the AivisSpeech Engine settings screen.
The default port number of AivisSpeech Engine is 10101.
Q. I switched to GPU mode (--use_gpu), but speech generation is slower than in CPU mode.
GPU mode can be used on PCs that only have a CPU-integrated GPU (iGPU), but in most cases it is considerably slower than CPU mode and is not recommended.
This is because a CPU-integrated GPU has lower performance than a discrete GPU (dGPU) and is poorly suited to heavy workloads such as AI speech synthesis.
Meanwhile, recent CPUs have improved greatly, and a CPU alone can generate speech fast enough.
We therefore recommend CPU mode on PCs without a dGPU.
If you use a PC with a 12th-generation or later Intel CPU (hybrid configuration of P-cores and E-cores), speech generation performance can vary greatly depending on the Windows power settings.
This is because in the default "Balanced" mode, speech generation tasks tend to be assigned to the power-efficient E-cores.
Changing the setting as follows lets both P-cores and E-cores be used to the fullest and speeds up speech generation.
- Open Settings in Windows 11
- Go to System → Power
- Change "Power mode" to "Best performance"
* The "Power plan" in Control Panel also has a "High performance" setting, but its effect is different.
For 12th-generation or later Intel CPUs, we recommend changing "Power mode" from the Windows 11 Settings screen.
AivisSpeech aims to be free AI speech synthesis software whose use is not restricted by purpose.
(It depends on the license of the speech synthesis model used in your work, but) at the very least the software itself requires no credit and may be used freely by individuals and corporations, for commercial or non-commercial purposes.
That said, we would also like more people to learn about AivisSpeech.
If you are willing, we would appreciate a credit to AivisSpeech somewhere in your work. (The format of the credit is up to you.)
Logs are saved in the following folders.
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Logs
- Mac: ~/Library/Application Support/AivisSpeech-Engine/Logs
- Linux: ~/.local/share/AivisSpeech-Engine/Logs
If you find a bug, please report it in one of the following ways.
- GitHub Issue (recommended): If you have a GitHub account, reporting through a GitHub Issue allows us to respond sooner.
- Twitter (X): You can report by replying to or sending a DM to the official Aivis Project account, or by tweeting with the hashtag #AivisSpeech.
- Contact form: You can also report through the Aivis Project contact form.
Including as much of the following information as possible will help us respond more quickly.
- Description of the problem
- Steps to reproduce (attach videos or screenshots if available)
- OS type and AivisSpeech version
- What you have tried in order to solve it
- Whether antivirus software or similar is installed (if it seems relevant)
- Error messages displayed
- Error logs
VOICEVOX is a very large piece of software and is still under active development.
For this reason, AivisSpeech Engine is developed on top of the latest version of VOICEVOX ENGINE under the following policy.
- Keep modifications to the necessary minimum so that following the latest VOICEVOX remains easy
  - Rebrand from VOICEVOX ENGINE to AivisSpeech Engine only where necessary
  - Renaming the voicevox_engine directory would produce an enormous diff in import statements, so it is deliberately not rebranded
- Do not refactor
  - Conflicts with VOICEVOX ENGINE are easy to foresee, and we are not familiar with the entire codebase
- Do not delete code, even for features that AivisSpeech does not use (such as singing voice synthesis)
  - This is also to avoid conflicts
  - Disable unused code by commenting it out rather than deleting it
    - To minimize the diff against VOICEVOX ENGINE, use """ """ rather than # when a large amount of code must be commented out
  - However, this does not apply to configuration files and build tooling such as the Dockerfile and GitHub Actions
    - These are already heavily modified in AivisSpeech Engine, and commenting out would make them very cluttered
- Do not update the documentation, since maintaining and following it is difficult
  - As a result, the documents have not been updated at all and do not reflect the changes made in AivisSpeech Engine
- Do not add test code, since maintaining tests alongside the AivisSpeech Engine modifications is difficult
  - Maintain only the existing tests, passively, by fixing or commenting out parts so that they pass
    - Because of the modifications in AivisSpeech Engine, the test result snapshots differ from those of VOICEVOX ENGINE
    - Tests that no longer work because of the modifications are commented out rather than fixed
  - Parts newly developed for AivisSpeech Engine do not get test code, in view of the maintenance cost
The procedure differs significantly from that of the original VOICEVOX ENGINE.
Python 3.11 must be installed beforehand.
# Install Poetry and pre-commit
pip install poetry poetry-plugin-export pre-commit
# Enable pre-commit
pre-commit install
# Install all dependencies
poetry install
The procedure differs significantly from that of the original VOICEVOX ENGINE.
# Start AivisSpeech Engine in the development environment
poetry run task serve
# Show the AivisSpeech Engine help
poetry run task serve --help
# Auto-fix code formatting
poetry run task format
# Check code formatting
poetry run task lint
# Check for typos with typos
poetry run task typos
# Run the tests
poetry run task test
# Update the test snapshots
poetry run task update-snapshots
# Update the license information
poetry run task update-licenses
# Build AivisSpeech Engine
poetry run task build
Of the dual license of VOICEVOX ENGINE, on which this project is based, only LGPL-3.0 is inherited.
The text below and the documents under docs/ are carried over unmodified from the original VOICEVOX ENGINE documentation. There is no guarantee that their contents also apply to AivisSpeech Engine.
AivisSpeech Engine is deeply supported by many excellent open-source projects and their contributors.
We sincerely thank everyone who has developed open-source software, and the community, for their contributions and support.
- @litagin02
- @Stardust-minus
- @tuna2134
- @googlefan256
- @WariHima
- VOICEVOX ENGINE Contributors
- Everyone in AI声づくり技術研究会
This is the engine of VOICEVOX.
It is actually an HTTP server, so you can synthesize speech from text by sending requests to it.
(The editor is VOICEVOX, the core is VOICEVOX CORE, and the overall architecture is described in detail here.)
Guides for each purpose:
- User guide: for those who want to synthesize speech
- Contributor guide: for those who want to contribute
- Developer guide: for those who want to use the code
Download the appropriate engine from here.
See the API documentation.
With the VOICEVOX engine or editor running, you can also view the documentation of the running engine at http://127.0.0.1:50021/docs.
For future plans and the like, "Integration with the VOICEVOX speech synthesis engine" may also be a useful reference.
docker pull voicevox/voicevox_engine:cpu-latest
docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-latest
docker pull voicevox/voicevox_engine:nvidia-latest
docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-latest
When using the GPU version, errors may occur depending on the environment. In that case, adding --runtime=nvidia to docker run may resolve them.
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
The generated audio has a somewhat unusual sampling rate of 24000 Hz, so some audio players may be unable to play it.
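If a player refuses the file, it can help to confirm the sampling rate of the generated WAV first. The following is a small sketch using Python's standard `wave` module; the function name `wav_sample_rate` is hypothetical.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (in Hz) recorded in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()
```

For example, `wav_sample_rate("audio.wav")` reads the rate of the file produced by the commands above, which you can then compare with what your player or downstream tool supports.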
The value specified for speaker is the style_id obtained from the /speakers endpoint. It is named speaker for compatibility.
You can adjust the audio by editing the parameters of the speech synthesis query obtained from /audio_query.
For example, let's make the speech 1.5 times faster.
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
# Use sed to change the value of speedScale to 1.5
sed -i -r 's/"speedScale":[0-9.]+/"speedScale":1.5/' query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio_fast.wav
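The sed substitution above depends on the exact text layout of the JSON. As an alternative sketch, the same edit can be made by parsing the query as JSON; the helper name `set_speed_scale` is hypothetical.

```python
import json


def set_speed_scale(query_json: str, speed_scale: float) -> str:
    """Return the AudioQuery JSON text with speedScale replaced."""
    query = json.loads(query_json)
    query["speedScale"] = speed_scale
    # ensure_ascii=False keeps Japanese text such as the kana field readable.
    return json.dumps(query, ensure_ascii=False)
```

Reading query.json, passing its contents through `set_speed_scale(..., 1.5)`, and writing the result back gives a file that can be posted to /synthesis as before.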
"AquesTalk-style notation" is a notation that specifies readings using only katakana and symbols. It differs in part from the notation of the original AquesTalk.
AquesTalk-style notation follows these rules:
- All kana are written in katakana
- Accent phrases are separated by / or 、. A silent interval is inserted only when 、 is used as the separator.
- Placing _ before a kana devoices that kana
- The accent position is specified with '. Every accent phrase must have exactly one accent position.
- Placing ？ (full-width) at the end of an accent phrase produces interrogative pronunciation
The response of /audio_query contains the reading determined by the engine, written in AquesTalk-style notation.
By editing it, you can control the kana reading and accent of the audio.
# Write the text to be read to text.txt in UTF-8
echo -n "ディープラーニングは万能薬ではありません" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
cat query.json | grep -o -E "\"kana\":\".*\""
# Result... "kana":"ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
# We want it read as "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン", so
# request the intonation with is_kana=true and save it to newphrases.json
echo -n "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" > kana.txt
curl -s \
-X POST \
"127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \
--get --data-urlencode text@kana.txt \
> newphrases.json
# Replace the contents of "accent_phrases" in query.json with the contents of newphrases.json
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @newquery.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
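Before sending hand-edited notation to /accent_phrases, it can be useful to check it against the separator and accent rules. The sketch below only covers two of those rules (splitting on / and 、, and exactly one ' per accent phrase); it is not the engine's parser, and the function name `split_accent_phrases` is hypothetical.

```python
import re


def split_accent_phrases(kana: str) -> list[tuple[str, bool]]:
    """Split AquesTalk-style notation into (accent phrase, pause_after) pairs."""
    phrases: list[tuple[str, bool]] = []
    for match in re.finditer(r"([^/、]+)([/、]?)", kana):
        phrase, delimiter = match.groups()
        # Every accent phrase needs exactly one accent position marker.
        if phrase.count("'") != 1:
            raise ValueError(f"accent phrase needs exactly one ': {phrase}")
        # A silent interval follows only when the separator is 、.
        phrases.append((phrase, delimiter == "、"))
    return phrases
```

Applied to the corrected reading in the example above, this yields three accent phrases with no pauses between them.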
The user dictionary can be viewed, and words can be added, edited, and deleted, through the API.
Send a GET request to /user_dict to retrieve the list of user dictionary entries.
curl -s -X GET "127.0.0.1:50021/user_dict"
Send a POST request to /user_dict_word to add a word to the user dictionary.
The following URL parameters are required.
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
For the accent nucleus position, the text at the following link may be helpful.
The number shown in a circle there is the accent nucleus position.
https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html
On success, the return value is the UUID string assigned to the word.
surface="test"
pronunciation="テスト"
accent_type="1"
curl -s -X POST "127.0.0.1:50021/user_dict_word" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
Send a PUT request to /user_dict_word/{word_uuid} to modify a word in the user dictionary.
The following URL parameters are required.
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
The word_uuid is returned when the word is added, and can also be found by viewing the user dictionary.
On success, the response is 204 No Content.
surface="test2"
pronunciation="テストツー"
accent_type="2"
# Replace word_uuid as appropriate for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
Send a DELETE request to /user_dict_word/{word_uuid} to delete a word from the user dictionary.
The word_uuid is returned when the word is added, and can also be found by viewing the user dictionary.
On success, the response is 204 No Content.
# Replace word_uuid as appropriate for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
The user dictionary can be imported and exported in the "Export & import user dictionary" section of the engine settings page.
It can also be imported and exported through the API.
Use POST /import_user_dict to import and GET /user_dict to export.
See the API documentation for details such as the arguments.
By editing presets.yaml in the user directory, you can use presets for settings such as character and speaking speed.
echo -n "プリセットをうまく活用すれば、サードパーティ間で同じ設定を使うことができます" >text.txt
# Get the preset information
curl -s -X GET "127.0.0.1:50021/presets" > presets.json
preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g')
style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g')
# Get the query for speech synthesis
curl -s \
-X POST \
"127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesize speech
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=$style_id" \
> audio.wav
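The sed expressions above extract `id` and `style_id` by pattern matching on the raw text, which is fragile if the response layout changes or contains several presets. A sketch of the same extraction with a JSON parser follows; it assumes /presets returns a JSON array of objects with `id` and `style_id` fields, as the sed patterns imply, and the helper name `first_preset_ids` is hypothetical.

```python
import json


def first_preset_ids(presets_json: str) -> tuple[int, int]:
    """Return (preset id, style_id) of the first preset in a /presets response."""
    presets = json.loads(presets_json)
    if not presets:
        raise ValueError("no presets are defined")
    first = presets[0]
    return first["id"], first["style_id"]
```

The two returned values correspond to `preset_id` and `style_id` in the shell example.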
- speaker_uuid can be checked with /speakers
- id values must not be duplicated
- If the file is rewritten after the engine has started, the change is reflected in the engine
/synthesis_morphing generates morphed audio based on the audio synthesized with each of two styles.
echo -n "モーフィングを利用することで、２種類の声を混ぜることができます。" > text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=8" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesis result with the original style
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=8" \
> audio.wav
export MORPH_RATE=0.5
# Note that this takes time: it synthesizes speech for two styles and then analyzes the audio with WORLD
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
export MORPH_RATE=0.9
# When query, base_speaker, and target_speaker are the same, the cache is used, so generation is relatively fast
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
The following code retrieves portrait.png from the additional information.
(jq is used to parse the JSON.)
curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \
| jq -r ".portrait" \
| base64 -d \
> portrait.png
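Where jq and base64 are not available, the same decoding step can be sketched in Python. This assumes, as the pipeline above implies, that the /speaker_info response is a JSON object whose `portrait` field holds Base64-encoded image data; the helper name `extract_portrait` is hypothetical.

```python
import base64
import json


def extract_portrait(speaker_info_json: str) -> bytes:
    """Decode the Base64 portrait field of a /speaker_info response."""
    info = json.loads(speaker_info_json)
    return base64.b64decode(info["portrait"])
```

Writing the returned bytes to a file opened in binary mode produces the same portrait.png as the shell pipeline.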
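With /cancellable_synthesis, computing resources are released immediately when the connection is dropped.
(With /synthesis, the synthesis computation runs to completion even if the connection is dropped.)
This API is an experimental feature and is not enabled unless --enable_cancellable_synthesis is specified as an argument when the engine starts.
The parameters required for speech synthesis are the same as for /synthesis.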
echo -n '{
"notes": [
{ "key": null, "frame_length": 15, "lyric": "" },
{ "key": 60, "frame_length": 45, "lyric": "ド" },
{ "key": 62, "frame_length": 45, "lyric": "レ" },
{ "key": 64, "frame_length": 45, "lyric": "ミ" },
{ "key": null, "frame_length": 15, "lyric": "" }
]
}' > score.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @score.json \
"127.0.0.1:50021/sing_frame_audio_query?speaker=6000" \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/frame_synthesis?speaker=3001" \
> audio.wav
The key in the score is a MIDI note number.
lyric is the lyric. Any string can be specified, but depending on the engine, strings other than a single hiragana or katakana mora may cause an error.
The default frame rate is 93.75 Hz, and it can be obtained from frame_rate in the engine manifest.
The first note must be silent.
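Since note lengths are given in frames, converting them to seconds is a division by the frame rate. The short sketch below uses the default of 93.75 Hz mentioned above; the function name `frames_to_seconds` is hypothetical, and a different engine may report a different frame_rate.

```python
def frames_to_seconds(frame_length: int, frame_rate: float = 93.75) -> float:
    """Convert a note length in frames to seconds."""
    return frame_length / frame_rate


# Frame lengths of the notes in the score.json example above.
example_frames = [15, 45, 45, 45, 15]
total_seconds = sum(frames_to_seconds(n) for n in example_frames)
```

At the default rate, each 45-frame note lasts 0.48 seconds and each 15-frame rest lasts 0.16 seconds.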
The speaker that can be passed to /sing_frame_audio_query is the style_id of a style returned by /singers whose type is sing or singing_teacher.
The speaker that can be passed to /frame_synthesis is the style_id of a style returned by /singers whose type is frame_decode.
The argument is named speaker for consistency with the other APIs.
/sing_frame_audio_query and /frame_synthesis can also be given different styles.
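Picking valid speaker values for the two endpoints comes down to filtering styles by type. The sketch below assumes the /singers response is a list of singers, each with a styles list whose entries carry id and type fields. That shape, the sample data, and the function name style_ids_by_type are assumptions for illustration.

```python
def style_ids_by_type(singers: list, allowed_types: set) -> list:
    """Collect the ids of all styles whose type is in allowed_types."""
    return [
        style["id"]
        for singer in singers
        for style in singer["styles"]
        if style.get("type") in allowed_types
    ]


# Hypothetical /singers payload; real names and ids depend on the installed models.
singers = [
    {
        "name": "Example Singer",
        "styles": [
            {"name": "Normal", "id": 6000, "type": "singing_teacher"},
            {"name": "Normal", "id": 3001, "type": "frame_decode"},
        ],
    }
]

query_speakers = style_ids_by_type(singers, {"sing", "singing_teacher"})
synthesis_speakers = style_ids_by_type(singers, {"frame_decode"})
print(query_speakers, synthesis_speakers)
```

The first list holds candidates for /sing_frame_audio_query and the second holds candidates for /frame_synthesis.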
For security, VOICEVOX accepts requests only from the origins localhost, 127.0.0.1, and app://, or from requests with no Origin.
As a result, some third-party apps may be unable to receive responses.
As a workaround, the engine provides a settings UI.
- Open http://127.0.0.1:50021/setting.
- Change or add settings to match the app you are using.
- Press the save button to confirm the changes.
- The engine must be restarted for the settings to take effect. Restart it as needed.
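The origin rule can be pictured as a small predicate. The following is only an illustrative sketch of the rule as stated, not the engine's actual implementation. The function name is_origin_accepted and the extra_origins parameter (standing in for origins added through the settings) are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_origin_accepted(origin: Optional[str], extra_origins: frozenset = frozenset()) -> bool:
    """Approximate the default policy: localhost, 127.0.0.1, app://, or no Origin."""
    if origin is None:
        return True  # request sent without an Origin header
    if origin in extra_origins:
        return True  # hypothetical stand-in for origins added in the settings
    parts = urlsplit(origin)
    if parts.scheme == "app":
        return True
    return parts.hostname in {"localhost", "127.0.0.1"}


for candidate in (None, "http://localhost:5173", "app://.", "https://example.com"):
    print(candidate, is_origin_accepted(candidate))
```

A third-party web app served from another host falls into the rejected case until its origin is added through the settings UI.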
Specifying the runtime argument --disable_mutable_api or the environment variable VV_DISABLE_MUTABLE_API=1 disables the APIs that modify engine settings, dictionaries, and similar data.
All requests and responses are encoded in UTF-8.
Arguments can be specified when starting the engine. For details, check the help with the -h argument.
$ python run.py -h
usage: run.py [-h] [--host HOST] [--port PORT] [--use_gpu] [--voicevox_dir VOICEVOX_DIR] [--voicelib_dir VOICELIB_DIR] [--runtime_dir RUNTIME_DIR] [--enable_mock] [--enable_cancellable_synthesis]
[--init_processes INIT_PROCESSES] [--load_all_models] [--cpu_num_threads CPU_NUM_THREADS] [--output_log_utf8] [--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}]
[--allow_origin [ALLOW_ORIGIN ...]] [--setting_file SETTING_FILE] [--preset_file PRESET_FILE] [--disable_mutable_api]
VOICEVOX engine.
options:
  -h, --help            show this help message and exit
  --host HOST           Host address on which to accept connections.
  --port PORT           Port number on which to accept connections.
  --use_gpu             Use the GPU for speech synthesis.
  --voicevox_dir VOICEVOX_DIR
                        Path to the VOICEVOX directory.
  --voicelib_dir VOICELIB_DIR
                        Path to the VOICEVOX CORE directory.
  --runtime_dir RUNTIME_DIR
                        Path to the directory of libraries used by VOICEVOX CORE.
  --enable_mock         Synthesize speech with a mock instead of VOICEVOX CORE.
  --enable_cancellable_synthesis
                        Allow speech synthesis to be cancelled partway through.
  --init_processes INIT_PROCESSES
                        Number of processes created when the cancellable_synthesis feature is initialized.
  --load_all_models     Load all speech synthesis models at startup.
  --cpu_num_threads CPU_NUM_THREADS
                        Number of threads used for speech synthesis. If not specified, the value of the environment variable VV_CPU_NUM_THREADS is used instead. If VV_CPU_NUM_THREADS is neither an empty string nor a number, the engine exits with an error.
  --output_log_utf8     Output logs in UTF-8. If not specified, the value of the environment variable VV_OUTPUT_LOG_UTF8 is used instead. If VV_OUTPUT_LOG_UTF8 is 1, UTF-8 is used; if it is 0, an empty string, or unset, the encoding is chosen automatically based on the environment.
  --cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}
                        CORS permission mode. Either all or localapps can be specified. all allows everything. localapps limits the cross-origin resource sharing policy to app://. and localhost-related origins. Other origins can be added with the allow_origin option. The default is localapps. This option takes precedence over the settings file specified by --setting_file.
  --allow_origin [ALLOW_ORIGIN ...]
                        Origins to allow. Separate multiple origins with spaces. This option takes precedence over the settings file specified by --setting_file.
  --setting_file SETTING_FILE
                        Settings file to use.
  --preset_file PRESET_FILE
                        Preset file to use. If not specified, the environment variable VV_PRESET_FILE and then presets.yaml in the user directory are searched, in that order.
  --disable_mutable_api
                        Disable the APIs that modify the engine's static data, such as dictionary registration and settings changes. If not specified, the value of the environment variable VV_DISABLE_MUTABLE_API is used instead. If VV_DISABLE_MUTABLE_API is 1, the APIs are disabled; if it is 0, an empty string, or unset, it is ignored.
To update, delete all files in the engine directory and replace them with the new ones.
VOICEVOX ENGINE welcomes your contributions!
See CONTRIBUTING.md for details.
Development discussion and casual chat also take place on the unofficial VOICEVOX Discord server. Feel free to join.
When creating a pull request to resolve an Issue, we recommend either announcing on the Issue that you have started working on it or opening a Draft pull request first, to avoid working on the same Issue as someone else.
Development uses Python 3.11.9.
Installation requires a C/C++ compiler and CMake for your OS.
# Install the runtime environment
python -m pip install -r requirements.txt
# Install the development, test, and build environments
python -m pip install -r requirements-dev.txt -r requirements-build.txt
See the following command for details on the command-line arguments.
python run.py --help
# Start the server with the product version of VOICEVOX
VOICEVOX_DIR="C:/path/to/voicevox" # Path to the product-version VOICEVOX directory
python run.py --voicevox_dir=$VOICEVOX_DIR
# Start the server with the mock
python run.py --enable_mock
# Switch log output to UTF-8
python run.py --output_log_utf8
# Or: VV_OUTPUT_LOG_UTF8=1 python run.py
If the number of CPU threads is not specified, half of the logical cores are used. (On most CPUs, this is half of the total processing capacity.)
If you want to adjust how much processing capacity the engine uses, for example when running on IaaS or on a dedicated server, specify the number of CPU threads.
- Specify with a runtime argument
python run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4
- Specify with an environment variable
export VV_CPU_NUM_THREADS=4
python run.py --voicevox_dir=$VOICEVOX_DIR
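The precedence between the argument, the environment variable, and the default can be sketched as follows. This is an illustration of the documented behavior, not the engine's code. The function name resolve_cpu_num_threads is hypothetical, and it raises ValueError where the engine itself exits with an error.

```python
import os
from typing import Mapping, Optional


def resolve_cpu_num_threads(arg_value: Optional[int], env: Mapping[str, str],
                            logical_cores: Optional[int]) -> int:
    """Pick the thread count: argument first, then VV_CPU_NUM_THREADS, then half the cores."""
    if arg_value is not None:
        return arg_value
    raw = env.get("VV_CPU_NUM_THREADS", "")
    if raw != "":
        if not raw.isdigit():
            # The engine exits with an error here; this sketch raises instead.
            raise ValueError("VV_CPU_NUM_THREADS must be a number")
        return int(raw)
    # Default: half of the logical cores, but never fewer than one thread.
    return max(1, (logical_cores or 1) // 2)


print(resolve_cpu_num_threads(None, os.environ, os.cpu_count()))
```

On an 8-core machine with neither the argument nor the variable set, this sketch yields 4 threads.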
Cores from VOICEVOX Core 0.5.4 onward can be used.
The libtorch version of the core is not supported on Mac.
If you pass the directory of the product version of VOICEVOX or of a compiled engine with the --voicevox_dir argument, the core of that version is used.
python run.py --voicevox_dir="/path/to/voicevox"
On Mac, DYLD_LIBRARY_PATH must be specified.
DYLD_LIBRARY_PATH="/path/to/voicevox" python run.py --voicevox_dir="/path/to/voicevox"
Pass the directory where the VOICEVOX Core zip file was extracted with the --voicelib_dir argument.
Also pass the directory of libtorch or onnxruntime (the shared library), matching the core version, with the --runtime_dir argument.
However, if libtorch or onnxruntime is on the system search path, the --runtime_dir argument is not needed.
The --voicelib_dir and --runtime_dir arguments can each be used multiple times.
To select the core version at an API endpoint, specify the core_version argument. (If unspecified, the latest core is used.)
python run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx"
On Mac, DYLD_LIBRARY_PATH must be specified instead of the --runtime_dir argument.
DYLD_LIBRARY_PATH="/path/to/onnx" python run.py --voicelib_dir="/path/to/voicevox_core"
Voice libraries in the following directories are loaded automatically.
- Built version:
<user_data_dir>/voicevox-engine/core_libraries/
- Python version:
<user_data_dir>/voicevox-engine-dev/core_libraries/
<user_data_dir> depends on the OS.
- Windows:
C:\Users\<username>\AppData\Local\
- macOS:
/Users/<username>/Library/Application\ Support/
- Linux:
/home/<username>/.local/share/
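The directory layout above can be expressed as a small lookup. This sketch only rebuilds the listed paths from their parts. The function name core_libraries_dir is hypothetical, and the system values follow what Python's platform.system() returns ("Windows", "Darwin", "Linux").

```python
from pathlib import PurePosixPath, PureWindowsPath


def core_libraries_dir(system: str, username: str, dev: bool = False):
    """Return the auto-loaded core_libraries directory for the given OS and user."""
    # The Python (development) version uses a separate application directory.
    app_dir = "voicevox-engine-dev" if dev else "voicevox-engine"
    if system == "Windows":
        base = PureWindowsPath("C:/Users") / username / "AppData" / "Local"
    elif system == "Darwin":
        base = PurePosixPath("/Users") / username / "Library" / "Application Support"
    elif system == "Linux":
        base = PurePosixPath("/home") / username / ".local" / "share"
    else:
        raise ValueError(f"unsupported system: {system}")
    return base / app_dir / "core_libraries"


print(core_libraries_dir("Linux", "alice"))
print(core_libraries_dir("Windows", "alice", dev=True))
```

Pure path classes are used so the example gives the same result regardless of the OS it runs on.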
Local builds are possible through packaging with pyinstaller and containerization with a Dockerfile.
See Contributor Guide#Build for the detailed steps.
When using GitHub, you can build with GitHub Actions in a forked repository.
Turn Actions ON and launch build-engine-package.yml with workflow_dispatch to build.
The artifacts are uploaded to Release.
See Contributor Guide#GitHub Actions for the GitHub Actions settings needed for the build.
Tests using pytest and static analysis using various linters are available.
See Contributor Guide#Tests and Contributor Guide#Static analysis for the detailed steps.
Dependencies are managed with poetry. There are also license restrictions on which dependency libraries can be introduced.
See Contributor Guide#Packages for details.
The VOICEVOX editor can run multiple engines at the same time. With this feature, you can run your own speech synthesis engine or an existing one inside the VOICEVOX editor.
The multi-engine feature works by starting the Web APIs of multiple VOICEVOX API-compliant engines on separate ports and handling them uniformly. The editor launches each engine through its executable binary and manages settings and state individually, tied to the EngineID.
You can support this by creating an executable binary that starts a VOICEVOX API-compliant engine. The easy way is to fork the VOICEVOX ENGINE repository and modify some of its functionality.
Three things need to be modified: engine information, character information, and speech synthesis.
Engine information is managed in the manifest file (engine_manifest.json) directly under the root.
A manifest file in this format is required for VOICEVOX API-compliant engines.
Review the information in the manifest file and change it as appropriate.
Depending on the synthesis method, the engine may not be able to offer the same features as VOICEVOX, such as morphing.
In that case, change the information under supported_features in the manifest file as appropriate.
Character information is managed in files under the resources/character_info directory.
Dummy icons and similar files are provided, so change them as appropriate.
Speech synthesis is performed in voicevox_engine/tts_pipeline/tts_engine.py.
In the VOICEVOX API, speech synthesis works as follows: the engine creates initial values for the synthesis query AudioQuery and returns it to the user, the user edits the query as needed, and the engine then synthesizes speech according to the query.
Query creation is handled by the /audio_query endpoint and speech synthesis by the /synthesis endpoint. Supporting at least these two makes an engine VOICEVOX API-compliant.
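From the client side, the two-endpoint flow looks roughly like this. The sketch only builds the two requests and does not send them. It assumes an engine at 127.0.0.1:50021 and that /audio_query takes text and speaker as query parameters. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:50021"  # assumed local engine address


def audio_query_request(text: str, speaker: int) -> Request:
    """Step 1: ask the engine for the initial AudioQuery."""
    params = urlencode({"text": text, "speaker": speaker})
    return Request(f"{BASE_URL}/audio_query?{params}", method="POST")


def synthesis_request(query: dict, speaker: int) -> Request:
    """Step 2: send the (optionally edited) query back for synthesis."""
    body = json.dumps(query).encode("utf-8")
    return Request(
        f"{BASE_URL}/synthesis?speaker={speaker}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


first = audio_query_request("hello", 1)
second = synthesis_request({"placeholder": "edited AudioQuery goes here"}, 1)
print(first.get_method(), first.full_url)
print(second.get_method(), second.full_url)
```

Passing each request to urllib.request.urlopen while the engine is running would perform the calls. The JSON body of the first response is the query that the second request sends back.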
Distributing as a VVPP file is recommended.
VVPP stands for "VOICEVOX Plugin Package". It is a Zip file of a directory containing the built engine and related files.
With the .vvpp extension, it can be installed into the VOICEVOX editor by double-clicking.
After extracting the received VVPP file onto the local disk, the editor locates files according to the engine_manifest.json directly under the root.
If the VOICEVOX editor fails to load it, check the editor's error log.
xxx.vvpp can also be split and distributed as numbered xxx.0.vvppp files.
This is useful when the file is too large to distribute easily.
The vvpp and vvppp files required for installation are listed in the vvpp.txt file.
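Because a VVPP file is a Zip archive with engine_manifest.json at its root, a package can be sanity-checked before distribution. The sketch below builds a throwaway archive and checks it. The function name has_root_manifest and the archive contents are placeholders, not a real engine package.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def has_root_manifest(vvpp_path) -> bool:
    """Return True when engine_manifest.json sits directly under the archive root."""
    with zipfile.ZipFile(vvpp_path) as archive:
        return "engine_manifest.json" in archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "example.vvpp"
    with zipfile.ZipFile(package, "w") as archive:
        # Placeholder contents; a real package holds the built engine.
        archive.writestr("engine_manifest.json", json.dumps({"name": "Example Engine"}))
        archive.writestr("run", "placeholder for the engine binary")
    ok = has_root_manifest(package)

print(ok)  # True
```

A manifest nested inside a subdirectory would fail this check, which matches the editor looking for the file directly under the root.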
voicevox-client @voicevox-client: API wrappers for VOICEVOX ENGINE in various languages
Dual-licensed under LGPL v3 and a separate license that does not require publishing source code.
To obtain the separate license, contact Hiho.
X account: @hiho_karuta