AivisSpeech

AivisSpeech: AI Voice Imitation System - Text to Speech Software

AivisSpeech is Japanese text-to-speech software based on the VOICEVOX editor UI. It embeds AivisSpeech Engine, which makes it easy to generate emotionally expressive speech. The engine supports speech synthesis model files in the AIVMX format for the Style-Bert-VITS2 and Style-Bert-VITS2 (JP-Extra) architectures. AivisSpeech runs on Windows and macOS PCs and needs at least 1.5 GB of free memory (RAM) to start. Development tracks the latest version of VOICEVOX with minimal modifications, rebranding only where necessary. To avoid conflicts with VOICEVOX, the project does not refactor, does not delete unused features, and does not update the inherited documentation or test code.

README:

AivisSpeech is Japanese text-to-speech software based on the VOICEVOX editor UI.
It embeds AivisSpeech Engine, a Japanese speech synthesis engine, so you can easily generate emotionally expressive speech.

💠 Download AivisSpeech / 💠 Download AivisSpeech Engine

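For users
System requirements
Supported speech synthesis models
- Supported model architectures
- Where to place model files
Development policy
Setting up the development environment
Development
License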

For users

If you are looking for how to use AivisSpeech, see the official AivisSpeech website.

This page mainly contains information for developers.
The following is documentation for users.

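System requirements

AivisSpeech runs on PCs with Windows or macOS.
Starting AivisSpeech requires at least 1.5 GB of free memory (RAM).

Windows: Windows 10 (22H2 or later), Windows 11
macOS: macOS 13 Ventura or later

[!NOTE] Operation on Intel-based Macs is not actively verified.
Intel-based Macs are no longer manufactured, and preparing verification and build environments for them is becoming difficult. We recommend using an Apple Silicon Mac where possible.

[!WARNING] On Windows 10, operation has been confirmed only on version 22H2.
On older, unsupported versions of Windows 10, there have been reports of AivisSpeech Engine crashing and failing to start.
For security reasons as well, we strongly recommend that Windows 10 users update to at least version 22H2 before use.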

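Supported speech synthesis models

AivisSpeech Engine, which is embedded in AivisSpeech, supports speech synthesis model files in the AIVMX (Aivis Voice Model for ONNX) format (extension .aivmx).

AIVM (Aivis Voice Model) / AIVMX (Aivis Voice Model for ONNX) is an open file format for AI speech synthesis models that packs the trained model, hyperparameters, style vectors, and speaker metadata (name, description, license, icon, voice samples, and so on) into a single file.

For details on the AIVM specification and AIVM / AIVMX files, see the AIVM Specification defined by the Aivis Project.

[!NOTE]
"AIVM" is also the collective name for the format and metadata specifications of both AIVM and AIVMX.
Specifically, an AIVM file is a model file in "Safetensors format with AIVM metadata added", and an AIVMX file is a model file in "ONNX format with AIVM metadata added".
"AIVM metadata" refers to the various metadata tied to a trained model, as defined in the AIVM specification.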
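
To make "Safetensors format with metadata added" more concrete, here is a minimal sketch of how string metadata can be read from a Safetensors-style header. It relies only on the general Safetensors layout (an 8-byte little-endian header length, then a JSON header whose reserved `__metadata__` key holds a string-to-string map). The function names and the `example_key` entry are hypothetical, and the sketch deliberately does not assume which keys the AIVM specification defines; refer to the AIVM specification for those. AIVMX files are ONNX-based and would need an ONNX reader instead.

```python
import json
import struct
from io import BytesIO
from typing import BinaryIO


def read_safetensors_metadata(stream: BinaryIO) -> dict[str, str]:
    """Return the string-to-string metadata map from a Safetensors header."""
    # A Safetensors file starts with an 8-byte little-endian unsigned integer
    # giving the byte length of the JSON header that follows.
    raw_length = stream.read(8)
    if len(raw_length) != 8:
        raise ValueError("file is too short to be a Safetensors file")
    (header_length,) = struct.unpack("<Q", raw_length)
    header_bytes = stream.read(header_length)
    if len(header_bytes) != header_length:
        raise ValueError("truncated Safetensors header")
    header = json.loads(header_bytes.decode("utf-8"))
    # Free-form metadata lives under the reserved "__metadata__" key.
    return header.get("__metadata__", {})


def build_example_file(metadata: dict[str, str]) -> bytes:
    """Build a tensor-free Safetensors-style blob for demonstration only."""
    header = json.dumps({"__metadata__": metadata}).encode("utf-8")
    return struct.pack("<Q", len(header)) + header


if __name__ == "__main__":
    # "example_key" is a placeholder, not a key defined by the AIVM specification.
    blob = build_example_file({"example_key": "example value"})
    print(read_safetensors_metadata(BytesIO(blob)))
```

To inspect a real file, the same function could be called with a file opened in binary mode, for example `open(path, "rb")`.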

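[!IMPORTANT]
AivisSpeech Engine is also the reference implementation of the AIVM specification, but it is deliberately designed to support AIVMX files only.
This removes the dependency on PyTorch, reduces the installation size, and enables fast CPU inference with ONNX Runtime.

[!TIP]
With AIVM Generator, you can generate AIVM / AIVMX files from existing speech synthesis models and edit the metadata of existing AIVM / AIVMX files!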

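Supported model architectures

AIVMX files with the following model architectures can be used.

Style-Bert-VITS2
Style-Bert-VITS2 (JP-Extra)

[!NOTE] The AIVM metadata specification allows defining multilingual speakers, but AivisSpeech Engine, like VOICEVOX ENGINE, supports Japanese speech synthesis only.
Therefore, even with a speech synthesis model that supports English or Chinese, speech in languages other than Japanese cannot be synthesized.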

Where to place model files

Place AIVMX files in the following folder for your OS.

Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Models
macOS: ~/Library/Application Support/AivisSpeech-Engine/Models
Linux: ~/.local/share/AivisSpeech-Engine/Models

The actual folder path is shown as Models directory: in the log immediately after AivisSpeech Engine starts.

[!TIP]
When using AivisSpeech, you can easily add speech synthesis models from the AivisSpeech UI!
End users should generally add speech synthesis models this way.

[!IMPORTANT] For the development version (when AivisSpeech Engine is run without being built by PyInstaller), the folder is under AivisSpeech-Engine-Dev instead of AivisSpeech-Engine.
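
The folder rules above can be summarized in a small helper. This is a sketch for illustration only: `models_dir` is a hypothetical function, not part of AivisSpeech Engine, and it simply rebuilds the documented default paths from a home directory. The engine may resolve its data directory differently on a given machine, so the Models directory: line in the engine log remains the authoritative source. The `system` values follow what Python's `platform.system()` returns.

```python
from pathlib import PurePosixPath, PureWindowsPath


def models_dir(system: str, home: str, dev: bool = False) -> str:
    """Rebuild the documented default Models folder for a given OS and home."""
    # The development version uses a separate application folder.
    app = "AivisSpeech-Engine-Dev" if dev else "AivisSpeech-Engine"
    if system == "Windows":
        return str(PureWindowsPath(home) / "AppData" / "Roaming" / app / "Models")
    if system == "Darwin":
        return str(
            PurePosixPath(home) / "Library" / "Application Support" / app / "Models"
        )
    if system == "Linux":
        return str(PurePosixPath(home) / ".local" / "share" / app / "Models")
    raise ValueError(f"unsupported system: {system}")


if __name__ == "__main__":
    # The home directories below are made-up examples.
    print(models_dir("Windows", r"C:\Users\alice"))
    print(models_dir("Darwin", "/Users/alice"))
    print(models_dir("Linux", "/home/alice", dev=True))
```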

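Development policy

VOICEVOX is a very large piece of software and is still under active development.
For that reason, AivisSpeech is developed on top of the latest version of VOICEVOX under the following policies.

Keep modifications to the necessary minimum so that following the latest VOICEVOX stays easy
- Rebrand from VOICEVOX to AivisSpeech only where necessary
Do not refactor
- Conflicts with VOICEVOX would clearly be likely, and we are not familiar with the entire codebase
Do not delete code, even for features AivisSpeech does not use (such as singing voice synthesis)
- This is also to avoid conflicts
- Disable unused code by commenting it out, not by deleting it
Do not update documentation, because maintaining it and keeping it in sync is difficult
- As a result, none of the documents have been updated, and they do not reflect the changes made in AivisSpeech
Do not update test code, because the modifications for AivisSpeech make it difficult to maintain
- E2E tests in particular do not work properly because the UI has been heavily modified
- Some test code has been adjusted for AivisSpeech, but it has not been verified and is left as is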

Setting up the development environment

The procedure differs from the original VOICEVOX.
Node.js 22.11.0 must be installed beforehand.

# Install all dependencies
npm ci

# Copy .env.development to .env
## The copied .env does not need to be edited
cp .env.development .env

# macOS only: edit .env.production
nano .env.production
--------------------
# Change executionFilePath from "AivisSpeech-Engine/run.exe" to "../Resources/AivisSpeech-Engine/run"
## executionFilePath is used when launching the production build of AivisSpeech built with npm run electron:build
...
VITE_DEFAULT_ENGINE_INFOS=`[
    {
        "uuid": "1b4a5014-d9fd-11ee-b97d-83c170a68ed3",
        "name": "AivisSpeech Engine",
        "executionEnabled": true,
        "executionFilePath": "../Resources/AivisSpeech-Engine/run",
        "executionArgs": [],
        "host": "http://127.0.0.1:10101"
    }
]`
...
--------------------

# Start AivisSpeech Engine in a separate terminal beforehand
## The AivisSpeech Engine development environment must be set up separately
cd ../AivisSpeech-Engine
poetry run task serve
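
The macOS edit to executionFilePath described above is a single-field change in a JSON array. As a rough sketch, assuming the value of VITE_DEFAULT_ENGINE_INFOS is plain JSON once the surrounding backticks are removed, the rewrite could look like the following. The helper name `with_macos_engine_path` is hypothetical; the field values are the ones shown in the setup steps, and the sketch only transforms a string rather than editing .env.production itself.

```python
import json

# Engine definition as shown in the setup steps, with the default
# (non-macOS) executionFilePath.
ENGINE_INFOS_JSON = """[
    {
        "uuid": "1b4a5014-d9fd-11ee-b97d-83c170a68ed3",
        "name": "AivisSpeech Engine",
        "executionEnabled": true,
        "executionFilePath": "AivisSpeech-Engine/run.exe",
        "executionArgs": [],
        "host": "http://127.0.0.1:10101"
    }
]"""

# Path used by the production build on macOS, per the setup steps.
MACOS_ENGINE_PATH = "../Resources/AivisSpeech-Engine/run"


def with_macos_engine_path(engine_infos_json: str) -> str:
    """Return the engine list with executionFilePath switched to the macOS path."""
    infos = json.loads(engine_infos_json)
    for info in infos:
        info["executionFilePath"] = MACOS_ENGINE_PATH
    return json.dumps(infos, indent=4)


if __name__ == "__main__":
    print(with_macos_engine_path(ENGINE_INFOS_JSON))
```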

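Development

The procedure partly differs from the original VOICEVOX.

# Start the Electron version of AivisSpeech in the development environment
npm run electron:serve

# Start the browser version of AivisSpeech in the development environment
npm run browser:serve

# Build the Electron version of AivisSpeech
npm run electron:build

# Build the browser version of AivisSpeech (WIP)
npm run browser:build

# Automatically fix code formatting
npm run format

# Check code formatting
npm run lint

# Update the code generated by OpenAPI Generator
npm run openapi:generate

# Generate license information for dependencies
## Unlike VOICEVOX, this is not merged with the speech synthesis engine's license information
## Merging is unnecessary because the editor separately displays license information obtained from the engine manifest
npm run license:generate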

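License

Of the dual license of the base projects VOICEVOX / VOICEVOX ENGINE, only LGPL-3.0 is inherited.

The text below and the documents under docs/ are carried over unmodified from the original VOICEVOX documentation. There is no guarantee that their content also applies to AivisSpeech.

VOICEVOX

This is the VOICEVOX editor.

(The engine is VOICEVOX ENGINE, the core is VOICEVOX CORE, and details of the overall architecture are available here.)

For users

This page is for development. For how to use VOICEVOX, see the official VOICEVOX website.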

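For those who want to contribute to the project

The VOICEVOX project welcomes participation from anyone who is interested. A guide explaining the contribution procedure is available.

Contribution is often thought of as writing programs, but there are many ways to take part, such as writing documentation, creating tests, and joining discussions on improvement proposals. There are also beginner-friendly tasks, so we look forward to your participation.

The VOICEVOX editor uses Electron, TypeScript, Vue, Vuex, and more, which makes the overall structure hard to grasp.
The structure is introduced in "How to walk through the code", which we hope will help with development.

When creating a pull request that resolves an issue, avoid working on the same issue as someone else: either say on the issue that you have started working on it, or create a Draft pull request first.

Development discussion and casual chat take place on the unofficial VOICEVOX Discord server. Feel free to join.

Design guidelines

See the UX/UI design policy.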

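Environment setup

Install the version of Node.js listed in .node-version.
A Node.js version manager (such as nvs or Volta) makes installation easy and can also switch Node.js versions automatically.

After installing Node.js, fork this repository and git clone it.

Installing dependencies

Running the following commands installs and updates the dependencies.

npm i -g pnpm # first time only
pnpm i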

Running

Preparing the engine

Copy .env.production to create .env, and set executionFilePath in VITE_DEFAULT_ENGINE_INFOS to vv-engine/run.exe inside the production version of VOICEVOX.

On Windows, if you have not changed the install location, specify C:/Users/(username)/AppData/Local/Programs/VOICEVOX/vv-engine/run.exe.
Note that the path separator is / rather than \.

If you use VOICEVOX.app for macOS, specify /path/to/VOICEVOX.app/Resources/MacOS/vv-engine/run.

On Linux, specify the vv-engine/run command included in the tar.gz version available from Releases. For the AppImage version, you can mount the file system with $ /path/to/VOICEVOX.AppImage --appimage-mount.

If you run an engine API server separately from the VOICEVOX editor, you do not need to specify executionFilePath, but set executionEnabled to false instead. This also applies when the production version of VOICEVOX is running.

To change the destination endpoint of the engine API, change host in VITE_DEFAULT_ENGINE_INFOS.
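
Two of the points above are easy to get wrong by hand: the forward-slash path on Windows and the executionEnabled / host change for an engine that is already running. The following sketch shows both under stated assumptions. The helper names `to_env_path` and `external_engine_info` are hypothetical, the user name in the example path is made up, and the host value is only an example address.

```python
import json
from pathlib import PureWindowsPath


def to_env_path(windows_path: str) -> str:
    """Convert a Windows path to the forward-slash form used in executionFilePath."""
    return PureWindowsPath(windows_path).as_posix()


def external_engine_info(info: dict, host: str) -> dict:
    """Return a copy of an engine entry that targets an already running engine."""
    updated = dict(info)
    # The editor should not launch the engine itself in this setup.
    updated["executionEnabled"] = False
    updated["host"] = host
    return updated


if __name__ == "__main__":
    # "alice" is a made-up user name.
    exe = to_env_path(
        r"C:\Users\alice\AppData\Local\Programs\VOICEVOX\vv-engine\run.exe"
    )
    entry = {"executionEnabled": True, "executionFilePath": exe, "host": ""}
    # The host below is an example endpoint, not a required value.
    print(json.dumps(external_engine_info(entry, "http://127.0.0.1:50021"), indent=4))
```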

Running Electron

# Run in an environment convenient for development
pnpm run electron:serve

# Run in an environment close to the build
pnpm run electron:serve --mode production

# Run with arguments
pnpm run electron:serve -- ...

The repository for the speech synthesis engine is here: https://github.com/VOICEVOX/voicevox_engine

Running Storybook

You can develop components using Storybook.

pnpm run storybook

The Storybook for the main branch can be viewed at VOICEVOX/preview-pages.
https://voicevox.github.io/preview-pages/preview/branch-main/storybook/index.html

Running the browser version (in development)

Start the speech synthesis engine separately, run the following, and open the localhost address that is displayed.

pnpm run browser:serve

The build output of the main branch is also deployed to VOICEVOX/preview-pages.
https://voicevox.github.io/preview-pages/preview/branch-main/editor/index.html
For now, the speech synthesis engine must be running on your local PC.

Build

pnpm run electron:build

Building with GitHub Actions

Enable Actions in your forked repository and trigger build.yml via workflow_dispatch to build. The artifacts are uploaded to Release.

Tests

Unit tests

Runs the tests under ./tests/unit/ and the Storybook tests.

pnpm run test:unit
pnpm run test-watch:unit # watch mode
pnpm run test-ui:unit # show the Vitest UI
pnpm run test:unit --update # update snapshots

[!NOTE]
For tests under ./tests/unit, the environment in which a test runs depends on the file name.

.node.spec.ts: Node.js environment

.browser.spec.ts: browser environment (Chromium)

.spec.ts: browser environment (emulated by happy-dom)
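
The naming rule above depends on checking the more specific suffixes first, since every listed name also ends in .spec.ts. A small sketch of that lookup order follows; `environment_for` is a hypothetical helper for illustration and is not how the test runner itself is configured.

```python
def environment_for(file_name: str) -> str:
    """Map a unit test file name to the environment it runs in."""
    # Check the specific suffixes before the generic ".spec.ts".
    if file_name.endswith(".node.spec.ts"):
        return "Node.js"
    if file_name.endswith(".browser.spec.ts"):
        return "browser (Chromium)"
    if file_name.endswith(".spec.ts"):
        return "browser (happy-dom emulation)"
    raise ValueError(f"not a unit test file name: {file_name}")


if __name__ == "__main__":
    # The file names below are made-up examples.
    for name in ("store.node.spec.ts", "dialog.browser.spec.ts", "utils.spec.ts"):
        print(name, "->", environment_for(name))
```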

Browser End to End tests

Runs End to End tests for the UI, speech synthesis, and other parts that do not need Electron features.

[!NOTE] Some tests that rewrite engine settings run only on CI (GitHub Actions).

pnpm run test:browser-e2e
pnpm run test-watch:browser-e2e # watch mode
pnpm run test-watch:browser-e2e --headed # show the UI during tests
pnpm run test-ui:browser-e2e # show the Playwright UI

Because Playwright is used, you can also generate test patterns. Run the following command while the browser version is running.

pnpm exec playwright codegen http://localhost:5173/ --viewport-size=1024,630

For details, see Test generator in the Playwright documentation.

Storybook Visual Regression Testing

Compares screenshots of Storybook components and shows the differences when something has changed.

[!NOTE] This test can be run only on Windows.

pnpm run test:storybook-vrt
pnpm run test-watch:storybook-vrt # watch mode
pnpm run test-ui:storybook-vrt # show the Playwright UI

Updating screenshots

Visual Regression Testing is performed in the browser End to End tests and in Storybook. Currently, VRT tests are run only on Windows. You can update the screenshots with the following steps:

Updating with GitHub Actions

Enable GitHub Actions in the settings of your forked repository.
In the repository settings, under Actions > General > Workflow permissions, select Read and write permissions.
Make a commit whose message contains the string [update snapshots].
```
git commit -m "Change UI [update snapshots]"
```
When the GitHub workflow finishes, the updated screenshots are committed.
After pulling, push an empty commit to rerun the tests.
```
git commit --allow-empty -m "(rerun tests)"
git push
```

[!NOTE] By creating a token and adding it to Secrets, the tests can be rerun automatically.

Go to Fine-grained tokens.

Enter any name, grant access to (username)/voicevox, and select Read and write for Contents under Repository permissions.

Create the token and copy the string.

Open Settings > Secrets and variables > Actions > New repository secret in the (username)/voicevox repository.

Enter PUSH_TOKEN as the name, paste the string you copied, and add the secret.

Updating locally

Only the screenshots for your local PC's OS are updated.

pnpm run test:browser-e2e --update-snapshots

Electron End to End tests

Runs End to End tests that need Electron features, including starting and stopping the engine.

pnpm run test:electron-e2e
pnpm run test-watch:electron-e2e # watch mode

Generating license information for dependencies

License information for dependencies is generated automatically at build time in the GitHub workflow. You can generate it with the following commands.

# get licenses.json from voicevox_engine as engine_licenses.json

pnpm run license:generate -o voicevox_licenses.json
pnpm run license:merge -o public/licenses.json -i engine_licenses.json -i voicevox_licenses.json

Code formatting

Formats the code. Run this before sending a pull request.

pnpm run fmt

Lint (static analysis)

Performs static analysis of the code to catch bugs early. Run this before sending a pull request.

pnpm run lint

Running lint creates a cache file, .eslintcache, in the repository root. Delete this file if ESLint is upgraded, the configuration changes, or the cache becomes corrupted.

Typo check

Typos are checked with typos.

pnpm run typos

runs the typo check. If there are false positives or files that should be excluded from the check, edit _typos.toml following the description of the configuration file.

Type check

Performs TypeScript type checking.

pnpm run typecheck

Markdownlint

Checks Markdown syntax.

pnpm run markdownlint

Shellcheck

Checks shell script syntax. See here for installation instructions.

shellcheck ./build/*.sh

OpenAPI generator

Run the following commands while the speech synthesis engine is running.

curl http://127.0.0.1:50021/openapi.json >openapi.json

pnpm exec openapi-generator-cli generate \
    -i openapi.json \
    -g typescript-fetch \
    -o src/openapi/ \
    --additional-properties "modelPropertyNaming=camelCase,supportsES6=true,withInterfaces=true,typescriptThreePlus=true"

pnpm run fmt

Upgrading OpenAPI generator

You can check for and install new versions with the following command.

pnpm exec openapi-generator-cli version-manager list

Debugging in VS Code

In development builds such as the npm scripts serve and electron:serve, vite (the build tool) outputs sourcemaps, so the source code is mapped to the output code.

Copying .vscode/launch.template.json to create .vscode/launch.json, and .vscode/tasks.template.json to create .vscode/tasks.json, enables tasks that run the development build from VS Code and allow debugging.

License

VOICEVOX is dual-licensed under LGPL v3 and a separate license that does not require publishing source code. If you want to obtain the separate license, ask Hiho.
X account: @hiho_karuta
