
ebook2audiobook
Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning. Supports 1,107+ languages!
Stars: 9221

ebook2audiobook is a CPU/GPU converter tool that converts eBooks to audiobooks with chapters and metadata using tools like Calibre, ffmpeg, XTTSv2, and Fairseq. It supports voice cloning and a wide range of languages. The tool is designed to run on 4GB RAM and provides a new v2.0 Web GUI interface for user-friendly interaction. Users can convert eBooks to text format, split eBooks into chapters, and utilize high-quality text-to-speech functionalities. Supported languages include Arabic, Chinese, English, French, German, Hindi, and many more. The tool can be used for legal, non-DRM eBooks only and should be used responsibly in compliance with applicable laws.
README:
CPU/GPU Converter from eBooks to audiobooks with chapters and metadata
using XTTSv2, Bark, Vits, Fairseq, YourTTS and more. Supports voice cloning and +1110 languages!
[!IMPORTANT] This tool is intended for use with non-DRM, legally acquired eBooks only.
The authors are not responsible for any misuse of this software or any resulting legal consequences.
Use this tool responsibly and in accordance with all applicable laws.
Thanks for supporting the ebook2audiobook developers!
- ara العربية (Arabic)
- zho 中文 (Chinese)
- eng English
- swe Svenska (Swedish)
- fas فارسی (Persian)
- kor 한국어 (Korean)
- ita Italiano (Italian)
Table of Contents
- ebook2audiobook
- Features
- Docker GUI Interface
- Huggingface Space Demo
- Free Google Colab
- Pre-made Audio Demos
- Supported Languages
- Requirements
- Installation Instructions
- Usage
- Fine Tuned TTS models
- Using Docker
- Supported eBook Formats
- Output
- Common Issues
- Special Thanks
- Join Our Server!
- Legacy
- Converts eBooks to text format with Calibre.
- Splits eBook into chapters for organized audio.
- High-quality text-to-speech with Coqui XTTSv2 and Fairseq (and more).
- Optional voice cloning with your own voice file.
- Supports +1110 languages (English by default). List of Supported Languages
- Designed to run on 4GB RAM.
- The Huggingface Space runs on the free CPU tier, so expect very slow conversions or timeouts; just don't give it giant files.
- Best to duplicate the Space or run it locally.
| Arabic (ar) | Chinese (zh) | English (en) | Spanish (es) |
|---|---|---|---|
| French (fr) | German (de) | Italian (it) | Portuguese (pt) |
| Polish (pl) | Turkish (tr) | Russian (ru) | Dutch (nl) |
| Czech (cs) | Japanese (ja) | Hindi (hi) | Bengali (bn) |
| Hungarian (hu) | Korean (ko) | Vietnamese (vi) | Swedish (sv) |
| Persian (fa) | Yoruba (yo) | Swahili (sw) | Indonesian (id) |
| Slovak (sk) | Croatian (hr) | Tamil (ta) | Danish (da) |
- 4GB RAM minimum, 8GB recommended
- Virtualization enabled if running on Windows (Docker only)
- CPU (Intel, AMD, ARM), GPU (NVIDIA, AMD*, Intel*) (recommended), MPS (Apple Silicon) (*available very soon)
[!IMPORTANT] Before posting an install or bug issue, search the open and closed issues tabs carefully
to make sure your issue does not already exist.
[!NOTE] Since eBooks lack any standard structure defining what a chapter, paragraph, preface, etc. is,
you should first manually remove any text you don't want converted to audio.
- Clone repo:
```sh
git clone https://github.com/DrewThomasson/ebook2audiobook.git
```
- Run ebook2audiobook:
  - Linux/MacOS: `./ebook2audiobook.sh` (run launch script)
  - Mac Launcher: double click `Mac Ebook2Audiobook Launcher.command`
  - Windows: `.\ebook2audiobook.cmd` (run launch script, or double click on it)
  - Windows Launcher: double click `ebook2audiobook.exe`
- Open the Web App: click the URL provided in the terminal to access the web app and convert eBooks.
- For a public link:
  - All OS: `python app.py --share`
  - Linux/MacOS: `./ebook2audiobook.sh --share`
  - Windows: `ebook2audiobook.cmd --share`
[!IMPORTANT] If the script is stopped and run again, you need to refresh your Gradio GUI interface
so the web page can reconnect to the new connection socket.
- Linux/MacOS:
```sh
./ebook2audiobook.sh --headless --ebook <path_to_ebook_file> \
  --voice [path_to_voice_file] --language [language_code]
```
- Windows:
```
.\ebook2audiobook.cmd --headless --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
```
- [--ebook]: Path to your eBook file
- [--voice]: Voice cloning file path (optional)
- [--language]: Language code in ISO 639-3 (e.g. ita for Italian, eng for English, deu for German...).
  The default language is eng, and --language is optional for the default language set in ./lib/lang.py.
  ISO 639-1 two-letter codes are also supported.
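Putting those options together, a minimal concrete run might look like this (a sketch; the file names and language code are hypothetical):

```sh
# Sketch: convert an EPUB to an audiobook in Italian, cloning a voice sample
./ebook2audiobook.sh --headless --ebook mybook.epub --voice my_voice.wav --language ita
```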
Example of custom model usage (the custom model must be a .zip file containing the mandatory model files; example for XTTS: config.json, model.pth, vocab.json, and ref.wav):
- Linux/MacOS:
```sh
./ebook2audiobook.sh --headless --ebook <ebook_file_path> \
  --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>
```
- Windows:
```
.\ebook2audiobook.cmd --headless --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>
```
- <custom_model_path>: Path to the `model_name.zip` file, which must contain (according to the TTS engine) all the mandatory files (see ./lib/models.py).
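As a sketch of how such a zip could be assembled for XTTS (the file list comes from the example above; check ./lib/models.py for the exact requirements of your engine):

```sh
# Hypothetical: package the mandatory XTTS model files into a custom model zip
zip my_xtts_model.zip config.json model.pth vocab.json ref.wav
```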
For help:
- Linux/MacOS: `./ebook2audiobook.sh --help`
- Windows: `.\ebook2audiobook.cmd --help`
- Or, for all OS: `python app.py --help`
```
usage: app.py [-h] [--script_mode SCRIPT_MODE] [--session SESSION] [--share]
[--headless] [--ebook EBOOK] [--ebooks_dir EBOOKS_DIR]
[--language LANGUAGE] [--voice VOICE] [--device {cpu,gpu,mps}]
[--tts_engine {xtts,bark,vits,fairseq,yourtts}]
[--custom_model CUSTOM_MODEL] [--fine_tuned FINE_TUNED]
[--output_format OUTPUT_FORMAT] [--temperature TEMPERATURE]
[--length_penalty LENGTH_PENALTY] [--num_beams NUM_BEAMS]
[--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
[--speed SPEED] [--enable_text_splitting] [--output_dir OUTPUT_DIR]
[--version]
Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion.
options:
-h, --help show this help message and exit
--session SESSION Session to resume the conversion in case of interruption, crash,
or reuse of custom models and custom cloning voices.
**** The following options are for gradio/gui mode only:
Optional
--share Enable a public shareable Gradio link.
**** The following options are for --headless mode only:
--headless Run the script in headless mode
--ebook EBOOK Path to the ebook file for conversion. Cannot be used when --ebooks_dir is present.
--ebooks_dir EBOOKS_DIR
Relative or absolute path of the directory containing the files to convert.
Cannot be used when --ebook is present.
--language LANGUAGE Language of the eBook. The default language set
in ./lib/lang.py is used if not present. All compatible language codes are in ./lib/lang.py
optional parameters:
--voice VOICE (Optional) Path to the voice cloning file for TTS engine.
Uses the default voice if not present.
--device {cpu,gpu,mps}
(Optional) Processor unit type for the conversion.
Default is set in ./lib/conf.py if not present. Falls back to CPU if GPU is not available.
--tts_engine {xtts,bark,vits,fairseq,yourtts}
(Optional) Preferred TTS engine (available: ['xtts', 'bark', 'vits', 'fairseq', 'yourtts']).
Default depends on the selected language. The TTS engine should be compatible with the chosen language.
--custom_model CUSTOM_MODEL
(Optional) Path to the custom model zip file containing the mandatory model files.
Please refer to ./lib/models.py
--fine_tuned FINE_TUNED
(Optional) Fine tuned model path. Default is builtin model.
--output_format OUTPUT_FORMAT
(Optional) Output audio format. Default is set in ./lib/conf.py
--temperature TEMPERATURE
(xtts only, optional) Temperature for the model.
Default to config.json model. Higher temperatures lead to more creative outputs.
--length_penalty LENGTH_PENALTY
(xtts only, optional) A length penalty applied to the autoregressive decoder.
Default to config.json model. Not applied to custom models.
--num_beams NUM_BEAMS
(xtts only, optional) Controls how many alternative sequences the model explores. Must be greater than or equal to the length penalty.
Default to config.json model.
--repetition_penalty REPETITION_PENALTY
(xtts only, optional) A penalty that prevents the autoregressive decoder from repeating itself.
Default to config.json model.
--top_k TOP_K (xtts only, optional) Top-k sampling.
Lower values mean more likely outputs and increased audio generation speed.
Default to config.json model.
--top_p TOP_P (xtts only, optional) Top-p sampling.
Lower values mean more likely outputs and increased audio generation speed. Default to 0.85
--speed SPEED (xtts only, optional) Speed factor for the speech generation.
Default to config.json model.
--enable_text_splitting
(xtts only, optional) Enable TTS text splitting. This option is known to not be very efficient.
Default to config.json model.
--output_dir OUTPUT_DIR
(Optional) Path to the output directory. Default is set in ./lib/conf.py
--version Show the version of the script and exit
Example usage:
Windows:
Gradio/GUI:
ebook2audiobook.cmd
Headless mode:
ebook2audiobook.cmd --headless --ebook '/path/to/file'
Linux/Mac:
Gradio/GUI:
./ebook2audiobook.sh
Headless mode:
./ebook2audiobook.sh --headless --ebook '/path/to/file'
NOTE: in gradio/gui mode, to cancel a running conversion, just click on the [X] in the ebook upload component.
```
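As an illustration, a headless run that tunes the XTTS generation options documented above might look like this (a sketch; the file name and values are hypothetical, and unset options default to the model's config.json):

```sh
./ebook2audiobook.sh --headless --ebook mybook.epub \
  --tts_engine xtts --temperature 0.65 --top_k 50 --top_p 0.85 --speed 1.0 \
  --output_format m4b
```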
You can also use Docker to run the eBook to Audiobook converter. This method ensures consistency across different environments and simplifies setup.
To run the Docker container and start the Gradio interface, use the following command:
- Run with CPU only:
```sh
docker run --rm -p 7860:7860 athomasson2/ebook2audiobook
```
- Run with GPU speedup (NVIDIA compatible only):
```sh
docker run --rm --gpus all -p 7860:7860 athomasson2/ebook2audiobook
```
- You can build the Docker image with:
```sh
docker build -t athomasson2/ebook2audiobook .
```
This command will start the Gradio interface on port 7860 (localhost:7860).
- For more options, add the `--help` parameter.
All ebook2audiobook paths in the Docker image use the base dir /home/user/app/.
For example: `tmp` = /home/user/app/tmp and `audiobooks` = /home/user/app/audiobooks.
First, pull the latest image:
```sh
docker pull athomasson2/ebook2audiobook
```
- Before running this, create a dir named "input-folder" in your current dir; it will be mounted into the container and is where you put the input files for the Docker image to see:
```sh
mkdir input-folder && mkdir Audiobooks
```
- In the command below, swap out YOUR_EBOOK_FILE for the name of your input file:
```sh
docker run --rm \
  -v $(pwd)/input-folder:/app/input_folder \
  -v $(pwd)/audiobooks:/app/audiobooks \
  athomasson2/ebook2audiobook \
  --headless --ebook /input_folder/YOUR_EBOOK_FILE
```
- And that should be it!
- The output audiobooks will be found in the audiobooks folder, which will also be located in the local dir where you ran this Docker command.
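If you have several books to convert, the `--ebooks_dir` flag documented above can be combined with the same mounts to batch-convert everything in the input folder (a sketch mirroring the path convention of the single-book example, not an official recipe):

```sh
# Hypothetical: convert every eBook found in the mounted input folder
docker run --rm \
  -v $(pwd)/input-folder:/app/input_folder \
  -v $(pwd)/audiobooks:/app/audiobooks \
  athomasson2/ebook2audiobook \
  --headless --ebooks_dir /input_folder
```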
To print the help output shown above from Docker:
```sh
docker run --rm athomasson2/ebook2audiobook --help
```
This project uses Docker Compose to run locally. You can enable or disable GPU support by setting either `*gpu-enabled` or `*gpu-disabled` in `docker-compose.yml`.
- Clone the repository (if you haven't already):
```sh
git clone https://github.com/DrewThomasson/ebook2audiobook.git
cd ebook2audiobook
```
- Set GPU support (disabled by default): to enable GPU support, modify `docker-compose.yml` and change `*gpu-disabled` to `*gpu-enabled`.
- Start the service:
```sh
docker-compose up -d
```
- Access the service: The service will be available at http://localhost:7860.
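Once it is up, a quick sanity check from the host (assuming the default port mapping above):

```sh
# Expect an HTTP response header from the Gradio server
curl -I http://localhost:7860
```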
Don't have the hardware to run it, or do you want to rent a GPU?
(Be aware it will time out after a while if you're not interacting with the Google Colab.) Free Google Colab
- `python: can't open file '/home/user/app/app.py': [Errno 2] No such file or directory`
  (Just remove all post arguments, since the `CMD` was replaced with `ENTRYPOINT` in the Dockerfile.)
  Example: `docker run athomasson2/ebook2audiobook app.py --script_mode full_docker` -> corrected -> `docker run athomasson2/ebook2audiobook`
- Arguments can be easily added like this now. Example: `docker run athomasson2/ebook2audiobook --share`
- Docker gets stuck downloading fine-tuned models. (This does not happen on every computer, but some appear to run into this issue.) Disabling the progress bar appears to fix the issue, as discussed here in #191. Example of adding this fix to the `docker run` command:
```sh
docker run --rm --gpus all -e HF_HUB_DISABLE_PROGRESS_BARS=1 -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  -p 7860:7860 athomasson2/ebook2audiobook
```
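If you hit the same stall outside Docker, the equivalent workaround for a local run would presumably be exporting the same variables before launching (a sketch for bash-like shells):

```sh
# Hypothetical local equivalent of the Docker fix above
export HF_HUB_DISABLE_PROGRESS_BARS=1
export HF_HUB_ENABLE_HF_TRANSFER=0
./ebook2audiobook.sh
```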
You can fine-tune your own XTTS model easily with this repo: xtts-finetune-webui.
If you want to rent a GPU easily, you can also duplicate this Hugging Face Space: xtts-finetune-webui-space.
There is also a Space you can use to easily de-noise the training data: denoise-huggingface-space.
To find our collection of already fine-tuned TTS models, visit this Hugging Face link. For an XTTS custom model, a reference audio clip of the voice is mandatory:
Rainy day voice https://github.com/user-attachments/assets/d25034d9-c77f-43a9-8f14-0d167172b080
David Attenborough voice https://github.com/user-attachments/assets/0d437a41-0b0d-48ed-8c9b-02763d5e48ea
- Supported formats: `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`, `.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`, `.snb`, `.cbc`, `.rb`, `.tcr`
- Best results: `.epub` or `.mobi` for automatic chapter detection
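Since `.epub` gives the most reliable chapter detection, one option is to pre-convert other formats with Calibre's `ebook-convert` CLI before feeding them in (an optional manual step; file names hypothetical):

```sh
# Calibre ships the ebook-convert command-line tool
ebook-convert mybook.pdf mybook.epub
```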
- Creates an audio file in one of `['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac']` (set in ./lib/conf.py) with metadata and chapters.
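To verify that the chapters and metadata made it into the output, FFmpeg's `ffprobe` (already a dependency) can inspect the file; a quick check, assuming an M4B output named mybook.m4b:

```sh
ffprobe -show_chapters -show_format mybook.m4b
```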
- CPU is slow (better on server SMP CPUs), while an NVIDIA GPU can achieve almost real-time conversion. See the discussion about this. For faster multilingual generation, I would suggest my other project that uses piper-tts instead (it doesn't have zero-shot voice cloning and has Siri-quality voices, but it is much faster on CPU).
- "I'm having dependency issues" - Just use the docker, its fully self contained and has a headless mode,
add
--help
parameter at the end of the docker run command for more information. - "Im getting a truncated audio issue!" - PLEASE MAKE AN ISSUE OF THIS, we don't speak every language and need advise from users to fine tune the sentence splitting logic.π
- Any help from people speaking any of the supported languages to help with proper sentence splitting methods
- Potentially creating readme Guides for Multiple languages(Becuase the only language I know is English π)
- Coqui TTS: Coqui TTS GitHub
- Calibre: Calibre Website
- FFmpeg: FFmpeg Website
- @shakenbake15 for better chapter saving method