ebook2audiobook

Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning. Supports 1,107+ languages!

Stars: 9221

Visit

ebook2audiobook is a CPU/GPU converter tool that converts eBooks to audiobooks with chapters and metadata using tools like Calibre, ffmpeg, XTTSv2, and Fairseq. It supports voice cloning and a wide range of languages. The tool is designed to run on 4GB RAM and provides a new v2.0 Web GUI interface for user-friendly interaction. Users can convert eBooks to text format, split eBooks into chapters, and utilize high-quality text-to-speech functionalities. Supported languages include Arabic, Chinese, English, French, German, Hindi, and many more. The tool can be used for legal, non-DRM eBooks only and should be used responsibly in compliance with applicable laws.

README:

📚 ebook2audiobook

CPU/GPU Converter from eBooks to audiobooks with chapters and metadata
using XTTSv2, Bark, Vits, Fairseq, YourTTS and more. Supports voice cloning and +1110 languages!

[!IMPORTANT] This tool is intended for use with non-DRM, legally acquired eBooks only.
The authors are not responsible for any misuse of this software or any resulting legal consequences.
Use this tool responsibly and in accordance with all applicable laws.

Thanks to support ebook2audiobook developers!

GUI Interface

Click to see images of Web GUI

README.md

ebook2audiobook
Features
Docker GUI Interface
Huggingface Space Demo
Free Google Colab
Pre-made Audio Demos
Supported Languages
Requirements
Installation Instructions
Usage
Fine Tuned TTS models
- For Collection of Fine-Tuned TTS Models
Using Docker
Supported eBook Formats
Output
Common Issues
Special Thanks
Join Our Server!
Legacy
Table of Contents

Features

📖 Converts eBooks to text format with Calibre.
📚 Splits eBook into chapters for organized audio.
🎙️ High-quality text-to-speech with Coqui XTTSv2 and Fairseq (and more).
🗣️ Optional voice cloning with your own voice file.
🌍 Supports +1110 languages (English by default). List of Supported languages
🖥️ Designed to run on 4GB RAM.

Huggingface space demo

Huggingface space is running on free cpu tier so expect very slow or timeout lol, just do not give it giant files is all
Best to duplicate space or run locally.

Free Google Colab

Supported Languages

Arabic (ar)	Chinese (zh)	English (en)	Spanish (es)
French (fr)	German (de)	Italian (it)	Portuguese (pt)
Polish (pl)	Turkish (tr)	Russian (ru)	Dutch (nl)
Czech (cs)	Japanese (ja)	Hindi (hi)	Bengali (bn)
Hungarian (hu)	Korean (ko)	Vietnamese (vi)	Swedish (sv)
Persian (fa)	Yoruba (yo)	Swahili (sw)	Indonesian (id)
Slovak (sk)	Croatian (hr)	Tamil (ta)	Danish (da)

+1100 languages and dialects here

Hardware Requirements

4gb RAM minimum, 8GB recommended
Virtualization enabled if running on windows (Docker only)
CPU (intel, AMD, ARM), GPU (Nvidia, AMD*, Intel*) (Recommended), MPS (Apple Silicon CPU) *available very soon

[!IMPORTANT] Before to post an install or bug issue search carefully to the opened and closed issues TAB
to be sure your issue does not exist already.

[!NOTE] Lacking of any standards structure like what is a chapter, paragraph, preface etc.
you should first remove manually any text you don't want to be converted in audio.

Installation Instructions

Clone repo

git clone https://github.com/DrewThomasson/ebook2audiobook.git

Launching Gradio Web Interface

Run ebook2audiobook:
- Linux/MacOS
```
./ebook2audiobook.sh  # Run launch script
```
- Mac Launcher
  Double click Mac Ebook2Audiobook Launcher.command
- Windows
```
.\ebook2audiobook.cmd  # Run launch script or double click on it
```
- Windows Launcher
  Double click ebook2audiobook.exe
Open the Web App: Click the URL provided in the terminal to access the web app and convert eBooks.
For Public Link: python app.py --share (all OS) ./ebook2audiobook.sh --share (Linux/MacOS) ebook2audiobook.cmd --share (Windows)

[!IMPORTANT] If the script is stopped and run again, you need to refresh your gradio GUI interface
to let the web page reconnect to the new connection socket.

Basic Usage

Linux/MacOS:

./ebook2audiobook.sh --headless --ebook <path_to_ebook_file> \
    --voice [path_to_voice_file] --language [language_code]

Windows

.\ebook2audiobook.cmd --headless --ebook <path_to_ebook_file>
    --voice [path_to_voice_file] --language [language_code]

[--ebook]: Path to your eBook file
[--voice]: Voice cloning file path (optional)
[--language]: Language code in ISO-639-3 (i.e.: ita for italian, eng for english, deu for german...).
Default language is eng and --language is optional for default language set in ./lib/lang.py.
The ISO-639-1 2 letters codes are also supported.

Example of Custom Model Zip Upload

(must be a .zip file containing the mandatory model files. Example for XTTS: config.json, model.pth, vocab.json and ref.wav)

Linux/MacOS

./ebook2audiobook.sh --headless --ebook <ebook_file_path> \
    --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>

Windows

.\ebook2audiobook.cmd --headless --ebook <ebook_file_path> \
    --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>

<custom_model_path>: Path to model_name.zip file, which must contain (according to the tts engine) all the mandatory files
(see ./lib/models.py).

For Detailed Guide with list of all Parameters to use

Linux/MacOS
```
./ebook2audiobook.sh --help
```
Windows
```
.\ebook2audiobook.cmd --help
```
Or for all OS python app.py --help

usage: app.py [-h] [--script_mode SCRIPT_MODE] [--session SESSION] [--share]
              [--headless] [--ebook EBOOK] [--ebooks_dir EBOOKS_DIR]
              [--language LANGUAGE] [--voice VOICE] [--device {cpu,gpu,mps}]
              [--tts_engine {xtts,bark,vits,fairseq,yourtts}]
              [--custom_model CUSTOM_MODEL] [--fine_tuned FINE_TUNED]
              [--output_format OUTPUT_FORMAT] [--temperature TEMPERATURE]
              [--length_penalty LENGTH_PENALTY] [--num_beams NUM_BEAMS]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
              [--speed SPEED] [--enable_text_splitting] [--output_dir OUTPUT_DIR]
              [--version]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion.

options:
  -h, --help            show this help message and exit
  --session SESSION     Session to resume the conversion in case of interruption, crash, 
                            or reuse of custom models and custom cloning voices.

**** The following option are for gradio/gui mode only:
  Optional

  --share               Enable a public shareable Gradio link.

**** The following options are for --headless mode only:
  --headless            Run the script in headless mode
  --ebook EBOOK         Path to the ebook file for conversion. Cannot be used when --ebooks_dir is present.
  --ebooks_dir EBOOKS_DIR
                        Relative or absolute path of the directory containing the files to convert. 
                            Cannot be used when --ebook is present.
  --language LANGUAGE   Language of the e-book. Default language is set 
                            in ./lib/lang.py sed as default if not present. All compatible language codes are in ./lib/lang.py

optional parameters:
  --voice VOICE         (Optional) Path to the voice cloning file for TTS engine. 
                            Uses the default voice if not present.
  --device {cpu,gpu,mps}
                        (Optional) Pprocessor unit type for the conversion. 
                            Default is set in ./lib/conf.py if not present. Fall back to CPU if GPU not available.
  --tts_engine {xtts,bark,vits,fairseq,yourtts}
                        (Optional) Preferred TTS engine (available are: ['xtts', 'bark', 'vits', 'fairseq', 'yourtts'].
                            Default depends on the selected language. The tts engine should be compatible with the chosen language
  --custom_model CUSTOM_MODEL
                        (Optional) Path to the custom model zip file cntaining mandatory model files. 
                            Please refer to ./lib/models.py
  --fine_tuned FINE_TUNED
                        (Optional) Fine tuned model path. Default is builtin model.
  --output_format OUTPUT_FORMAT
                        (Optional) Output audio format. Default is set in ./lib/conf.py
  --temperature TEMPERATURE
                        (xtts only, optional) Temperature for the model. 
                            Default to config.json model. Higher temperatures lead to more creative outputs.
  --length_penalty LENGTH_PENALTY
                        (xtts only, optional) A length penalty applied to the autoregressive decoder. 
                            Default to config.json model. Not applied to custom models.
  --num_beams NUM_BEAMS
                        (xtts only, optional) Controls how many alternative sequences the model explores. Must be equal or greater than length penalty. 
                            Default to config.json model.
  --repetition_penalty REPETITION_PENALTY
                        (xtts only, optional) A penalty that prevents the autoregressive decoder from repeating itself. 
                            Default to config.json model.
  --top_k TOP_K         (xtts only, optional) Top-k sampling. 
                            Lower values mean more likely outputs and increased audio generation speed. 
                            Default to config.json model.
  --top_p TOP_P         (xtts only, optional) Top-p sampling. 
                            Lower values mean more likely outputs and increased audio generation speed. Default to 0.85
  --speed SPEED         (xtts only, optional) Speed factor for the speech generation. 
                            Default to config.json model.
  --enable_text_splitting
                        (xtts only, optional) Enable TTS text splitting. This option is known to not be very efficient. 
                            Default to config.json model.
  --output_dir OUTPUT_DIR
                        (Optional) Path to the output directory. Default is set in ./lib/conf.py
  --version             Show the version of the script and exit

Example usage:    
Windows:
    Gradio/GUI:
    ebook2audiobook.cmd
    Headless mode:
    ebook2audiobook.cmd --headless --ebook '/path/to/file'
Linux/Mac:
    Gradio/GUI:
    ./ebook2audiobook.sh
    Headless mode:
    ./ebook2audiobook.sh --headless --ebook '/path/to/file'

NOTE: in gradio/gui mode, to cancel a running conversion, just click on the [X] from the ebook upload component.

Using Docker

You can also use Docker to run the eBook to Audiobook converter. This method ensures consistency across different environments and simplifies setup.

Running the Docker Container

To run the Docker container and start the Gradio interface, use the following command:

-Run with CPU only

docker run --rm -p 7860:7860 athomasson2/ebook2audiobook

-Run with GPU Speedup (NVIDIA compatible only)

docker run --rm --gpus all -p 7860:7860 athomasson2/ebook2audiobook

Building the Docker Container

You can build the docker image with the command:

docker build -t athomasson2/ebook2audiobook .

This command will start the Gradio interface on port 7860.(localhost:7860)

For more options add the parameter --help

Docker container file locations

All ebook2audiobooks will have the base dir of /home/user/app/ For example: tmp = /home/user/app/tmp audiobooks = /home/user/app/audiobooks

Docker headless guide

first for a docker pull of the latest with

docker pull athomasson2/ebook2audiobook

Before you do run this you need to create a dir named "input-folder" in your current dir which will be linked, This is where you can put your input files for the docker image to see

mkdir input-folder && mkdir Audiobooks

In the command below swap out YOUR_INPUT_FILE.TXT with the name of your input file

docker run --rm \
    -v $(pwd)/input-folder:/app/input_folder \
    -v $(pwd)/audiobooks:/app/audiobooks \
    athomasson2/ebook2audiobook \
    --headless --ebook /input_folder/YOUR_EBOOK_FILE

And that should be it!
The output Audiobooks will be found in the Audiobook folder which will also be located in your local dir you ran this docker command in

To get the help command for the other parameters this program has you can run this

docker run --rm athomasson2/ebook2audiobook --help

and that will output this Help command output

Docker Compose

This project uses Docker Compose to run locally. You can enable or disable GPU support by setting either *gpu-enabled or *gpu-disabled in docker-compose.yml

Steps to Run

Clone the Repository (if you haven't already):

git clone https://github.com/DrewThomasson/ebook2audiobook.git
cd ebook2audiobook

Set GPU Support (disabled by default) To enable GPU support, modify docker-compose.yml and change *gpu-disabled to *gpu-enabled
Start the service:
```
docker-compose up -d
```
Access the service: The service will be available at http://localhost:7860.

Docker GUI Interface

Click to see images of Web GUI

Renting a GPU

Don't have the hardware to run it or you want to rent a GPU?

You can duplicate the hugginface space and rent a gpu for around $0.40 an hour

Huggingface Space Demo

Or you can try using the google colab for free!

(Be aware it will time out after a bit of your not messing with the google colab) Free Google Colab

Common Docker Issues

python: can't open file '/home/user/app/app.py': [Errno 2] No such file or directory (Just remove all post arguments as I replaced the CMD with ENTRYPOINT in the Dockerfile)
- Example: docker run athomasson2/ebook2audiobook app.py --script_mode full_docker - > corrected - > docker run athomasson2/ebook2audiobook
- Arguments can be easily added like this now docker run athomasson2/ebook2audiobook --share
Docker gets stuck downloading Fine-Tuned models. (This does not happen for every computer but some appear to run into this issue) Disabling the progress bar appears to fix the issue, as discussed here in #191 Example of adding this fix in the docker run command

docker run --rm --gpus all -e HF_HUB_DISABLE_PROGRESS_BARS=1 -e HF_HUB_ENABLE_HF_TRANSFER=0 \
    -p 7860:7860 athomasson2/ebook2audiobook

Fine Tuned TTS models

You can fine-tune your own xtts model easily with this repo xtts-finetune-webui

If you want to rent a GPU easily you can also duplicate this huggingface xtts-finetune-webui-space

A space you can use to de-noise the training data easily also denoise-huggingface-space

Fine Tuned TTS Collection

To find our collection of already fine-tuned TTS models, visit this Hugging Face link For an XTTS custom model a ref audio clip of the voice reference is mandatory:

Demos

Rainy day voice https://github.com/user-attachments/assets/d25034d9-c77f-43a9-8f14-0d167172b080

David Attenborough voice https://github.com/user-attachments/assets/0d437a41-0b0d-48ed-8c9b-02763d5e48ea

Supported eBook Formats

.epub, .pdf, .mobi, .txt, .html, .rtf, .chm, .lit, .pdb, .fb2, .odt, .cbr, .cbz, .prc, .lrf, .pml, .snb, .cbc, .rb, .tcr
Best results: .epub or .mobi for automatic chapter detection

Output

Creates a ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'] (set in ./lib/conf.py) file with metadata and chapters.
Example

Common Issues:

CPU is slow (better on server smp CPU) while NVIDIA GPU can have almost real time conversion. Discussion about this For faster multilingual generation I would suggest my other project that uses piper-tts instead (It doesn't have zero-shot voice cloning though, and is Siri quality voices, but it is much faster on cpu).
"I'm having dependency issues" - Just use the docker, its fully self contained and has a headless mode, add --help parameter at the end of the docker run command for more information.
"Im getting a truncated audio issue!" - PLEASE MAKE AN ISSUE OF THIS, we don't speak every language and need advise from users to fine tune the sentence splitting logic.😊

What I need help with! 🙌

Full list of things can be found here

Any help from people speaking any of the supported languages to help with proper sentence splitting methods
Potentially creating readme Guides for Multiple languages(Becuase the only language I know is English 😔)

Special Thanks

Coqui TTS: Coqui TTS GitHub
Calibre: Calibre Website
FFmpeg: FFmpeg Website
@shakenbake15 for better chapter saving method

Join Our Server!

For Tasks:

Click tags to check more tools for each tasks

convert ebooks split chapters clone voices support languages run on 4gb ram

For Jobs:

audiobook producer voiceover artist ebook publisher language translator audio content creator

Alternative AI tools for ebook2audiobook

Similar Open Source Tools

ebook2audiobook

github

: 9.2k

AiR

AiR is an AI tool built entirely in Rust that delivers blazing speed and efficiency. It features accurate translation and seamless text rewriting to supercharge productivity. AiR is designed to assist non-native speakers by automatically fixing errors and polishing language to sound like a native speaker. The tool is under heavy development with more features on the horizon.

github

: 118

swift-chat

SwiftChat is a fast and responsive AI chat application developed with React Native and powered by Amazon Bedrock. It offers real-time streaming conversations, AI image generation, multimodal support, conversation history management, and cross-platform compatibility across Android, iOS, and macOS. The app supports multiple AI models like Amazon Bedrock, Ollama, DeepSeek, and OpenAI, and features a customizable system prompt assistant. With a minimalist design philosophy and robust privacy protection, SwiftChat delivers a seamless chat experience with various features like rich Markdown support, comprehensive multimodal analysis, creative image suite, and quick access tools. The app prioritizes speed in launch, request, render, and storage, ensuring a fast and efficient user experience. SwiftChat also emphasizes app privacy and security by encrypting API key storage, minimal permission requirements, local-only data storage, and a privacy-first approach.

github

: 343

rkllama

RKLLama is a server and client tool designed for running and interacting with LLM models optimized for Rockchip RK3588(S) and RK3576 platforms. It allows models to run on the NPU, with features such as running models on NPU, partial Ollama API compatibility, pulling models from Huggingface, API REST with documentation, dynamic loading/unloading of models, inference requests with streaming modes, simplified model naming, CPU model auto-detection, and optional debug mode. The tool supports Python 3.8 to 3.12 and has been tested on Orange Pi 5 Pro and Orange Pi 5 Plus with specific OS versions.

github

: 88

probe

Probe is an AI-friendly, fully local, semantic code search tool designed to power the next generation of AI coding assistants. It combines the speed of ripgrep with the code-aware parsing of tree-sitter to deliver precise results with complete code blocks, making it perfect for large codebases and AI-driven development workflows. Probe is fully local, keeping code on the user's machine without relying on external APIs. It supports multiple languages, offers various search options, and can be used in CLI mode, MCP server mode, AI chat mode, and web interface. The tool is designed to be flexible, fast, and accurate, providing developers and AI models with full context and relevant code blocks for efficient code exploration and understanding.

github

: 110

farfalle

Farfalle is an open-source AI-powered search engine that allows users to run their own local LLM or utilize the cloud. It provides a tech stack including Next.js for frontend, FastAPI for backend, Tavily for search API, Logfire for logging, and Redis for rate limiting. Users can get started by setting up prerequisites like Docker and Ollama, and obtaining API keys for Tavily, OpenAI, and Groq. The tool supports models like llama3, mistral, and gemma. Users can clone the repository, set environment variables, run containers using Docker Compose, and deploy the backend and frontend using services like Render and Vercel.

github

: 2.1k

cog

Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.

github

: 8.5k

llm-x

LLM X is a ChatGPT-style UI for the niche group of folks who run Ollama (think of this like an offline chat gpt server) locally. It supports sending and receiving images and text and works offline through PWA (Progressive Web App) standards. The project utilizes React, Typescript, Lodash, Mobx State Tree, Tailwind css, DaisyUI, NextUI, Highlight.js, React Markdown, kbar, Yet Another React Lightbox, Vite, and Vite PWA plugin. It is inspired by ollama-ui's project and Perplexity.ai's UI advancements in the LLM UI space. The project is still under development, but it is already a great way to get started with building your own LLM UI.

github

: 113

Visionatrix

Visionatrix is a project aimed at providing easy use of ComfyUI workflows. It offers simplified setup and update processes, a minimalistic UI for daily workflow use, stable workflows with versioning and update support, scalability for multiple instances and task workers, multiple user support with integration of different user backends, LLM power for integration with Ollama/Gemini, and seamless integration as a service with backend endpoints and webhook support. The project is approaching version 1.0 release and welcomes new ideas for further implementation.

github

: 122

Zero

Zero is an open-source AI email solution that allows users to self-host their email app while integrating external services like Gmail. It aims to modernize and enhance emails through AI agents, offering features like open-source transparency, AI-driven enhancements, data privacy, self-hosting freedom, unified inbox, customizable UI, and developer-friendly extensibility. Built with modern technologies, Zero provides a reliable tech stack including Next.js, React, TypeScript, TailwindCSS, Node.js, Drizzle ORM, and PostgreSQL. Users can set up Zero using standard setup or Dev Container setup for VS Code users, with detailed environment setup instructions for Better Auth, Google OAuth, and optional GitHub OAuth. Database setup involves starting a local PostgreSQL instance, setting up database connection, and executing database commands for dependencies, tables, migrations, and content viewing.

github

: 4.8k

crawl4ai

Crawl4AI is a powerful and free web crawling service that extracts valuable data from websites and provides LLM-friendly output formats. It supports crawling multiple URLs simultaneously, replaces media tags with ALT, and is completely free to use and open-source. Users can integrate Crawl4AI into Python projects as a library or run it as a standalone local server. The tool allows users to crawl and extract data from specified URLs using different providers and models, with options to include raw HTML content, force fresh crawls, and extract meaningful text blocks. Configuration settings can be adjusted in the `crawler/config.py` file to customize providers, API keys, chunk processing, and word thresholds. Contributions to Crawl4AI are welcome from the open-source community to enhance its value for AI enthusiasts and developers.

github

: 37.5k

CrewAI-Studio

CrewAI Studio is an application with a user-friendly interface for interacting with CrewAI, offering support for multiple platforms and various backend providers. It allows users to run crews in the background, export single-page apps, and use custom tools for APIs and file writing. The roadmap includes features like better import/export, human input, chat functionality, automatic crew creation, and multiuser environment support.

github

: 682

WebMasterLog

WebMasterLog is a comprehensive repository showcasing various web development projects built with front-end and back-end technologies. It highlights interactive user interfaces, dynamic web applications, and a spectrum of web development solutions. The repository encourages contributions in areas such as adding new projects, improving existing projects, updating documentation, fixing bugs, implementing responsive design, enhancing code readability, and optimizing project functionalities. Contributors are guided to follow specific guidelines for project submissions, including directory naming conventions, README file inclusion, project screenshots, and commit practices. Pull requests are reviewed based on criteria such as proper PR template completion, originality of work, code comments for clarity, and sharing screenshots for frontend updates. The repository also participates in various open-source programs like JWOC, GSSoC, Hacktoberfest, KWOC, 24 Pull Requests, IWOC, SWOC, and DWOC, welcoming valuable contributors.

github

: 111

nyxtext

Nyxtext is a text editor built using Python, featuring Custom Tkinter with the Catppuccin color scheme and glassmorphic design. It follows a modular approach with each element organized into separate files for clarity and maintainability. NyxText is not just a text editor but also an AI-powered desktop application for creatives, developers, and students.

github

: 64

payload-ai

The Payload AI Plugin is an advanced extension that integrates modern AI capabilities into your Payload CMS, streamlining content creation and management. It offers features like text generation, voice and image generation, field-level prompt customization, prompt editor, document analyzer, fact checking, automated content workflows, internationalization support, editor AI suggestions, and AI chat support. Users can personalize and configure the plugin by setting environment variables. The plugin is actively developed and tested with Payload version v3.2.1, with regular updates expected.

github

: 163

Learn_Prompting

Learn Prompting is a platform offering free resources, courses, and webinars to master prompt engineering and generative AI. It provides a Prompt Engineering Guide, courses on Generative AI, workshops, and the HackAPrompt competition. The platform also offers AI Red Teaming and AI Safety courses, research reports on prompting techniques, and welcomes contributions in various forms such as content suggestions, translations, artwork, and typo fixes. Users can locally develop the website using Visual Studio Code, Git, and Node.js, and run it in development mode to preview changes.

github

: 4.3k

For similar tasks

metavoice-src

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities: * Emotional speech rhythm and tone in English. * Zero-shot cloning for American & British voices, with 30s reference audio. * Support for (cross-lingual) voice cloning with finetuning. * We have had success with as little as 1 minute training data for Indian speakers. * Synthesis of arbitrary length text

github

: 3.1k

wunjo.wladradchenko.ru

Wunjo AI is a comprehensive tool that empowers users to explore the realm of speech synthesis, deepfake animations, video-to-video transformations, and more. Its user-friendly interface and privacy-first approach make it accessible to both beginners and professionals alike. With Wunjo AI, you can effortlessly convert text into human-like speech, clone voices from audio files, create multi-dialogues with distinct voice profiles, and perform real-time speech recognition. Additionally, you can animate faces using just one photo combined with audio, swap faces in videos, GIFs, and photos, and even remove unwanted objects or enhance the quality of your deepfakes using the AI Retouch Tool. Wunjo AI is an all-in-one solution for your voice and visual AI needs, offering endless possibilities for creativity and expression.

github

: 820

Pandrator

Pandrator is a GUI tool for generating audiobooks and dubbing using voice cloning and AI. It transforms text, PDF, EPUB, and SRT files into spoken audio in multiple languages. It leverages XTTS, Silero, and VoiceCraft models for text-to-speech conversion and voice cloning, with additional features like LLM-based text preprocessing and NISQA for audio quality evaluation. The tool aims to be user-friendly with a one-click installer and a graphical interface.

github

: 429

ruoyi-ai

ruoyi-ai is a platform built on top of ruoyi-plus to implement AI chat and drawing functionalities on the backend. The project is completely open source and free. The backend management interface uses elementUI, while the server side is built using Java 17 and SpringBoot 3.X. It supports various AI models such as ChatGPT4, Dall-E-3, ChatGPT-4-All, voice cloning based on GPT-SoVITS, GPTS, and MidJourney. Additionally, it supports WeChat mini programs, personal QR code real-time payments, monitoring and AI auto-reply in live streaming rooms like Douyu and Bilibili, and personal WeChat integration with ChatGPT. The platform also includes features like private knowledge base management and provides various demo interfaces for different platforms such as mobile, web, and PC.

github

: 2.1k

viitor-voice

ViiTor-Voice is an LLM based TTS Engine that offers a lightweight design with 0.5B parameters for efficient deployment on various platforms. It provides real-time streaming output with low latency experience, a rich voice library with over 300 voice options, flexible speech rate adjustment, and zero-shot voice cloning capabilities. The tool supports both Chinese and English languages and is suitable for applications requiring quick response and natural speech fluency.

github

: 60

ebook2audiobook

github

: 9.2k

HeyGem.ai

Heygem is an open-source, affordable alternative to Heygen, offering a fully offline video synthesis tool for Windows systems. It enables precise appearance and voice cloning, allowing users to digitalize their image and drive virtual avatars through text and voice for video production. With core features like efficient video synthesis and multi-language support, Heygem ensures a user-friendly experience with fully offline operation and support for multiple models. The tool leverages advanced AI algorithms for voice cloning, automatic speech recognition, and computer vision technology to enhance the virtual avatar's performance and synchronization.

github

: 2.5k

MeloTTS

MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai. It supports various languages including English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. The Chinese speaker also supports mixed Chinese and English. The library is fast enough for CPU real-time inference and offers features like using without installation, local installation, and training on custom datasets. The Python API and model cards are available in the repository and on HuggingFace. The community can join the Discord channel for discussions and collaboration opportunities. Contributions are welcome, and the library is under the MIT License. MeloTTS is based on TTS, VITS, VITS2, and Bert-VITS2.

github

: 4.2k

For similar jobs

ebook2audiobook

github

: 9.2k

Pandrator

github

: 429

audiobook-creator

Audiobook Creator is an open-source tool that converts books in various text formats into fully voiced audiobooks with intelligent character voice attribution. It utilizes NLP, LLMs, and TTS technologies to provide an engaging audiobook experience. The project includes components for text cleaning and formatting, character identification, and audiobook generation. Key features include a Gradio UI app, M4B audiobook creation, multi-format support, Docker compatibility, customizable narration, progress tracking, and open-source licensing.

github

: 211

wenxin-starter

WenXin-Starter is a spring-boot-starter for Baidu's "Wenxin Qianfan WENXINWORKSHOP" large model, which can help you quickly access Baidu's AI capabilities. It fully integrates the official API documentation of Wenxin Qianfan. Supports text-to-image generation, built-in dialogue memory, and supports streaming return of dialogue. Supports QPS control of a single model and supports queuing mechanism. Plugins will be added soon.

github

: 207

WebAI-to-API

This project implements a web API that offers a unified interface to Google Gemini and Claude 3. It provides a self-hosted, lightweight, and scalable solution for accessing these AI models through a streaming API. The API supports both Claude and Gemini models, allowing users to interact with them in real-time. The project includes a user-friendly web UI for configuration and documentation, making it easy to get started and explore the capabilities of the API.

github

: 304

openai-chat-api-workflow

**OpenAI Chat API Workflow for Alfred** An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT-3.5/GPT-4 🤖💬 It also allows image generation 🖼️, image understanding 👀, speech-to-text conversion 🎤, and text-to-speech synthesis 🔈 **Features:** * Execute all features using Alfred UI, selected text, or a dedicated web UI * Web UI is constructed by the workflow and runs locally on your Mac 💻 * API call is made directly between the workflow and OpenAI, ensuring your chat messages are not shared online with anyone other than OpenAI 🔒 * OpenAI does not use the data from the API Platform for training 🚫 * Export chat data to a simple JSON format external file 📄 * Continue the chat by importing the exported data later 🔄

github

: 308

BlossomLM

BlossomLM is a series of open-source conversational large language models. This project aims to provide a high-quality general-purpose SFT dataset in both Chinese and English, making fine-tuning accessible while also providing pre-trained model weights. **Hint**: BlossomLM is a personal non-commercial project.

github

: 55

Chinese-LLaMA-Alpaca

This project open sources the **Chinese LLaMA model and the Alpaca large model fine-tuned with instructions**, to further promote the open research of large models in the Chinese NLP community. These models **extend the Chinese vocabulary based on the original LLaMA** and use Chinese data for secondary pre-training, further enhancing the basic Chinese semantic understanding ability. At the same time, the Chinese Alpaca model further uses Chinese instruction data for fine-tuning, significantly improving the model's understanding and execution of instructions.

github

: 17.2k

ebook2audiobook

README:

📚 ebook2audiobook

GUI Interface

README.md

Table of Contents

Features

Huggingface space demo

Free Google Colab

Supported Languages

Hardware Requirements

Installation Instructions

Launching Gradio Web Interface

Basic Usage

Example of Custom Model Zip Upload

For Detailed Guide with list of all Parameters to use

Using Docker

Running the Docker Container

Building the Docker Container

Docker container file locations

Docker headless guide

To get the help command for the other parameters this program has you can run this

Docker Compose

Steps to Run

Docker GUI Interface

Renting a GPU

You can duplicate the hugginface space and rent a gpu for around $0.40 an hour

Or you can try using the google colab for free!

Common Docker Issues

Fine Tuned TTS models

Fine Tuned TTS Collection

Demos

Supported eBook Formats

Output

Common Issues:

What I need help with! 🙌

Full list of things can be found here

Special Thanks

Join Our Server!

For Tasks:

For Jobs:

Alternative AI tools for ebook2audiobook

Similar Open Source Tools

ebook2audiobook

AiR

swift-chat

rkllama

probe

farfalle

cog

llm-x

Visionatrix

Zero

crawl4ai

CrewAI-Studio

WebMasterLog

nyxtext

payload-ai

Learn_Prompting

For similar tasks

metavoice-src

wunjo.wladradchenko.ru

Pandrator

ruoyi-ai

viitor-voice

ebook2audiobook

HeyGem.ai

MeloTTS

For similar jobs

ebook2audiobook

Pandrator

audiobook-creator

wenxin-starter

WebAI-to-API

openai-chat-api-workflow

BlossomLM

Chinese-LLaMA-Alpaca