
transcribe-anything
Input a local file or url and this service will transcribe it using Whisper AI. Completely private and Free 🤯🤯🤯
Stars: 621

Transcribe-anything is a front-end app that utilizes Whisper AI for transcription tasks. It offers an easy installation process via pip and supports GPU acceleration for faster processing. The tool can transcribe local files or URLs from platforms like YouTube into subtitle files and raw text. It is known for its state-of-the-art translation service, ensuring privacy by keeping data local. Notably, it can generate a 'speaker.json' file when using the 'insane' backend, allowing speaker-assigned text de-chunkification. The tool also provides options for language translation and embedding subtitles into videos.
README:
Over 300+ ⭐'s because this app just works! This whisper front-end app is the only one to generate a `speaker.json` file, which partitions the conversation by who is doing the speaking.
Easiest whisper implementation to install and use. Just install with `pip install transcribe-anything`. GPU acceleration is automatic, using the blazingly fast insanely-fast-whisper as the backend for `--device insane`. This is the only tool that optionally produces a `speaker.json` file, representing speaker-assigned text that has been de-chunkified.
Hardware acceleration on Windows/Linux/MacOS Arm (M1, M2, +) via `--device insane`.
Input a local file or youtube/rumble url and this tool will transcribe it using Whisper AI into subtitle files and raw text.
Uses Whisper AI, so this is a state-of-the-art translation service - completely free. 🤯🤯🤯
Your data stays private and is not uploaded to any service.
The new version now has state-of-the-art transcription speed, thanks to the new `--device insane` backend, and also produces a `speaker.json` file.
pip install transcribe-anything
# slow cpu mode, works everywhere
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ
# insanely fast using the insanely-fast-whisper backend.
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane
# translate from any language to english
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane --task translate
If you pass in `--device insane` on a cuda platform, then this tool will use this state-of-the-art version of whisper: https://github.com/Vaibhavs10/insanely-fast-whisper, which is MUCH faster and has a pipeline for speaker identification (diarization) using the `--hf_token` option.
Also note, the `insanely-fast-whisper` (`--device insane`) included in this project has been fixed to work with python 3.11. The upstream version is still broken on python 3.11 as of 1/22/2024.
When diarization is enabled via `--hf_token` (hugging face token), the output json will contain speaker info labeled as `SPEAKER_00`, `SPEAKER_01`, etc. For licensing agreement reasons, you must get your own hugging face token if you want to enable this feature. Also, there is an additional step to agree to the user policies for `pyannote.audio`, located here: https://huggingface.co/pyannote/segmentation-3.0. If you don't do this then you'll see runtime exceptions from `pyannote` when `--hf_token` is used.
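As a hedged example of enabling diarization (the token value below is a placeholder; substitute your own Hugging Face token obtained as described above):

# transcribe with speaker diarization enabled
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane --hf_token <YOUR_HF_TOKEN>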
What's special about this app is that we also generate a `speaker.json`, which is a de-chunkified version of the output json's speaker section.
[
  {
    "speaker": "SPEAKER_00",
    "timestamp": [0.0, 7.44],
    "text": "for that. But welcome, Zach Vorhees. Great to have you back on. Thank you, Matt. Craving me back onto your show. Man, we got a lot to talk about.",
    "reason": "beginning"
  },
  {
    "speaker": "SPEAKER_01",
    "timestamp": [7.44, 33.52],
    "text": "Oh, we do. 2023 was the year that OpenAI released, you know, chat GPT-4, which I think most people would say has surpassed average human intelligence, at least in test taking, perhaps not in, you know, reasoning and things like that. But it was a major year for AI. I think that most people are behind the curve on this. What's your take of what just happened in the last 12 months and what it means for the future of human cognition versus machine cognition?",
    "reason": "speaker-switch"
  },
  {
    "speaker": "SPEAKER_00",
    "timestamp": [33.52, 44.08],
    "text": "Yeah. Well, you know, at the beginning of 2023, we had a pretty weak AI system, which was a chat GPT 3.5 turbo was the best that we had. And then between the beginning of last",
    "reason": "speaker-switch"
  }
]
Note that `speaker.json` is only generated when using `--device insane`, and not for `--device cuda` nor `--device cpu`.
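As a hedged sketch of how you might consume this file (this assumes `jq` is installed and that `speaker.json` lands in your chosen output directory; neither is guaranteed by this project):

# print one "SPEAKER [start-end]: text" line per de-chunkified entry
jq -r '.[] | "\(.speaker) [\(.timestamp[0])-\(.timestamp[1])]: \(.text)"' output_dir/speaker.json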
Insane mode eats up a lot of memory, and it's common to get out-of-memory errors while transcribing. For example, out-of-memory errors are common on a 12GB Nvidia 3060 card with big content. If you experience this, pass in `--batch-size 8` or smaller. Note that any arguments not recognized by `transcribe-anything` are passed on to the backend transcriber.
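For example, using the same demo video as above:

# lower the batch size to reduce GPU memory usage in insane mode
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane --batch-size 8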
Also, please don't use `distil-whisper/distil-large-v2`; it produces extremely bad stuttering, and it's not entirely clear why. I've had to switch it out of production environments because it's so bad. It's also non-deterministic, so I think that somehow a fallback non-zero temperature is being used, which produces these stutterings.
`cuda` is the original AI model supplied by openai. It's more stable but MUCH slower. It also won't produce a `speaker.json` file like the one shown above.
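Selecting that backend looks like this (same demo URL as the earlier examples):

# use the original OpenAI whisper backend instead of insane mode
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device cuda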
`--embed`: this app will optionally embed ("burn") subtitles directly into an output video.
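A minimal sketch (the filename here is a placeholder; per the 2.3.5 changelog entry, `--embed` only works on local mp4 files at the moment):

# burn the generated subtitles directly into the output video
transcribe-anything my_video.mp4 --embed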
This front-end app for whisper boasts the easiest install in the whisper ecosystem, thanks to isolated-environment. You can simply install it with pip, like this:
pip install transcribe-anything
GPU acceleration will be automatically enabled for windows and linux. Mac users are stuck with `--device cpu` mode. But it's possible that `--device insane` and `--model mps` on Mac M1+ will work; this has been completely untested.
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ
Will output:
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:27.000] We're no strangers to love, you know the rules, and so do I
[00:27.000 --> 00:31.000] I've built commitments while I'm thinking of
[00:31.000 --> 00:35.000] You wouldn't get this from any other guy
[00:35.000 --> 00:40.000] I just wanna tell you how I'm feeling
[00:40.000 --> 00:43.000] Gotta make you understand
[00:43.000 --> 00:45.000] Never gonna give you up
[00:45.000 --> 00:47.000] Never gonna let you down
[00:47.000 --> 00:51.000] Never gonna run around and desert you
[00:51.000 --> 00:53.000] Never gonna make you cry
[00:53.000 --> 00:55.000] Never gonna say goodbye
[00:55.000 --> 00:58.000] Never gonna tell a lie
[00:58.000 --> 01:00.000] And hurt you
[01:00.000 --> 01:04.000] We've known each other for so long
[01:04.000 --> 01:09.000] Your heart's been aching but you're too shy to say it
[01:09.000 --> 01:13.000] Inside we both know what's been going on
[01:13.000 --> 01:17.000] We know the game and we're gonna play it
[01:17.000 --> 01:22.000] And if you ask me how I'm feeling
[01:22.000 --> 01:25.000] Don't tell me you're too much to see
[01:25.000 --> 01:27.000] Never gonna give you up
[01:27.000 --> 01:29.000] Never gonna let you down
[01:29.000 --> 01:33.000] Never gonna run around and desert you
[01:33.000 --> 01:35.000] Never gonna make you cry
[01:35.000 --> 01:38.000] Never gonna say goodbye
[01:38.000 --> 01:40.000] Never gonna tell a lie
[01:40.000 --> 01:42.000] And hurt you
[01:42.000 --> 01:44.000] Never gonna give you up
[01:44.000 --> 01:46.000] Never gonna let you down
[01:46.000 --> 01:50.000] Never gonna run around and desert you
[01:50.000 --> 01:52.000] Never gonna make you cry
[01:52.000 --> 01:54.000] Never gonna say goodbye
[01:54.000 --> 01:57.000] Never gonna tell a lie
[01:57.000 --> 01:59.000] And hurt you
[02:08.000 --> 02:10.000] Never gonna give
[02:12.000 --> 02:14.000] Never gonna give
[02:16.000 --> 02:19.000] We've known each other for so long
[02:19.000 --> 02:24.000] Your heart's been aching but you're too shy to say it
[02:24.000 --> 02:28.000] Inside we both know what's been going on
[02:28.000 --> 02:32.000] We know the game and we're gonna play it
[02:32.000 --> 02:37.000] I just wanna tell you how I'm feeling
[02:37.000 --> 02:40.000] Gotta make you understand
[02:40.000 --> 02:42.000] Never gonna give you up
[02:42.000 --> 02:44.000] Never gonna let you down
[02:44.000 --> 02:48.000] Never gonna run around and desert you
[02:48.000 --> 02:50.000] Never gonna make you cry
[02:50.000 --> 02:53.000] Never gonna say goodbye
[02:53.000 --> 02:55.000] Never gonna tell a lie
[02:55.000 --> 02:57.000] And hurt you
[02:57.000 --> 02:59.000] Never gonna give you up
[02:59.000 --> 03:01.000] Never gonna let you down
[03:01.000 --> 03:05.000] Never gonna run around and desert you
[03:05.000 --> 03:08.000] Never gonna make you cry
[03:08.000 --> 03:10.000] Never gonna say goodbye
[03:10.000 --> 03:12.000] Never gonna tell a lie
[03:12.000 --> 03:14.000] And hurt you
[03:14.000 --> 03:16.000] Never gonna give you up
[03:16.000 --> 03:23.000] If you want, never gonna let you down Never gonna run around and desert you
[03:23.000 --> 03:28.000] Never gonna make you hide Never gonna say goodbye
[03:28.000 --> 03:42.000] Never gonna tell you I ain't ready
from transcribe_anything.api import transcribe

transcribe(
    url_or_file="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    output_dir="output_dir",
)
Works for Ubuntu/MacOS/Win32 (in git-bash). This will create a virtual environment:
> cd transcribe_anything
> ./install.sh
# Enter the environment:
> source activate.sh
The environment is now active and the next step will only install to the local python. If the terminal is closed, then to get back into the environment, `cd transcribe_anything` and execute `source activate.sh`.
- `pip install transcribe-anything`
- The command `transcribe_anything` will magically become available.
- Then run `transcribe_anything <YOUTUBE_URL>`.
- OpenAI whisper
- insanely-fast-whisper
- yt-dlp: https://github.com/yt-dlp/yt-dlp
- static-ffmpeg
- Every commit is tested for standard linters and a batch of unit tests.
- 2.7.39: Fix `--hf-token` usage for insanely-fast-whisper backend.
- 2.7.37: Fixed breakage due to numpy 2.0 being released.
- 2.7.36: Fixed some ffmpeg dependencies.
- 2.7.35: All `ffmpeg` commands are now `static_ffmpeg` commands. Fixes issue.
- 2.7.34: Various fixes.
- 2.7.33: Fixes linux.
- 2.7.32: Fixes mac m1 and m2.
- 2.7.31: Adds a warning if using python 3.12, which isn't supported yet in the backend.
- 2.7.30: Adds `--query-gpu-json-path`.
- 2.7.29: Made the json -> srt conversion more robust for `--device insane`; bad entries will be skipped with a warning.
- 2.7.28: Fixes bad title fetching with weird characters.
- 2.7.27: `pytorch-audio` upgrades broke this package. Upgrade to the latest version to resolve.
- 2.7.26: Add model option `distil-whisper/distil-large-v2`.
- 2.7.25: Windows (Linux/MacOS) bug with `--device insane` and python 3.11 installing the wrong `insanely-fast-whisper` version.
- 2.7.22: Fixes `transcribe-anything` on Linux.
- 2.7.21: Tested that Mac Arm can run `--device insane`. Added tests to ensure this.
- 2.7.20: Fixes wrong type being returned when speaker.json happens to be empty.
- 2.7.19: speaker.json is now in plain json format instead of json5 format.
- 2.7.18: Fixes tests.
- 2.7.17: Fixes speaker.json nesting.
- 2.7.16: Adds `--save_hf_token`.
- 2.7.15: Fixes 2.7.14 breakage.
- 2.7.14: (Broken) Now generates `speaker.json` when diarization is enabled.
- 2.7.13: Default diarization model is now pyannote/speaker-diarization-3.1.
- 2.7.12: Adds srt_swap for line breaks and improved isolated_environment usage.
- 2.7.11: `--device insane` now generates a *.vtt translation file.
- 2.7.10: Better support for namespaced models. Trims text output in output json. Output json is now formatted with indents. SRT file is now printed out for `--device insane`.
- 2.7.9: All SRT translation errors fixed for `--device insane`. All tests pass.
- 2.7.8: During an error of `--device insane`, write out the error.json file into the destination.
- 2.7.7: Better error messages during failure.
- 2.7.6: Improved generation of out.txt, removes linebreaks.
- 2.7.5: `--device insane` now generates better-conforming srt files.
- 2.7.3: Various fixes for the `insane` mode backend.
- 2.7.0: Introduces `insanely-fast-whisper`, enabled by using `--device insane`.
- 2.6.0: GPU acceleration now happens automatically on Windows thanks to `isolated-environment`. This will also prevent interference with different versions of torch for other AI tools.
- 2.5.0: `--model large` now aliases to `--model large-v3`. Use `--model large-legacy` to use the original large model.
- 2.4.0: pytorch updated to 2.1.2, gpu install script updated to same + cuda version is now 121.
- 2.3.9: Fallback to `cpu` device if `gpu` device is not compatible.
- 2.3.8: Fix `--models` arg.
- 2.3.7: Critical fix: fixes dependency breakage with open-ai. Fixes windows use of embedded tool.
- 2.3.6: Fixes typo in readme for installation instructions.
- 2.3.5: Now has `--embed` to burn the subtitles into the video itself. Only works on local mp4 files at the moment.
- 2.3.4: Removed `out.mp3` and instead use a temporary wav file, as that is faster to process. --no-keep-audio has now been removed.
- 2.3.3: Fix case where there are spaces in the name (happens on windows).
- 2.3.2: Fix windows transcoding error.
- 2.3.1: static-ffmpeg >= 2.5 now specified.
- 2.3.0: Now uses the official version of whisper ai.
- 2.2.1: "test_" is now prepended to all the different output folder names.
- 2.2.0: Now explicitly setting a language will put the file in a folder with that language name, allowing multi-language passes without overwriting.
- 2.1.2: yt-dlp pinned to new minimum version. Fixes downloading issues from old lib. Adds audio normalization by default.
- 2.1.1: Updates keywords for easier pypi finding.
- 2.1.0: Unknown args are now assumed to be for whisper and passed to it as-is. Fixes https://github.com/zackees/transcribe-anything/issues/3
- 2.0.13: Now works with python 3.9.
- 2.0.12: Adds --device to argument parameters. This will default to CUDA if available, else CPU.
- 2.0.11: Automatically deletes files in the out directory if they already exist.
- 2.0.10: Fixes local file issue https://github.com/zackees/transcribe-anything/issues/2
- 2.0.9: Fixes sanitization of path names for some youtube videos.
- 2.0.8: Fix `--output_dir` not being respected.
- 2.0.7: `install_cuda.sh` -> `install_cuda.py`.
- 2.0.6: Fixes twitter video fetching. --keep-audio -> --no-keep-audio.
- 2.0.5: Fix bad filename on trailing urls ending with /, adds --keep-audio.
- 2.0.3: GPU support is now added. Run the `install_cuda.sh` script to enable.
- 2.0.2: Minor cleanup of file names (no more out.mp3.txt, it's now out.txt).
- 2.0.1: Fixes missing dependencies and adds whisper option.
- 2.0.0: New! Now a front end for Whisper ai!
- Insanely Fast whisper for GPU
- Fast Whisper for CPU
- A better whisper CLI that supports more options but has a manual install.
- Subtitles translator:
- Forum post on how to avoid stuttering
- More stable transcriptions:
Alternative AI tools for transcribe-anything
Similar Open Source Tools


fish-ai
fish-ai is a tool that adds AI functionality to Fish shell. It can be integrated with various AI providers like OpenAI, Azure OpenAI, Google, Hugging Face, Mistral, or a self-hosted LLM. Users can transform comments into commands, autocomplete commands, and suggest fixes. The tool allows customization through configuration files and supports switching between contexts. Data privacy is maintained by redacting sensitive information before submission to the AI models. Development features include debug logging, testing, and creating releases.

agenticSeek
AgenticSeek is a voice-enabled AI assistant powered by DeepSeek R1 agents, offering a fully local alternative to cloud-based AI services. It allows users to interact with their filesystem, code in multiple languages, and perform various tasks autonomously. The tool is equipped with memory to remember user preferences and past conversations, and it can divide tasks among multiple agents for efficient execution. AgenticSeek prioritizes privacy by running entirely on the user's hardware without sending data to the cloud.

abliteration
Abliteration is a tool that allows users to create abliterated models using transformers quickly and easily. It is not a tool for uncensorship, but rather for making models that will not explicitly refuse users. Users can clone the repository, install dependencies, and make abliterations using the provided commands. The tool supports adjusting parameters for stubborn models and offers various options for customization. Abliteration can be used for creating modified models for specific tasks or topics.

OpenAI-sublime-text
The OpenAI Completion plugin for Sublime Text provides first-class code assistant support within the editor. It utilizes LLM models to manipulate code, engage in chat mode, and perform various tasks. The plugin supports OpenAI, llama.cpp, and ollama models, allowing users to customize their AI assistant experience. It offers separated chat histories and assistant settings for different projects, enabling context-specific interactions. Additionally, the plugin supports Markdown syntax with code language syntax highlighting, server-side streaming for faster response times, and proxy support for secure connections. Users can configure the plugin's settings to set their OpenAI API key, adjust assistant modes, and manage chat history. Overall, the OpenAI Completion plugin enhances the Sublime Text editor with powerful AI capabilities, streamlining coding workflows and fostering collaboration with AI assistants.

gpt-cli
gpt-cli is a command-line interface tool for interacting with various chat language models like ChatGPT, Claude, and others. It supports model customization, usage tracking, keyboard shortcuts, multi-line input, markdown support, predefined messages, and multiple assistants. Users can easily switch between different assistants, define custom assistants, and configure model parameters and API keys in a YAML file for easy customization and management.

mandark
Mandark is a lightweight AI tool that can perform various tasks, such as answering questions about codebases, editing files, verifying diffs, estimating token and cost before execution, and working with any codebase. It supports multiple AI models like Claude-3.5 Sonnet, Haiku, GPT-4o-mini, and GPT-4-turbo. Users can run Mandark without installation and easily interact with it through command line options. It offers flexibility in processing individual files or folders and allows for customization with optional AI model selection and output preferences.

reader
Reader is a tool that converts any URL to an LLM-friendly input with a simple prefix `https://r.jina.ai/`. It improves the output for your agent and RAG systems at no cost. Reader supports image reading, captioning all images at the specified URL and adding `Image [idx]: [caption]` as an alt tag. This enables downstream LLMs to interact with the images in reasoning, summarizing, etc. Reader offers a streaming mode, useful when the standard mode provides an incomplete result. In streaming mode, Reader waits a bit longer until the page is fully rendered, providing more complete information. Reader also supports a JSON mode, which contains three fields: `url`, `title`, and `content`. Reader is backed by Jina AI and licensed under Apache-2.0.

basehub
JavaScript / TypeScript SDK for BaseHub, the first AI-native content hub. **Features:** * ✨ Infers types from your BaseHub repository... _meaning IDE autocompletion works great._ * 🏎️ No dependency on graphql... _meaning your bundle is more lightweight._ * 🌐 Works everywhere `fetch` is supported... _meaning you can use it anywhere._

laragenie
Laragenie is an AI chatbot designed to understand and assist developers with their codebases. It runs on the command line from a Laravel app, helping developers onboard to new projects, understand codebases, and provide daily support. Laragenie accelerates workflow and collaboration by indexing files and directories, allowing users to ask questions and receive AI-generated responses. It supports OpenAI and Pinecone for processing and indexing data, making it a versatile tool for any repo in any language.

aiohttp-devtools
aiohttp-devtools provides dev tools for developing applications with aiohttp and associated libraries. It includes CLI commands for running a local server with live reloading and serving static files. The tools aim to simplify the development process by automating tasks such as setting up a new application and managing dependencies. Developers can easily create and run aiohttp applications, manage static files, and utilize live reloading for efficient development.

ComfyUI-mnemic-nodes
ComfyUI-mnemic-nodes is a repository hosting a collection of nodes developed for ComfyUI, providing useful components to enhance project functionality. The nodes include features like returning file paths, saving text files, downloading images from URLs, tokenizing text, cleaning strings, querying Groq language models, generating negative prompts, and more. Some nodes are experimental and marked with a 'Caution' label. Installation instructions and setup details are provided for each node, along with examples and presets for different tasks.

wcgw
wcgw is a shell and coding agent designed for Claude and Chatgpt. It provides full shell access with no restrictions, desktop control on Claude for screen capture and control, interactive command handling, large file editing, and REPL support. Users can use wcgw to create, execute, and iterate on tasks, such as solving problems with Python, finding code instances, setting up projects, creating web apps, editing large files, and running server commands. Additionally, wcgw supports computer use on Docker containers for desktop control. The tool can be extended with a VS Code extension for pasting context on Claude app and integrates with Chatgpt for custom GPT interactions.

dir-assistant
Dir-assistant is a tool that allows users to interact with their current directory's files using local or API Language Models (LLMs). It supports various platforms and provides API support for major LLM APIs. Users can configure and customize their local LLMs and API LLMs using the tool. Dir-assistant also supports model downloads and configurations for efficient usage. It is designed to enhance file interaction and retrieval using advanced language models.

fabrice-ai
A lightweight, functional, and composable framework for building AI agents that work together to solve complex tasks. Built with TypeScript and designed to be serverless-ready. Fabrice embraces functional programming principles, remains stateless, and stays focused on composability. It provides core concepts like easy teamwork creation, infrastructure-agnosticism, statelessness, and includes all tools and features needed to build AI teams. Agents are specialized workers with specific roles and capabilities, able to call tools and complete tasks. Workflows define how agents collaborate to achieve a goal, with workflow states representing the current state of the workflow. Providers handle requests to the LLM and responses. Tools extend agent capabilities by providing concrete actions they can perform. Execution involves running the workflow to completion, with options for custom execution and BDD testing.

unsight.dev
unsight.dev is a tool built on Nuxt that helps detect duplicate GitHub issues and areas of concern across related repositories. It utilizes Nitro server API routes, GitHub API, and a GitHub App, along with UnoCSS. The tool is deployed on Cloudflare with NuxtHub, using Workers AI, Workers KV, and Vectorize. It also offers a browser extension soon to be released. Users can try the app locally for tweaking the UI and setting up a full development environment as a GitHub App.
For similar tasks


AudioNotes
AudioNotes is a system built on FunASR and Qwen2 that can quickly extract content from audio and video, and organize it using large models into structured markdown notes for easy reading. Users can interact with the audio and video content, install Ollama, pull models, and deploy services using Docker or locally with a PostgreSQL database. The system provides a seamless way to convert audio and video into structured notes for efficient consumption.

StoryToolkitAI
StoryToolkitAI is a film editing tool that utilizes AI to transcribe, index scenes, search through footage, and create stories. It offers features like full video indexing, automatic transcriptions and translations, compatibility with OpenAI GPT and ollama, story editor for screenplay writing, speaker detection, project file management, and more. It integrates with DaVinci Resolve Studio 18 and offers planned features like automatic topic classification and integration with other AI tools. The tool is developed by Octavian Mot and is actively being updated with new features based on user needs and feedback.

decipher
Decipher is a tool that utilizes AI-generated transcription subtitles to automatically add subtitles to videos. It eliminates the need for manual transcription, making videos more accessible. The tool uses OpenAI's Whisper, a State-of-the-Art speech recognition system trained on a large dataset for improved robustness to accents, background noise, and technical language.

you2txt
You2Txt is a tool developed for the Vercel + Nvidia 2-hour hackathon that converts any YouTube video into a transcribed .txt file. The project won first place in the hackathon and is hosted at you2txt.com. Due to rate limiting issues with YouTube requests, it is recommended to run the tool locally. The project was created using Next.js, Tailwind, v0, and Claude, and can be built and accessed locally for development purposes.

MemoAI
MemoAI is an AI-powered tool that provides podcast, video-to-text, and subtitling capabilities for immediate use. It supports audio and video transcription, model selection for paragraph effects, local subtitles translation, text translation using Google, Microsoft, Volcano Translation, DeepL, and AI Translation, speech synthesis in multiple languages, and exporting text and subtitles in common formats. MemoAI is designed to simplify the process of transcribing, translating, and creating subtitles for various media content.

glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.

onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.