StoryToolkitAI
An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models
Stars: 777
StoryToolkitAI is a film editing tool that utilizes AI to transcribe, index scenes, search through footage, and create stories. It offers features like full video indexing, automatic transcriptions and translations, compatibility with OpenAI GPT and ollama, story editor for screenplay writing, speaker detection, project file management, and more. It integrates with DaVinci Resolve Studio 18 and offers planned features like automatic topic classification and integration with other AI tools. The tool is developed by Octavian Mot and is actively being updated with new features based on user needs and feedback.
README:
StoryToolkitAI is a film editing tool that tries to understand your footage and helps you edit more efficiently with the assistance of AI.
It transcribes, indexes scenes, helps you search through your footage, automatically selects and creates stories using large language models (OpenAI GPT-4, llama, DeepSeek etc.), which you can then import into your editing software via EDL or XML.
The tool works locally on your machine, independent of any other editing software, but it also integrates with DaVinci Resolve Studio 18+.
- [x] Full video indexing and search (How-To)
- [x] Free Automatic Transcriptions on your local machine
- [x] Free Automatic Translation to English on your local machine
- [x] Compatible with OpenAI, ollama, vLLM, LM Studio etc. - chat to AI about your content, or generate new ideas
- [x] Search Content intuitively without having to type in exact words
- [X] Story Editor - write screenplays containing your transcripts and export them for editing (EDL/XML/Fountain) (v. 0.20.1+)
- [X] Translate transcripts to other languages using OpenAI GPT (v. 0.22.0+)
- [X] Ask AI to create Stories and Selections based on your footage using OpenAI GPT (v. 0.22.0+)
- [X] Automatic Speaker Detection in transcripts (v. 0.23.0+)
- [X] Project File Management for more intuitive workflows and easier search
- [X] Automatic Question detection in transcripts
- [X] Transcript Groups - group transcript lines into whatever you need to find them easier
- [x] Multi-format export of transcripts, including SRT, TXT, AVID DS and as Fusion Text node
- [X] Import of existing SRT files
- [X] Easy copy of timecoded transcript text to clipboard etc.
- [x] Mark and Navigate Resolve Timelines via Transcript, plus other handy Resolve-only features
- [x] Advanced Search of Resolve timeline markers using AI
- [x] Copy Resolve timeline markers to transcript and vice-versa for advanced search
- [x] Direct import of subtitles into Resolve bin
- [ ] Automatic Topic Classification to help you discover ideas in your transcripts
- [ ] Integration with other AI tools
- [ ] Integration with other software / standalone players
- [X] Plus more flashy features as clickbait to unrealistically raise expectations and destroy competition
Some of the above features are only available in the non-standalone version of the tool, but they will be available in the standalone version in the next release.
For detailed features info, go here.
To download the latest standalone release, see the releases page.
However, the standalone releases will most likely always be behind the git version, so, if you're comfortable with the terminal / command line and want to always have access to the newest features, we recommend that you try to install the tool from source.
For detailed installation instructions go here.
Yes, the tool runs locally and there's no need for any additional account to transcribe, index video or search. These features will always be free as long as your machine supports them without external services.
The only feature that now requires external services is the Assistant when you want to use on external LLM providers (OpenAI etc.). However, you can run local LLMs too!
We rely on the support of our Patreon members! If you want to support development and get access to new features earlier, check out our Patreon page.
By the way, if you feel that your content is sensitive or subject to privacy laws, no worries: the tool does not send anything that you don't want to the Internet, it only uses your local machine to transcribe and translate your audio.
Currently, the only features that send data from your machine to the Internet are:
- The StoryToolkitAI API Key check to storytoolkit.ai (only when entered in the Settings Window)
- The Assistant, to OpenAI, storytoolkit.ai or other external providers (only contexts and messages that you select and send).
The tool also checks for updates on every start.
This tool is coded by Octavian Mot, your unfriendly filmmaker who hates to code and tries to keep it together as half of mots. Our team uses it daily in our editing room which allows us to update it with features that we need and think will be useful to others.
But, keep in mind that the tool is still being actively developed, raw and unpolished.
Feel free to get in touch with criticism, or weird ideas for new features.
The tool would be useless without using the following open source projects:
- OpenAI Whisper
- Sentence Transformers
- pyannote.audio
- speechbrain
- spaCy
- CustomTkinter
- and many other packages listed in requirements.txt
Please open an issue with what you're trying to solve first and let's discuss it there.
For troubleshooting and possible solutions to known issues, see the known issues section here or do a quick search in the Issues tab
Please report any problems directly in the Issues tab, here on Github: https://github.com/octimot/StoryToolkitAI/issues
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for StoryToolkitAI
Similar Open Source Tools
StoryToolkitAI
StoryToolkitAI is a film editing tool that utilizes AI to transcribe, index scenes, search through footage, and create stories. It offers features like full video indexing, automatic transcriptions and translations, compatibility with OpenAI GPT and ollama, story editor for screenplay writing, speaker detection, project file management, and more. It integrates with DaVinci Resolve Studio 18 and offers planned features like automatic topic classification and integration with other AI tools. The tool is developed by Octavian Mot and is actively being updated with new features based on user needs and feedback.
wunjo.wladradchenko.ru
Wunjo AI is a comprehensive tool that empowers users to explore the realm of speech synthesis, deepfake animations, video-to-video transformations, and more. Its user-friendly interface and privacy-first approach make it accessible to both beginners and professionals alike. With Wunjo AI, you can effortlessly convert text into human-like speech, clone voices from audio files, create multi-dialogues with distinct voice profiles, and perform real-time speech recognition. Additionally, you can animate faces using just one photo combined with audio, swap faces in videos, GIFs, and photos, and even remove unwanted objects or enhance the quality of your deepfakes using the AI Retouch Tool. Wunjo AI is an all-in-one solution for your voice and visual AI needs, offering endless possibilities for creativity and expression.
local_multimodal_ai_chat
Local Multimodal AI Chat is a hands-on project that teaches you how to build a multimodal chat application. It integrates different AI models to handle audio, images, and PDFs in a single chat interface. This project is perfect for anyone interested in AI and software development who wants to gain practical experience with these technologies.
OpenDAN-Personal-AI-OS
OpenDAN is an open source Personal AI OS that consolidates various AI modules for personal use. It empowers users to create powerful AI agents like assistants, tutors, and companions. The OS allows agents to collaborate, integrate with services, and control smart devices. OpenDAN offers features like rapid installation, AI agent customization, connectivity via Telegram/Email, building a local knowledge base, distributed AI computing, and more. It aims to simplify life by putting AI in users' hands. The project is in early stages with ongoing development and future plans for user and kernel mode separation, home IoT device control, and an official OpenDAN SDK release.
ShortGPT
ShortGPT is a powerful framework for automating content creation, simplifying video creation, footage sourcing, voiceover synthesis, and editing tasks. It offers features like automated editing framework, scripts and prompts, voiceover support in multiple languages, caption generation, asset sourcing, and persistency of editing variables. The tool is designed for youtube automation, Tiktok creativity program automation, and offers customization options for efficient and creative content creation.
LLPlayer
LLPlayer is a specialized media player designed for language learning, offering unique features such as dual subtitles, AI-generated subtitles, real-time OCR, real-time translation, word lookup, and more. It supports multiple languages, online video playback, customizable settings, and integration with browser extensions. Written in C#/WPF, LLPlayer is free, open-source, and aims to enhance the language learning experience through innovative functionalities.
meilisearch
Meilisearch is a lightning-fast search engine that seamlessly integrates into apps, websites, and workflows. It offers features like hybrid search, search-as-you-type, typo tolerance, filtering, sorting, synonym support, geosearch, extensive language support, security management, multi-tenancy, RESTful API, AI-readiness, easy installation, deployment, and maintenance.
CodeGPT
CodeGPT is an extension for JetBrains IDEs that provides access to state-of-the-art large language models (LLMs) for coding assistance. It offers a range of features to enhance the coding experience, including code completions, a ChatGPT-like interface for instant coding advice, commit message generation, reference file support, name suggestions, and offline development support. CodeGPT is designed to keep privacy in mind, ensuring that user data remains secure and private.
danswer
Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"
chatty
Chatty is a private AI tool that runs large language models natively and privately in the browser, ensuring in-browser privacy and offline usability. It supports chat history management, open-source models like Gemma and Llama2, responsive design, intuitive UI, markdown & code highlight, chat with files locally, custom memory support, export chat messages, voice input support, response regeneration, and light & dark mode. It aims to bring popular AI interfaces like ChatGPT and Gemini into an in-browser experience.
ProxyAI
ProxyAI is an open-source AI copilot for JetBrains, offering advanced code assistance features powered by top-tier language models. Users can customize their coding experience, receive AI-suggested code changes, autocomplete suggestions, and context-aware naming suggestions. The tool also allows users to chat with images, reference project files and folders, web docs, git history, and search the web. ProxyAI prioritizes user privacy by not collecting sensitive information and only gathering anonymous usage data with consent.
obsidian-smart-composer
Smart Composer is an Obsidian plugin that enhances note-taking and content creation by integrating AI capabilities. It allows users to efficiently write by referencing their vault content, providing contextual chat with precise context selection, multimedia context support for website links and images, document edit suggestions, and vault search for relevant notes. The plugin also offers features like custom model selection, local model support, custom system prompts, and prompt templates. Users can set up the plugin by installing it through the Obsidian community plugins, enabling it, and configuring API keys for supported providers like OpenAI, Anthropic, and Gemini. Smart Composer aims to streamline the writing process by leveraging AI technology within the Obsidian platform.
ai-dev-gallery
The AI Dev Gallery is an app designed to help Windows developers integrate AI capabilities within their own apps and projects. It contains over 25 interactive samples powered by local AI models, allows users to explore, download, and run models from Hugging Face and GitHub, and provides the ability to view the C# source code and export a standalone Visual Studio project for each sample. The app is open-source and welcomes contributions and suggestions from the community.
aide
Aide is an Open Source AI-native code editor that combines the powerful features of VS Code with advanced AI capabilities. It provides a combined chat + edit flow, proactive agents for fixing errors, inline editing widget, intelligent code completion, and AST navigation. Aide is designed to be an intelligent coding companion, helping users write better code faster while maintaining control over the development process.
GameSentenceMiner
GameSentenceMiner (GSM) is an immersion toolkit designed to assist with language learning through games. It enhances Anki cards with automated audio capture, manual trim options, screenshot capture, multi-line support, and AI translation. Additionally, GSM offers OCR capabilities with easier setup, exclusion zones, two-pass OCR system, consistent audio timing, and support for multiple languages. The tool also features game launcher capabilities for simplifying game setup and launching. Basic requirements include an Anki card creation tool, a method of extracting text from games, and, of course, a game. GSM provides detailed documentation and FAQs to help users understand its functionality and troubleshoot any issues. Users can seek support through the project's Discord channel or by creating issues on the repository.
LLM-Minutes-of-Meeting
LLM-Minutes-of-Meeting is a project showcasing NLP & LLM's capability to summarize long meetings and automate the task of delegating Minutes of Meeting(MoM) emails. It converts audio/video files to text, generates editable MoM, and aims to develop a real-time python web-application for meeting automation. The tool features keyword highlighting, topic tagging, export in various formats, user-friendly interface, and uses Celery for asynchronous processing. It is designed for corporate meetings, educational institutions, legal and medical fields, accessibility, and event coverage.
For similar tasks
EasyNovelAssistant
EasyNovelAssistant is a simple novel generation assistant powered by a lightweight and uncensored Japanese local LLM 'LightChatAssistant-TypeB'. It allows for perpetual generation with 'Generate forever' feature, stacking up lucky gacha draws. It also supports text-to-speech. Users can directly utilize KoboldCpp and Style-Bert-VITS2 internally or use EasySdxlWebUi to generate images while using the tool. The tool is designed for local novel generation with a focus on ease of use and flexibility.
tock
Tock is an open conversational AI platform for building bots. It offers a natural language processing open source stack compatible with various tools, a user interface for building stories and analytics, a conversational DSL for different programming languages, built-in connectors for text/voice channels, toolkits for custom web/mobile integration, and the ability to deploy anywhere in the cloud or on-premise with Docker.
StoryToolKit
StoryToolkitAI is a film editing tool that utilizes AI to transcribe, index scenes, search through footage, and create stories. It offers features such as automatic transcription, translation, story creation, speaker detection, project file management, and more. The tool works locally on your machine and integrates with DaVinci Resolve Studio 18. It aims to streamline the editing process by leveraging AI capabilities and enhancing user efficiency.
StoryToolkitAI
StoryToolkitAI is a film editing tool that utilizes AI to transcribe, index scenes, search through footage, and create stories. It offers features like full video indexing, automatic transcriptions and translations, compatibility with OpenAI GPT and ollama, story editor for screenplay writing, speaker detection, project file management, and more. It integrates with DaVinci Resolve Studio 18 and offers planned features like automatic topic classification and integration with other AI tools. The tool is developed by Octavian Mot and is actively being updated with new features based on user needs and feedback.
aichildedu
AICHILDEDU is a microservice-based AI education platform for children that integrates LLMs, image generation, and speech synthesis to provide personalized storybook creation, intelligent conversational learning, and multimedia content generation. It offers features like personalized story generation, educational quiz creation, multimedia integration, age-appropriate content, multi-language support, user management, parental controls, and asynchronous processing. The platform follows a microservice architecture with components like API Gateway, User Service, Content Service, Learning Service, and AI Services. Technologies used include Python, FastAPI, PostgreSQL, MongoDB, Redis, LangChain, OpenAI GPT models, TensorFlow, PyTorch, Transformers, MinIO, Elasticsearch, Docker, Docker Compose, and JWT-based authentication.
Jailbreaks
Jailbreaks is a repository dedicated to organizing and curating models suitable for NSFW writing. It serves as a collection of resources for writers looking to explore adult content in a structured manner.
narratrix
NarratrixAI is an AI-powered tabletop roleplaying platform that leverages AI to create dynamic, responsive, and immersive storytelling experiences. It allows users to create their own stories, use it as character chat, or as a full tabletop RPG experience. The platform features a powerful chat system, flexible AI integration, rich character management, powerful storytelling tools, and developer-friendly customization options. Narratrix supports various AI providers through a manifest system and is built with Tauri for native performance across Windows, macOS, and Linux platforms.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.
For similar jobs
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
ChatGPT-On-CS
This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.
