
Youtube-playlist-to-formatted-text
A desktop application that extracts YouTube playlist transcripts and enhances them using Google's Gemini AI models. The output is a book in any language you want.
Stars: 262

This Python application, 'Youtube-playlist-to-formatted-text', utilizes the Google Gemini API to extract and refine transcripts from YouTube playlists. It offers various refinement styles such as Balanced and Detailed, Summary, Educational, Narrative Rewriting, and Q&A Generation. Users can control the chunk size for API calls, select Gemini models, and output the refined transcript as a formatted markdown file. The tool is designed to convert lengthy YouTube playlists into organized text files for easy readability and further processing, suitable for tasks like summarizing videos, creating study guides, and enhancing content comprehension.
README:
✅ Added several Refinement styles to choose from based on your specific needs.
The "Refinement Style" dropdown allows you to choose how AI will process the YouTube transcript. Here's a description of each style:
⚖️ Balanced and Detailed: This is the default style, providing a comprehensive refinement of the transcript. It focuses on organizing the text into a well-structured, readable format with headings, bullet points, and bold text, while preserving every detail, context, and nuance of the original content. Ideal if you want a thoroughly enhanced transcript without any information loss.
📝 Summary: This style generates a concise and informative summary of the video transcript. It extracts the core message, main arguments, and key information, providing a quick and easily digestible overview of the video's content. Best for when you need to quickly grasp the main points without reading the entire transcript.
📚 Educational: This style transforms the transcript into a structured educational text, similar to a textbook chapter. It uses headings, subheadings, and bullet points for clarity and organization, making it ideal for learning. Crucially, it also identifies and defines technical terms and jargon within blockquotes, enhancing understanding and acting as a built-in glossary. (Example Image Below)
✍️ Narrative Rewriting: This style creatively rewrites the transcript into an engaging narrative or story format. It transforms the factual or conversational content into a more captivating and readable piece, like a short story or narrative article. While storytelling is applied, it stays closely aligned with the original video's subjects and information, making the content more accessible and enjoyable.
❓ Q&A Generation: This style generates a set of questions and answers based on the transcript, formatted for self-assessment or review. Each question is presented as a foldable Markdown header, with the answer hidden beneath. This format is perfect for creating study guides or quizzes to test your understanding of the video content. (Example Image Below)
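As an illustration of the Q&A layout described above, here is a minimal sketch that renders question/answer pairs as Markdown headers. The function name `format_qa` and the header level are assumptions made for this example; the application's actual output is produced by Gemini and may differ.

```python
from typing import Iterable, Tuple


def format_qa(pairs: Iterable[Tuple[str, str]], level: int = 3) -> str:
    """Render (question, answer) pairs as Markdown headers with answers below.

    Hypothetical helper: the header level (### by default) is an assumption.
    Editors such as Obsidian let the reader fold the text under each header.
    """
    lines = []
    for question, answer in pairs:
        lines.append(f"{'#' * level} {question}")
        lines.append("")
        lines.append(answer)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(format_qa([("What is chunk size?", "The number of words per API call.")]))
```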
✅ Added language support: the output file is now written in the language the user specifies.
✅ Added single video URL support, so there is no need to put a video in a playlist.
✅ Added configurable Chunk Size for API calls.
Users can now control the chunk size used when processing transcripts with the Gemini API via a slider in the UI. This allows for customization of processing behavior:
- Larger chunk sizes: Reduce the number of API calls, potentially speeding up execution and suitable for summarizing longer videos with less emphasis on fine details.
- Smaller chunk sizes: Increase API calls but may preserve more detail and nuance, potentially beneficial for tasks requiring high fidelity output.
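To make the trade-off concrete, the sketch below estimates how many API calls a transcript needs for a given chunk size. It assumes chunk size is measured in words and that each chunk costs exactly one call; `estimate_api_calls` is a hypothetical helper, not part of the application.

```python
import math


def estimate_api_calls(word_count: int, chunk_size: int) -> int:
    """Return how many chunks (one API call each, by assumption) are needed."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return math.ceil(word_count / chunk_size)


if __name__ == "__main__":
    # Compare a few slider settings for an 8000-word transcript.
    for size in (1000, 3000, 5000):
        print(size, estimate_api_calls(8000, size))
```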
❓ What is Chunk Size?
A video's transcript is divided into chunks before being sent to the AI. For example, if you set the chunk size to 3000 words and the video has 8000 words, the API workflow looks like this:
- First 3000 words ➡➡processed by AI➡➡ Refined part 1
- Second 3000 words + Refined part 1 as context ➡➡processed by AI➡➡ Refined part 2
- Final 2000 words + Refined part 1 + 2 as context ➡➡processed by AI➡➡ Refined part 3
- Refined part 1 + Refined part 2 + Refined part 3 = Final Formatted Text of the video!
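The workflow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the application's code: `split_into_chunks` and `refine_video` are hypothetical names, and `refine` stands in for the Gemini call so the example runs without an API key.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def refine_video(text: str, chunk_size: int,
                 refine: Callable[[str, str], str]) -> str:
    """Refine each chunk, passing all previously refined parts as context.

    `refine(chunk, context)` is a placeholder for the real model call.
    """
    refined_parts: List[str] = []
    for chunk in split_into_chunks(text, chunk_size):
        context = "\n\n".join(refined_parts)
        refined_parts.append(refine(chunk, context))
    # The final text is simply the refined parts joined in order.
    return "\n\n".join(refined_parts)


if __name__ == "__main__":
    transcript = " ".join(f"w{i}" for i in range(8000))
    sizes = [len(c.split()) for c in split_into_chunks(transcript, 3000)]
    print(sizes)  # [3000, 3000, 2000]
```

Passing the model call in as a function keeps the chunking logic testable on its own; in practice `refine` would build the prompt from the chunk, the context, and the selected refinement style.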
This Python application extracts transcripts from YouTube playlists and refines them using the Google Gemini API (which is free). It takes a YouTube playlist URL as input, extracts the transcript of each video, and then uses Gemini to reformat and improve the readability of the combined transcript. The output is saved as a text file.
So you can have a neatly formatted book out of a YouTube playlist!
I personally use it to convert large YouTube playlists containing dozens of long videos into a very large organized markdown file to give it as input to NotebookLM as one source.
Works Great with Obsidian too!
Read more about it in this Medium Article
- Batch processing of entire playlists
- Refine transcripts using Google Gemini API for improved formatting and readability.
- User-friendly PyQt5 graphical interface.
- Selectable Gemini models.
- Output to markdown file.
- 🎥 Automatic transcript extraction from YouTube playlists
- 🧠 AI-powered text refinement using Gemini models
- 📁 Configurable output file paths
- ⏳ Progress tracking for both extraction and refinement
- 📄 Output to formatted markdown file.
- Python 3.9+
- Google Gemini API key
- YouTube playlist URL
pip install -r requirements.txt
- First, the transcript of every video in the playlist is fetched.
- Since the Gemini API does not have an unlimited context window for input and output, the text of each video is divided into chunks (the chunk size is currently set to 3000 after testing, but it can be changed with the slider).
- Each text chunk is then sent to the Gemini API, along with a context prompt that includes the previously refined text. This helps maintain consistency and coherence across chunks.
- The refined output from Gemini for each chunk is appended to the final output file.
- This process is repeated for every video in the playlist, resulting in a single, refined transcript output file for the entire playlist.
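The steps above can be sketched as a single loop over the playlist. The names here are hypothetical, `refine_chunk` stands in for the Gemini call, and two details are assumptions made for this example: each video gets a `#` title header, and the context is reset at the start of each video.

```python
from pathlib import Path
from typing import Callable, Iterable, Tuple


def process_playlist(
    videos: Iterable[Tuple[str, str]],
    refine_chunk: Callable[[str, str], str],
    output_path: Path,
    chunk_size: int = 3000,
) -> int:
    """Refine every video chunk by chunk, appending results to one file.

    `videos` yields (title, transcript) pairs. Returns the number of
    refine_chunk calls made.
    """
    calls = 0
    output_path.write_text("", encoding="utf-8")  # start from an empty file
    for title, transcript in videos:
        words = transcript.split()
        context = ""  # previously refined text for this video (assumption)
        with output_path.open("a", encoding="utf-8") as out:
            out.write(f"# {title}\n\n")  # title header is an assumption
            for start in range(0, len(words), chunk_size):
                chunk = " ".join(words[start:start + chunk_size])
                refined = refine_chunk(chunk, context)
                calls += 1
                # Append immediately so partial results survive a failure.
                out.write(refined + "\n\n")
                context += refined + "\n\n"
    return calls
```

Appending after every chunk, rather than writing once at the end, means a long playlist run that fails midway still leaves the already refined text on disk.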
- Get a Gemini API Key: You need a Google Gemini API key. Obtain one from Google AI Studio.
- Run the Application:
  python main.py
- In the GUI:
- Enter the YouTube Playlist URL or Video link.
- Type the Output Language.
- Choose the style of output.
- Specify chunk size.
- Choose output file locations for the transcript and Gemini refined text using the "Choose File" buttons.
- Enter your Gemini API key in the "Gemini API Key" field.
- Click "Start Processing".
- You can select a Gemini model.
- Wait for the processing to complete. Progress will be shown in the progress bar and status display.
- The output files will be saved to the locations you specified.
Example of Educational Style with added definition of technical terms
Example of Q&A Style, Questions are headers so they can be folded/unfolded
YouTube playlist used for example files : https://www.youtube.com/playlist?list=PLmHVyfmcRKyx1KSoobwukzf1Nf-Y97Rw0
Alternative AI tools for Youtube-playlist-to-formatted-text
Similar Open Source Tools


feedgen
FeedGen is an open-source tool that uses Google Cloud's state-of-the-art Large Language Models (LLMs) to improve product titles, generate more comprehensive descriptions, and fill missing attributes in product feeds. It helps merchants and advertisers surface and fix quality issues in their feeds using Generative AI in a simple and configurable way. The tool relies on GCP's Vertex AI API to provide both zero-shot and few-shot inference capabilities on GCP's foundational LLMs. With few-shot prompting, users can customize the model's responses towards their own data, achieving higher quality and more consistent output. FeedGen is an Apps Script based application that runs as an HTML sidebar in Google Sheets, allowing users to optimize their feeds with ease.

PulsarRPA
PulsarRPA is a high-performance, distributed, open-source Robotic Process Automation (RPA) framework designed to handle large-scale RPA tasks with ease. It provides a comprehensive solution for browser automation, web content understanding, and data extraction. PulsarRPA addresses challenges of browser automation and accurate web data extraction from complex and evolving websites. It incorporates innovative technologies like browser rendering, RPA, intelligent scraping, advanced DOM parsing, and distributed architecture to ensure efficient, accurate, and scalable web data extraction. The tool is open-source, customizable, and supports cutting-edge information extraction technology, making it a preferred solution for large-scale web data extraction.

intelligence-toolkit
The Intelligence Toolkit is a suite of interactive workflows designed to help domain experts make sense of real-world data by identifying patterns, themes, relationships, and risks within complex datasets. It utilizes generative AI (GPT models) to create reports on findings of interest. The toolkit supports analysis of case, entity, and text data, providing various interactive workflows for different intelligence tasks. Users are expected to evaluate the quality of data insights and AI interpretations before taking action. The system is designed for moderate-sized datasets and responsible use of personal case data. It uses the GPT-4 model from OpenAI or Azure OpenAI APIs for generating reports and insights.

HiNote
HiNote is an AI-programmed Obsidian plugin that allows users to extract highlighted text from notes, add comments, generate AI comments, and engage in dialogue with the highlighted text. Users can highlight text in various formats, export it as knowledge card images, create new notes, and enjoy extended features in the main view. The plugin supports features like highlighted text retrieval, highlight comments, export as image, export as note, AI comment generation, AI chat, and premium features like a Flashcard system for effective memorization.

project_alice
Alice is an agentic workflow framework that integrates task execution and intelligent chat capabilities. It provides a flexible environment for creating, managing, and deploying AI agents for various purposes, leveraging a microservices architecture with MongoDB for data persistence. The framework consists of components like APIs, agents, tasks, and chats that interact to produce outputs through files, messages, task results, and URL references. Users can create, test, and deploy agentic solutions in a human-language framework, making it easy to engage with by both users and agents. The tool offers an open-source option, user management, flexible model deployment, and programmatic access to tasks and chats.

SwiftSage
SwiftSage is a tool designed for conducting experiments in the field of machine learning and artificial intelligence. It provides a platform for researchers and developers to implement and test various algorithms and models. The tool is particularly useful for exploring new ideas and conducting experiments in a controlled environment. SwiftSage aims to streamline the process of developing and testing machine learning models, making it easier for users to iterate on their ideas and achieve better results. With its user-friendly interface and powerful features, SwiftSage is a valuable tool for anyone working in the field of AI and ML.

ChainForge
ChainForge is a visual programming environment for battle-testing prompts to LLMs. It is geared towards early-stage, quick-and-dirty exploration of prompts, chat responses, and response quality that goes beyond ad-hoc chatting with individual LLMs. With ChainForge, you can:
- Query multiple LLMs at once to test prompt ideas and variations quickly and effectively.
- Compare response quality across prompt permutations, across models, and across model settings to choose the best prompt and model for your use case.
- Set up evaluation metrics (scoring function) and immediately visualize results across prompts, prompt parameters, models, and model settings.
- Hold multiple conversations at once across template parameters and chat models. Template not just prompts, but follow-up chat messages, and inspect and evaluate outputs at each turn of a chat conversation.
ChainForge comes with a number of example evaluation flows to give you a sense of what's possible, including 188 example flows generated from benchmarks in OpenAI evals. This is an open beta of ChainForge. We support model providers OpenAI, HuggingFace, Anthropic, Google PaLM2, Azure OpenAI endpoints, and Dalai-hosted models Alpaca and Llama. You can change the exact model and individual model settings. Visualization nodes support numeric and boolean evaluation metrics. ChainForge is built on ReactFlow and Flask.

vertex-ai-creative-studio
GenMedia Creative Studio is an application showcasing the capabilities of Google Cloud Vertex AI generative AI creative APIs. It includes features like Gemini for prompt rewriting and multimodal evaluation of generated images. The app is built with Mesop, a Python-based UI framework, enabling rapid development of web and internal apps. The Experimental folder contains stand-alone applications and upcoming features demonstrating cutting-edge generative AI capabilities, such as image generation, prompting techniques, and audio/video tools.

AntSK
AntSK is an AI knowledge base/agent built with .Net8+Blazor+SemanticKernel. It features a semantic kernel for accurate natural language processing, a memory kernel for continuous learning and knowledge storage, a knowledge base for importing and querying knowledge from various document formats, a text-to-image generator integrated with StableDiffusion, GPTs generation for creating personalized GPT models, API interfaces for integrating AntSK into other applications, an open API plugin system for extending functionality, a .Net plugin system for integrating business functions, real-time information retrieval from the internet, model management for adapting and managing different models from different vendors, support for domestic models and databases for operation in a trusted environment, and planned model fine-tuning based on llamafactory.

graphrag-local-ollama
GraphRAG Local Ollama is a repository that offers an adaptation of Microsoft's GraphRAG, customized to support local models downloaded using Ollama. It enables users to leverage local models with Ollama for large language models (LLMs) and embeddings, eliminating the need for costly OpenAI models. The repository provides a simple setup process and allows users to perform question answering over private text corpora by building a graph-based text index and generating community summaries for closely-related entities. GraphRAG Local Ollama aims to improve the comprehensiveness and diversity of generated answers for global sensemaking questions over datasets.

supervisely
Supervisely is a computer vision platform that provides a range of tools and services for developing and deploying computer vision solutions. It includes a data labeling platform, a model training platform, and a marketplace for computer vision apps. Supervisely is used by a variety of organizations, including Fortune 500 companies, research institutions, and government agencies.

llmops-promptflow-template
LLMOps with Prompt flow is a template and guidance for building LLM-infused apps using Prompt flow. It provides centralized code hosting, lifecycle management, variant and hyperparameter experimentation, A/B deployment, many-to-many dataset/flow relationships, multiple deployment targets, comprehensive reporting, BYOF capabilities, configuration-based development, local prompt experimentation and evaluation, endpoint testing, and optional Human-in-loop validation. The tool is customizable to suit various application needs.

sdk
Vikit.ai SDK is a software development kit that enables easy development of video generators using generative AI and other AI models. It serves as a langchain to orchestrate AI models and video editing tools. The SDK allows users to create videos from text prompts with background music and voice-over narration. It also supports generating composite videos from multiple text prompts. The tool requires Python 3.8+, specific dependencies, and tools like FFMPEG and ImageMagick for certain functionalities. Users can contribute to the project by following the contribution guidelines and standards provided.

ImageIndexer
LLMII is a tool that uses a local AI model to label metadata and index images without relying on cloud services or remote APIs. It runs a visual language model on your computer to generate captions and keywords for images, enhancing their metadata for indexing, searching, and organization. The tool can be run multiple times on the same image files, allowing for adding new data, regenerating data, and discovering files with issues. It supports various image formats, offers a user-friendly GUI, and can utilize GPU acceleration for faster processing. LLMII requires Python 3.8 or higher and operates directly on image file metadata fields like MWG:Keyword and XMP:Identifier.

dream-textures
Dream Textures is a tool integrated into Blender that allows users to create textures, concept art, background assets, and more using simple text prompts. It offers features like seamless texture creation, texture projection for entire scenes, restyling animations, and running models on the user's machine for faster iteration. The tool supports CUDA and Apple Silicon GPUs, with over 4GB of VRAM recommended. Users can troubleshoot issues by checking Blender's system console or seeking help from the community on Discord.
For similar tasks


OrionChat
Orion is a web-based chat interface that simplifies interactions with multiple AI model providers. It provides a unified platform for chatting and exploring various large language models (LLMs) such as Ollama, OpenAI (GPT model), Cohere (Command-r models), Google (Gemini models), Anthropic (Claude models), Groq Inc., Cerebras, and SambaNova. Users can easily navigate and assess different AI models through an intuitive, user-friendly interface. Orion offers features like browser-based access, code execution with Google Gemini, text-to-speech (TTS), speech-to-text (STT), seamless integration with multiple AI models, customizable system prompts, language translation tasks, document uploads for analysis, and more. API keys are stored locally, and requests are sent directly to official providers' APIs without external proxies.
For similar jobs

ChatFAQ
ChatFAQ is an open-source comprehensive platform for creating a wide variety of chatbots: generic ones, business-trained, or even capable of redirecting requests to human operators. It includes a specialized NLP/NLG engine based on a RAG architecture and customized chat widgets, ensuring a tailored experience for users and avoiding vendor lock-in.

anything-llm
AnythingLLM is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

mikupad
mikupad is a lightweight and efficient language model front-end powered by ReactJS, all packed into a single HTML file. Inspired by the likes of NovelAI, it provides a simple yet powerful interface for generating text with the help of various backends.

glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.

onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.

firecrawl
Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown. It crawls all accessible subpages and provides clean markdown for each, without requiring a sitemap. The API is easy to use and can be self-hosted. It also integrates with Langchain and Llama Index. The Python SDK makes it easy to crawl and scrape websites in Python code.