![summarize](/statics/github-mark.png)
summarize
Video transcript summarization from multiple sources (YouTube, Dropbox, Google Drive, local files) using multiple LLM endpoints (OpenAI, Groq, custom model).
Stars: 73
![screenshot](/screenshots_githubs/martinopiaggi-summarize.jpg)
The 'summarize' tool is designed to transcribe and summarize videos from various sources using AI models. It helps users efficiently summarize lengthy videos, take notes, and extract key insights by providing timestamps, original transcripts, and support for auto-generated captions. Users can utilize different AI models via Groq, OpenAI, or custom local models to generate grammatically correct video transcripts and extract wisdom from video content. The tool simplifies the process of summarizing video content, making it easier to remember and reference important information.
README:
Transcribe and summarize videos from multiple sources using state-of-the-art AI models in Google Colab or locally. This tool addresses the problem of too much content and too little time, helping you remember the content you watch or listen to.
https://github.com/user-attachments/assets/db89ec4e-90f1-46b3-a944-f65e78f66496
- Versatile Video Sources: Summarize videos from YouTube, Dropbox, Google Drive, or local files.
- Efficient Transcription:
  - Use existing YouTube captions when available to save time and resources.
  - Transcribe audio using Cloud Whisper (via the Groq API) or Local Whisper.
- Customizable Summarization:
  - Choose from different prompt types: Summarization, Grammar Correction, or Distill Wisdom to extract key insights.
- Flexible API Integration:
  - Use various AI models via Groq (free), OpenAI, or custom local models for summarization.
- Output Features:
  - Generate summaries with timestamps and include original transcripts.
- Quick Summaries: Get concise summaries of lengthy videos with timestamps.
- Note-Taking: Capture key points efficiently.
- Transcription Correction: Obtain grammatically correct video transcripts.
- Wisdom Extraction: Extract key insights and wisdom from any video content.
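Caption tracks fetched from YouTube typically arrive as a list of entries with `text`, `start`, and `duration` fields (the shape returned by packages such as `youtube-transcript-api`). A minimal sketch, assuming that shape, of turning such entries into the timestamped lines a summary can reference:

```python
def to_timestamped_lines(entries):
    """Format caption entries ({'text', 'start', 'duration'}) as 'MM:SS text' lines."""
    lines = []
    for entry in entries:
        total = int(entry["start"])          # seconds from video start
        minutes, seconds = divmod(total, 60)
        lines.append(f"{minutes:02d}:{seconds:02d} {entry['text']}")
    return "\n".join(lines)

# Hypothetical caption entries for illustration:
captions = [
    {"text": "welcome to the video", "start": 0.0, "duration": 2.5},
    {"text": "first key point", "start": 65.2, "duration": 3.0},
]
print(to_timestamped_lines(captions))
```

The helper and sample data are illustrative, not taken from the notebook; they only show why timestamped captions make summaries easy to jump back into.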
```mermaid
graph LR
B{Choose Video Source}
B -->|YouTube| C{Use YouTube Captions?}
B -->|Google Drive| D[Convert to Audio]
B -->|Dropbox| D
B -->|Local File| D
C -->|Yes| E[Download YouTube Captions]
C -->|No| D
E --> J{Choose Prompt Type}
D --> G{Choose Transcription Method}
G -->|Cloud Whisper| H[Transcribe with Groq API endpoint Whisper]
G -->|Local Whisper| I[Transcribe with Local Whisper]
H --> J{Choose Prompt Type}
I --> J{Choose Prompt Type}
J --> K[Summarize Content]
J --> L[Correct Captions]
J --> M[Extract Key Insights]
J --> P[Questions and answers]
J --> Q[Essay Writing in Paul Graham Style]
K --> O[Generate Final Summary]
L --> O
M --> O
P --> O
Q --> O
%% Highlight important decision points
style C fill:#f9f,stroke:#333,stroke-width:2px
style G fill:#f9f,stroke:#333,stroke-width:2px
style J fill:#bbf,stroke:#333,stroke-width:2px
```
API Configuration:
- Set `api_endpoint` to Groq, OpenAI, or Custom.
- Ensure `api_key` is set accordingly.
- Groq API Key (`api_key_groq`): Required for cloud Whisper transcription. If you plan to use the Whisper API endpoint (only the Groq endpoint is supported for now), you must specify your Groq API key in `api_key_groq`.
- Why both `api_key_groq` and `api_key`? So that you can use a different API for summarization (e.g., OpenAI).
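In notebook form, the key/endpoint split described above can be sketched as follows. Variable names follow the README; the base URLs and placeholder key values are illustrative assumptions, not taken from the source:

```python
# Sketch of the notebook's API settings (names from the README;
# the URLs and placeholder keys are assumptions for illustration).
api_endpoint = "Groq"      # "Groq", "OpenAI", or "Custom"
api_key = "sk-..."         # key for the summarization endpoint chosen above
api_key_groq = "gsk-..."   # always required for cloud Whisper transcription

# Keeping the two keys separate lets transcription and summarization
# go to different providers (e.g. Groq Whisper + OpenAI summaries):
base_urls = {
    "Groq": "https://api.groq.com/openai/v1",
    "OpenAI": "https://api.openai.com/v1",
    "Custom": "http://localhost:8080/v1",
}
summarization_url = base_urls[api_endpoint]
```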
Configure Runtime Environment:
- If using Local Whisper on Google Colab:
  - Switch the runtime type to a GPU instance (e.g., T4).
  - Go to Runtime > Change runtime type > Set Hardware accelerator to GPU.
Input Video Source:
- Input the video URL or file path.
- Select the source type (YouTube Video, Google Drive Video Link, Dropbox Video Link, Local File):
  - For Google Drive, use the path relative to "My Drive".
  - For Dropbox, use the public sharing link.
  - For YouTube videos, it is recommended to use the available YouTube captions to save on transcription time and API usage.
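The source selection above amounts to a small dispatch, mirroring the workflow diagram: a YouTube link with captions skips straight to prompting, while every other source is converted to audio and transcribed first. A minimal sketch (step names are illustrative, not the notebook's actual function names):

```python
def plan_pipeline(source_type, use_youtube_captions=True):
    """Return the processing steps implied by the chosen video source."""
    if source_type == "YouTube Video" and use_youtube_captions:
        # Existing captions avoid transcription time and API usage entirely.
        return ["download_captions", "choose_prompt"]
    # Google Drive / Dropbox / Local File (and YouTube without captions)
    # all go through audio extraction and Whisper transcription.
    return ["convert_to_audio", "transcribe", "choose_prompt"]

print(plan_pipeline("YouTube Video"))
print(plan_pipeline("Local File"))
```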
Set Transcription Settings:
- The transcription settings apply only if you use Whisper transcription rather than YouTube captions.
- Choose between cloud (Groq endpoint) or local Whisper:
  - Cloud Whisper:
    - Only supported via the Groq endpoint.
    - Requires `api_key_groq`.
  - Local Whisper:
    - Requires a GPU runtime.
- Language: Specify the language code (ISO 639-1 format, e.g., "en" for English, "it" for Italian).
- Initial Prompt for Whisper: (Optional) Provide an initial prompt to guide the transcription.
- Groq free-tier transcription limits using Whisper:

| Model ID | Requests per Day | Audio Minutes per Hour | Audio Minutes per Day |
| --- | --- | --- | --- |
| distil-whisper-large-v3-en | 2,000 | 120 | 480 |
| whisper-large-v3 | 2,000 | 120 | 480 |
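As a sketch of what these settings become in a cloud Whisper request, assuming the OpenAI-compatible shape of Groq's transcription endpoint (the parameter names below follow that convention and are an assumption, not copied from the notebook):

```python
def build_transcription_request(model="whisper-large-v3", language="en",
                                initial_prompt=None):
    """Assemble keyword arguments for an OpenAI-style audio transcription call."""
    params = {
        "model": model,
        "language": language,               # ISO 639-1 code, e.g. "en" or "it"
        "response_format": "verbose_json",  # ask for segment-level timestamps
    }
    if initial_prompt:
        params["prompt"] = initial_prompt   # optional hint to guide transcription
    return params

# With a Groq client these kwargs would be passed alongside the audio file,
# roughly: client.audio.transcriptions.create(file=open(path, "rb"), **params)
params = build_transcription_request(language="it", initial_prompt="Podcast tecnico.")
```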
Set Summarization Settings:
- Prompt Type: Choose from Summarization, Grammar Correction, Distill Wisdom, Questions and Answers, or Essay Writing in Paul Graham Style.
- Configure other settings such as Parallel API Calls (mind rate limits), Chunk Size, and Max Output Tokens.
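The Chunk Size and Parallel API Calls settings interact: the transcript is split into chunks, and each chunk is summarized by its own API call, several of which can run concurrently. A minimal sketch of the splitting step (character-based chunking on line boundaries is an assumption for illustration; the notebook may segment differently):

```python
def split_into_chunks(transcript, chunk_size=4000):
    """Split a transcript into roughly chunk_size-character pieces on line boundaries."""
    chunks, current = [], ""
    for line in transcript.splitlines(keepends=True):
        if current and len(current) + len(line) > chunk_size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

# Hypothetical timestamped transcript, 100 one-line segments:
transcript = "\n".join(f"{m:02d}:00 segment {m}" for m in range(100))
chunks = split_into_chunks(transcript, chunk_size=300)
# Each chunk can then be sent as an independent summarization request,
# up to Parallel API Calls at a time (mind the provider's rate limits).
```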
Alternative AI tools for summarize
Similar Open Source Tools
![llmchat Screenshot](/screenshots_githubs/trendy-design-llmchat.jpg)
llmchat
LLMChat is an all-in-one AI chat interface that supports multiple language models, offers a plugin library for enhanced functionality, enables web search capabilities, allows customization of AI assistants, provides text-to-speech conversion, ensures secure local data storage, and facilitates data import/export. It also includes features like knowledge spaces, prompt library, personalization, and can be installed as a Progressive Web App (PWA). The tech stack includes Next.js, TypeScript, Pglite, LangChain, Zustand, React Query, Supabase, Tailwind CSS, Framer Motion, Shadcn, and Tiptap. The roadmap includes upcoming features like speech-to-text and knowledge spaces.
![lawglance Screenshot](/screenshots_githubs/lawglance-lawglance.jpg)
lawglance
LawGlance is an AI-powered legal assistant that aims to bridge the gap between people and legal access. It is a free, open-source initiative designed to provide quick and accurate legal support tailored to individual needs. The project covers various laws, with plans for international expansion in the future. LawGlance utilizes AI-powered Retriever-Augmented Generation (RAG) to deliver legal guidance accessible to both laypersons and professionals. The tool is developed with support from mentors and experts at Data Science Academy and Curvelogics.
![agentneo Screenshot](/screenshots_githubs/raga-ai-hub-agentneo.jpg)
agentneo
AgentNeo is a Python package that provides functionalities for project, trace, dataset, experiment management. It allows users to authenticate, create projects, trace agents and LangGraph graphs, manage datasets, and run experiments with metrics. The tool aims to streamline AI project management and analysis by offering a comprehensive set of features.
![ComfyUI_Yvann-Nodes Screenshot](/screenshots_githubs/yvann-ba-ComfyUI_Yvann-Nodes.jpg)
ComfyUI_Yvann-Nodes
ComfyUI_Yvann-Nodes is a pack of custom nodes that enable audio reactivity within ComfyUI, allowing users to create AI-driven animations that sync with music. Users can generate audio reactive AI videos, control AI generation styles, content, and composition with any audio input. The tool is simple to use by dropping workflows in ComfyUI and specifying audio and visual inputs. It is flexible and works with existing ComfyUI AI tech and nodes like IPAdapter, AnimateDiff, and ControlNet. Users can pick workflows for Images → Video or Video → Video, download the corresponding .json file, drop it into ComfyUI, install missing custom nodes, set inputs, and generate audio-reactive animations.
![cog Screenshot](/screenshots_githubs/replicate-cog.jpg)
cog
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.
![jan Screenshot](/screenshots_githubs/janhq-jan.jpg)
jan
Jan is an open-source ChatGPT alternative that runs 100% offline on your computer. It supports universal architectures, including Nvidia GPUs, Apple M-series, Apple Intel, Linux Debian, and Windows x64. Jan is currently in development, so expect breaking changes and bugs. It is lightweight and embeddable, and can be used on its own within your own projects.
![Visionatrix Screenshot](/screenshots_githubs/Visionatrix-Visionatrix.jpg)
Visionatrix
Visionatrix is a project aimed at providing easy use of ComfyUI workflows. It offers simplified setup and update processes, a minimalistic UI for daily workflow use, stable workflows with versioning and update support, scalability for multiple instances and task workers, multiple user support with integration of different user backends, LLM power for integration with Ollama/Gemini, and seamless integration as a service with backend endpoints and webhook support. The project is approaching version 1.0 release and welcomes new ideas for further implementation.
![kitchenai Screenshot](/screenshots_githubs/epuerta9-kitchenai.jpg)
kitchenai
KitchenAI is an open-source toolkit designed to simplify AI development by serving as an AI backend and LLMOps solution. It aims to empower developers to focus on delivering results without being bogged down by AI infrastructure complexities. With features like simplifying AI integration, providing an AI backend, and empowering developers, KitchenAI streamlines the process of turning AI experiments into production-ready APIs. It offers built-in LLMOps features, is framework-agnostic and extensible, and enables faster time-to-production. KitchenAI is suitable for application developers, AI developers & data scientists, and platform & infra engineers, allowing them to seamlessly integrate AI into apps, deploy custom AI techniques, and optimize AI services with a modular framework. The toolkit eliminates the need to build APIs and infrastructure from scratch, making it easier to deploy AI code as production-ready APIs in minutes. KitchenAI also provides observability, tracing, and evaluation tools, and offers a Docker-first deployment approach for scalability and confidence.
![instill-core Screenshot](/screenshots_githubs/instill-ai-instill-core.jpg)
instill-core
Instill Core is an open-source orchestrator comprising a collection of source-available projects designed to streamline every aspect of building versatile AI features with unstructured data. It includes Instill VDP (Versatile Data Pipeline) for unstructured data, AI, and pipeline orchestration, Instill Model for scalable MLOps and LLMOps for open-source or custom AI models, and Instill Artifact for unified unstructured data management. Instill Core can be used for tasks such as building, testing, and sharing pipelines, importing, serving, fine-tuning, and monitoring ML models, and transforming documents, images, audio, and video into a unified AI-ready format.
![swift-ocr-llm-powered-pdf-to-markdown Screenshot](/screenshots_githubs/yigitkonur-swift-ocr-llm-powered-pdf-to-markdown.jpg)
swift-ocr-llm-powered-pdf-to-markdown
Swift OCR is a powerful tool for extracting text from PDF files using OpenAI's GPT-4 Turbo with Vision model. It offers flexible input options, advanced OCR processing, performance optimizations, structured output, robust error handling, and scalable architecture. The tool ensures accurate text extraction, resilience against failures, and efficient handling of multiple requests.
![WebAI-to-API Screenshot](/screenshots_githubs/Amm1rr-WebAI-to-API.jpg)
WebAI-to-API
This project implements a web API that offers a unified interface to Google Gemini and Claude 3. It provides a self-hosted, lightweight, and scalable solution for accessing these AI models through a streaming API. The API supports both Claude and Gemini models, allowing users to interact with them in real-time. The project includes a user-friendly web UI for configuration and documentation, making it easy to get started and explore the capabilities of the API.
![human Screenshot](/screenshots_githubs/vladmandic-human.jpg)
human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
![forge Screenshot](/screenshots_githubs/Card-Forge-forge.jpg)
forge
Forge is a free and open-source digital collectible card game (CCG) engine written in Java. It is designed to be easy to use and extend, and it comes with a variety of features that make it a great choice for developers who want to create their own CCGs. Forge is used by a number of popular CCGs, including Ascension, Dominion, and Thunderstone.
![AiLearning-Theory-Applying Screenshot](/screenshots_githubs/ben1234560-AiLearning-Theory-Applying.jpg)
AiLearning-Theory-Applying
This repository provides a comprehensive guide to understanding and applying artificial intelligence (AI) theory, including basic knowledge, machine learning, deep learning, and natural language processing (BERT). It features detailed explanations, annotated code, and datasets to help users grasp the concepts and implement them in practice. The repository is continuously updated to ensure the latest information and best practices are covered.
![unilm Screenshot](/screenshots_githubs/microsoft-unilm.jpg)
unilm
The 'unilm' repository is a collection of tools, models, and architectures for Foundation Models and General AI, focusing on tasks such as NLP, MT, Speech, Document AI, and Multimodal AI. It includes various pre-trained models, such as UniLM, InfoXLM, DeltaLM, MiniLM, AdaLM, BEiT, LayoutLM, WavLM, VALL-E, and more, designed for tasks like language understanding, generation, translation, vision, speech, and multimodal processing. The repository also features toolkits like s2s-ft for sequence-to-sequence fine-tuning and Aggressive Decoding for efficient sequence-to-sequence decoding. Additionally, it offers applications like TrOCR for OCR, LayoutReader for reading order detection, and XLM-T for multilingual NMT.
For similar tasks
![phospho Screenshot](/screenshots_githubs/phospho-app-phospho.jpg)
phospho
Phospho is a text analytics platform for LLM apps. It helps you detect issues and extract insights from text messages of your users or your app. You can gather user feedback, measure success, and iterate on your app to create the best conversational experience for your users.
![Awesome-Segment-Anything Screenshot](/screenshots_githubs/liliu-avril-Awesome-Segment-Anything.jpg)
Awesome-Segment-Anything
Awesome-Segment-Anything is a powerful tool for segmenting and extracting information from various types of data. It provides a user-friendly interface to easily define segmentation rules and apply them to text, images, and other data formats. The tool supports both supervised and unsupervised segmentation methods, allowing users to customize the segmentation process based on their specific needs. With its versatile functionality and intuitive design, Awesome-Segment-Anything is ideal for data analysts, researchers, content creators, and anyone looking to efficiently extract valuable insights from complex datasets.
![mslearn-knowledge-mining Screenshot](/screenshots_githubs/MicrosoftLearning-mslearn-knowledge-mining.jpg)
mslearn-knowledge-mining
The mslearn-knowledge-mining repository contains lab files for Azure AI Knowledge Mining modules. It provides resources for learning and implementing knowledge mining techniques using Azure AI services. The repository is designed to help users explore and understand how to leverage AI for knowledge mining purposes within the Azure ecosystem.
![docq Screenshot](/screenshots_githubs/docqai-docq.jpg)
docq
Docq is a private and secure GenAI tool designed to extract knowledge from business documents, enabling users to find answers independently. It allows data to stay within organizational boundaries, supports self-hosting with various cloud vendors, and offers multi-model and multi-modal capabilities. Docq is extensible, open-source (AGPLv3), and provides commercial licensing options. The tool aims to be a turnkey solution for organizations to adopt AI innovation safely, with plans for future features like more data ingestion options and model fine-tuning.
![towhee Screenshot](/screenshots_githubs/towhee-io-towhee.jpg)
towhee
Towhee is a cutting-edge framework designed to streamline the processing of unstructured data through the use of Large Language Model (LLM) based pipeline orchestration. It can extract insights from diverse data types like text, images, audio, and video files using generative AI and deep learning models. Towhee offers rich operators, prebuilt ETL pipelines, and a high-performance backend for efficient data processing. With a Pythonic API, users can build custom data processing pipelines easily. Towhee is suitable for tasks like sentence embedding, image embedding, video deduplication, question answering with documents, and cross-modal retrieval based on CLIP.
![codellm-devkit Screenshot](/screenshots_githubs/IBM-codellm-devkit.jpg)
codellm-devkit
Codellm-devkit (CLDK) is a Python library that serves as a multilingual program analysis framework bridging traditional static analysis tools and Large Language Models (LLMs) specialized for code (CodeLLMs). It simplifies the process of analyzing codebases across multiple programming languages, enabling the extraction of meaningful insights and facilitating LLM-based code analysis. The library provides a unified interface for integrating outputs from various analysis tools and preparing them for effective use by CodeLLMs. Codellm-devkit aims to enable the development and experimentation of robust analysis pipelines that combine traditional program analysis tools and CodeLLMs, reducing friction in multi-language code analysis and ensuring compatibility across different tools and LLM platforms. It is designed to seamlessly integrate with popular analysis tools like WALA, Tree-sitter, LLVM, and CodeQL, acting as a crucial intermediary layer for efficient communication between these tools and CodeLLMs. The project is continuously evolving to include new tools and frameworks, maintaining its versatility for code analysis and LLM integration.
![classifai Screenshot](/screenshots_githubs/10up-classifai.jpg)
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
For similar jobs
![LLMStack Screenshot](/screenshots_githubs/trypromptly-LLMStack.jpg)
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
![daily-poetry-image Screenshot](/screenshots_githubs/liruifengv-daily-poetry-image.jpg)
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
![exif-photo-blog Screenshot](/screenshots_githubs/sambecker-exif-photo-blog.jpg)
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
![SillyTavern Screenshot](/screenshots_githubs/SillyTavern-SillyTavern.jpg)
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
![Twitter-Insight-LLM Screenshot](/screenshots_githubs/AlexZhangji-Twitter-Insight-LLM.jpg)
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).
![AISuperDomain Screenshot](/screenshots_githubs/win4r-AISuperDomain.jpg)
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
![ChatGPT-On-CS Screenshot](/screenshots_githubs/lrhh123-ChatGPT-On-CS.jpg)
ChatGPT-On-CS
This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.
![obs-localvocal Screenshot](/screenshots_githubs/occ-ai-obs-localvocal.jpg)
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.