video2blog
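AI-powered cross-platform desktop client (Windows, macOS, Linux) that converts videos into illustrated articles. Built with electron, vite, vue3, sqlite3, and naive-ui.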
Stars: 58
video2blog is an open-source project aimed at converting videos into textual notes. The tool follows a process of extracting video information using yt-dlp, downloading the video, downloading subtitles if available, translating subtitles if not in Chinese, generating Chinese subtitles using whisper if no subtitles exist, converting subtitles to articles using gemini, and manually inserting images from the video into the article. The tool provides a solution for creating blog content from video resources, enhancing accessibility and content creation efficiency.
README:
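- The idea for this open-source project is my own; click the link to read the detailed thought process
- WeChat official account: 那个曾经的少年回来了
- Reply "video2blog" in the account's message box to join the group for the latest news; it is also a convenient place to discuss ideas at any time
- One goal is to convert videos into illustrated notes (the other goal has not been decided yet)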
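How the video-to-article workflow works
1. Enter a video URL.
2. yt-dlp parses the video URL to fetch the video information.
3. yt-dlp downloads the video.
4. If subtitles exist, they are downloaded directly.
5. The subtitles may not be in Chinese, in which case they need to be translated.
6. If no subtitles exist, whisper generates a subtitle file, which is then translated into Chinese.
7. gemini then converts the subtitles into an article. Images are extracted from the video and inserted into the article manually.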
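The branching in steps 4 to 6 can be sketched as a small planning function. This is a minimal illustration rather than the project's actual code: the function names and step labels are hypothetical, and it assumes the video information is the JSON object printed by `yt-dlp --dump-json`, whose `subtitles` field maps language codes to available subtitle tracks.

```python
from typing import Dict, List


def plan_subtitle_steps(info: Dict, target_lang: str = "zh") -> List[str]:
    """Decide how to obtain Chinese subtitles for one video.

    `info` is assumed to be yt-dlp's info JSON; step names are hypothetical.
    """
    subtitles = info.get("subtitles") or {}
    if not subtitles:
        # Step 6: no subtitles, so transcribe with whisper, then translate.
        return ["whisper_transcribe", "translate_subtitles"]
    # Language codes look like "zh", "zh-CN", "zh-Hans", "en", ...
    has_target = any(code.lower().startswith(target_lang) for code in subtitles)
    if has_target:
        # Step 4: Chinese subtitles exist, download them as they are.
        return ["download_subtitles"]
    # Step 5: subtitles exist but need translation.
    return ["download_subtitles", "translate_subtitles"]


def plan_pipeline(info: Dict) -> List[str]:
    """Full ordered list of steps 2 to 7 for one video."""
    return (
        ["fetch_info", "download_video"]
        + plan_subtitle_steps(info)
        + ["gemini_article", "insert_images_manually"]
    )
```

For example, `plan_pipeline({"subtitles": {"en": []}})` includes a translation step, while a video with a `zh-Hans` track skips it.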
- Node and npm versions close to these should work
node -v //20.11.0
npm -v //10.2.4
- Python and pip versions used
python3 -V //3.11.2
pip3 -V //24.0
- Install the project dependencies
npm i
- Run locally on Windows
npm run start-win
- Run locally on macOS
npm run start-mac
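The main difference between the two is that Chinese text is garbled in the Windows command line, which does not happen on macOS, so the chcp 65001 command is used to fix it.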
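Why the garbling happens can be shown with a short sketch. It assumes a Chinese-locale Windows console on its default code page 936 (GBK), while the program writes UTF-8; `chcp 65001` switches the console to UTF-8 so both sides agree. The helper name is hypothetical.

```python
def decode_console_bytes(raw: bytes, encoding: str) -> str:
    """Interpret bytes the way a console using `encoding` would display them."""
    return raw.decode(encoding, errors="replace")


text = "视频转图文"
raw = text.encode("utf-8")  # what a UTF-8 program writes to the console

as_gbk = decode_console_bytes(raw, "gbk")     # console left on code page 936
as_utf8 = decode_console_bytes(raw, "utf-8")  # console after `chcp 65001`

print(as_gbk)   # garbled characters
print(as_utf8)  # the original text
```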
- Builds are triggered by pushing a tag to the git repository, which starts the GitHub Actions workflow
- Python scripts
When packaged in a Windows development environment, an exe file is generated in the /command/win directory
pyinstaller --onefile RemoveDuplicateImages.py -y --distpath ../command/win -n executename.exe(executename)
How to install pyinstaller
pip install pyinstaller
- Install dependencies from within the python/xxxxx directory
pip3 install -r requirements.txt
- How to record a dependency in requirements.txt
pip3 install xxx
pip3 freeze > requirements.txt
How to run the deduplication command on its own, on Windows: python main.py H:\github\electron-vite-tools\command\2024-05-10-16-29-38\000000133 30
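// Version 1
You are now a tech blogger. First read the subtitles above carefully, then divide them into sections based on their content. Do not create too many sections; try to keep it to around 4 to 8. After dividing, tidy up the content of each section. Note that you must not summarize or cut any content; only tidy and fine-tune it, and mark the subtitle time range of each section.
// Version 2
You are now a tech blogger. First read the subtitles above carefully, then organize them directly into an article and output it. Remember that you absolutely must not cut any content or summarize it. Add a table of contents to the article; the heading titles must be very concise, and each heading must include its subtitle time range. Use 4 to 8 headings at most, no more. And once again: the content under each heading must not be cut or summarized.
// Version 3
You are now a tech blogger. First read the subtitles above carefully, then organize them directly into an article and output it. Remember that you absolutely must not cut any content or summarize it. Add a table of contents to the article; the heading titles must be very concise, and each heading must include its subtitle time range. Use 4 to 8 headings at most, and absolutely no more than that. And once again: none of the content under the headings may be cut or summarized, and the second half of the content must not be skimped on either.
Finally, convert the output above into the Delta JSON format of the Quill rich text editor.
// Version 4
You are now a tech blogger. First read the subtitles above carefully, then organize them directly into an article and output it. Remember that you absolutely must not cut any content or summarize it. Add a table of contents to the article; the heading titles must be very concise, and each heading must include its subtitle time range. Use 4 to 8 headings at most, and absolutely no more than that. And once again: none of the content under the headings may be cut or summarized, and the second half of the content must not be skimped on either. Finally, convert the output above into markdown format: prefix each heading with ##, and leave the content under the headings unprocessed.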
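- Provisional final version
Role: You are now a senior tech blogger
Tasks:
1. Read the subtitles closely: read the provided subtitle content carefully.
2. Generate a blog post: organize the subtitle content into a blog post, keeping all information with no cuts or summaries.
3. Create a table of contents:
Heading titles must be concise and include the time range of the corresponding content; the time range must be precise.
Keep the number of headings between 4 and 8.
Format headings in markdown, i.e. prefix each title with ##.
Every generated heading must be followed by its time range; the table-of-contents list in the preface may omit time ranges.
4. Body format:
Preserve the subtitle content in full, with no cuts or summaries. It must still be organized into blog-post prose.
The body content needs no markdown formatting.
Goal:
Generate a blog post that contains the complete subtitle content and has a clear, concise table of contents, so that readers can read and navigate it easily. It opens with a preface plus the table of contents, followed by the remaining content presented as headings with body text.
Notes:
Stay faithful to the original subtitle content and avoid losing information.
The table of contents should be concise and clear so readers can quickly locate what they need.
Optimization notes:
Building on the original prompt, this version stresses the importance of keeping all information so that the blog post is not cut down.
It makes the heading format explicit (markdown) and limits the number of headings to keep the result concise and readable.
It breaks the task into finer steps so the instructions are clearer.
In the end I copy the markdown output and use it directly.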
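Before copying the markdown output, the structural rules in the prompt (4 to 8 `##` headings, each with a time range) can be checked mechanically. The sketch below is a hypothetical helper, not part of the project; in particular, the `MM:SS-MM:SS` / `HH:MM:SS-HH:MM:SS` time-range pattern is an assumption, since the prompt does not fix the exact format the model will emit.

```python
import re
from typing import List

# A level-2 markdown heading: "## title" (does not match "### title").
HEADING = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)
# Assumed time-range shape, e.g. "00:00-03:25" or "0:00:00 ~ 0:03:25".
TIME_RANGE = re.compile(
    r"\d{1,2}:\d{2}(?::\d{2})?\s*[-~]\s*\d{1,2}:\d{2}(?::\d{2})?"
)


def check_article(markdown: str, min_sections: int = 4, max_sections: int = 8) -> List[str]:
    """Return a list of rule violations; an empty list means the article passes."""
    problems = []
    headings = [h.strip() for h in HEADING.findall(markdown)]
    if not min_sections <= len(headings) <= max_sections:
        problems.append(
            f"expected {min_sections}-{max_sections} headings, found {len(headings)}"
        )
    for title in headings:
        if not TIME_RANGE.search(title):
            problems.append(f"heading without a time range: {title}")
    return problems
```

Running `check_article(text)` on the model's output and retrying when the list is non-empty is one way to catch an article that ignored the heading limits.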
---------------------------------
Output a template in this format for me to look at
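- ffmpeg
- Reference documentation
- Project initialization
- Communication between the main process and renderer processes
- Menu setup
- Importing native Node modules
- Getting the current user's data directory in Electron
- node-gyp error during npm build
- sqlite3 database API
- Packaging Python as an exe
- Adding static resources when packaging with electron-builder
- https://www.cnblogs.com/mrwh/p/12961446.html?ivk_sa=1024320u Handle the development environment and the packaged environment separately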
- Error when building with electron-builder
reason=prebuild-install failed with error (run with env DEBUG=electron-builder to get more information)
error=prebuild-install info begin Prebuild-install version 7.1.2
prebuild-install warn This package does not support N-API version 36
The error is caused by the sqlite3 version; the fix is to install a pinned version: npm install -E [email protected]
- whisper models
//https://www.bilibili.com/read/cv23285680/
//https://blog.csdn.net/a71468293a/article/details/135995878
// Download the model
model_size_or_path="specify the model location"
If no download location is specified, the model is downloaded to the default path C:\Users\Administrator\.cache\whisper
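The default location can be reproduced with a short sketch. It assumes the openai-whisper package, which stores models under `~/.cache/whisper` (or under `$XDG_CACHE_HOME/whisper` when that variable is set) and accepts a `download_root` argument in `whisper.load_model`. Note that `model_size_or_path` above is the parameter name used by faster-whisper, so which package the project uses is not certain. The helper names and the example directory are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def default_whisper_cache() -> Path:
    """Directory openai-whisper is assumed to use when no location is given."""
    fallback = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", fallback)) / "whisper"


def resolve_model_dir(custom_dir: Optional[str] = None) -> Path:
    """Use the caller's directory if one is given, else the default cache."""
    return Path(custom_dir) if custom_dir else default_whisper_cache()


print(resolve_model_dir())  # on Windows as Administrator: C:\Users\Administrator\.cache\whisper

# Usage with openai-whisper (requires `pip install openai-whisper`):
# import whisper
# model = whisper.load_model("base", download_root=str(resolve_model_dir("D:/models/whisper")))
```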
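- Build error on macOS (sh: electron-builder: command not found)
npm i electron-builder
- "yt-dlp" cannot be opened because Apple cannot check it for malicious software.
- https://www.jb51.net/os/MAC/881275.html
- System Settings => Privacy & Security => scroll down to Security, where yt-dlp is listed => click Allow
- Install poetry to manage Python packages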
Alternative AI tools for video2blog
Similar Open Source Tools
aiogram_bot_template
Aiogram bot template is a boilerplate for creating Telegram bots using Aiogram framework. It provides a solid foundation for building robust and scalable bots with a focus on code organization, database integration, and localization.
CrewAI-GUI
CrewAI-GUI is a Node-Based Frontend tool designed to revolutionize AI workflow creation. It empowers users to design complex AI agent interactions through an intuitive drag-and-drop interface, export designs to JSON for modularity and reusability, and supports both GPT-4 API and Ollama for flexible AI backend. The tool ensures cross-platform compatibility, allowing users to create AI workflows on Windows, Linux, or macOS efficiently.
aicommit2
AICommit2 is a Reactive CLI tool that streamlines interactions with various AI providers such as OpenAI, Anthropic Claude, Gemini, Mistral AI, Cohere, and unofficial providers like Huggingface and Clova X. Users can request multiple AI simultaneously to generate git commit messages without waiting for all AI responses. The tool runs 'git diff' to grab code changes, sends them to configured AI, and returns the AI-generated commit message. Users can set API keys or Cookies for different providers and configure options like locale, generate number of messages, commit type, proxy, timeout, max-length, and more. AICommit2 can be used both locally with Ollama and remotely with supported providers, offering flexibility and efficiency in generating commit messages.
awesome-rag
Awesome RAG is a curated list of retrieval-augmented generation (RAG) in large language models. It includes papers, surveys, general resources, lectures, talks, tutorials, workshops, tools, and other collections related to retrieval-augmented generation. The repository aims to provide a comprehensive overview of the latest advancements, techniques, and applications in the field of RAG.
WebAI-to-API
This project implements a web API that offers a unified interface to Google Gemini and Claude 3. It provides a self-hosted, lightweight, and scalable solution for accessing these AI models through a streaming API. The API supports both Claude and Gemini models, allowing users to interact with them in real-time. The project includes a user-friendly web UI for configuration and documentation, making it easy to get started and explore the capabilities of the API.
asktube
AskTube is an AI-powered YouTube video summarizer and QA assistant that utilizes Retrieval Augmented Generation (RAG) technology. It offers a comprehensive solution with Q&A functionality and aims to provide a user-friendly experience for local machine usage. The project integrates various technologies including Python, JS, Sanic, Peewee, Pytubefix, Sentence Transformers, Sqlite, Chroma, and NuxtJs/DaisyUI. AskTube supports multiple providers for analysis, AI services, and speech-to-text conversion. The tool is designed to extract data from YouTube URLs, store embedding chapter subtitles, and facilitate interactive Q&A sessions with enriched questions. It is not intended for production use but rather for end-users on their local machines.
mistral.rs
Mistral.rs is a fast LLM inference platform written in Rust. We support inference on a variety of devices, quantization, and easy-to-use application with an Open-AI API compatible HTTP server and Python bindings.
RTXZY-MD
RTXZY-MD is a bot tool that supports file hosting, QR code, pairing code, and RestApi features. Users must fill in the Apikey for the bot to function properly. It is not recommended to install the bot on platforms lacking ffmpeg, imagemagick, webp, or express.js support. The tool allows for 95% implementation of website api and supports free and premium ApiKeys. Users can join group bots and get support from Sociabuzz. The tool can be run on Heroku with specific buildpacks and is suitable for Windows/VPS/RDP users who need Git, NodeJS, FFmpeg, and ImageMagick installations.
ort
Ort is an unofficial ONNX Runtime 1.17 wrapper for Rust based on the now inactive onnxruntime-rs. ONNX Runtime accelerates ML inference on both CPU and GPU.
TalkWithGemini
Talk With Gemini is a web application that allows users to deploy their private Gemini application for free with one click. It supports Gemini Pro and Gemini Pro Vision models. The application features talk mode for direct communication with Gemini, visual recognition for understanding picture content, full Markdown support, automatic compression of chat records, privacy and security with local data storage, well-designed UI with responsive design, fast loading speed, and multi-language support. The tool is designed to be user-friendly and versatile for various deployment options and language preferences.
LLMTSCS
LLMLight is a novel framework that employs Large Language Models (LLMs) as decision-making agents for Traffic Signal Control (TSC). The framework leverages the advanced generalization capabilities of LLMs to engage in a reasoning and decision-making process akin to human intuition for effective traffic control. LLMLight has been demonstrated to be remarkably effective, generalizable, and interpretable against various transportation-based and RL-based baselines on nine real-world and synthetic datasets.
rag-chatbot
rag-chatbot is a tool that allows users to chat with multiple PDFs using Ollama and LlamaIndex. It provides an easy setup for running on local machines or Kaggle notebooks. Users can leverage models from Huggingface and Ollama, process multiple PDF inputs, and chat in multiple languages. The tool offers a simple UI with Gradio, supporting chat with history and QA modes. Setup instructions are provided for both Kaggle and local environments, including installation steps for Docker, Ollama, Ngrok, and the rag_chatbot package. Users can run the tool locally and access it via a web interface. Future enhancements include adding evaluation, better embedding models, knowledge graph support, improved document processing, MLX model integration, and Corrective RAG.
farfalle
Farfalle is an open-source AI-powered search engine that allows users to run their own local LLM or utilize the cloud. It provides a tech stack including Next.js for frontend, FastAPI for backend, Tavily for search API, Logfire for logging, and Redis for rate limiting. Users can get started by setting up prerequisites like Docker and Ollama, and obtaining API keys for Tavily, OpenAI, and Groq. The tool supports models like llama3, mistral, and gemma. Users can clone the repository, set environment variables, run containers using Docker Compose, and deploy the backend and frontend using services like Render and Vercel.
wzry_ai
This is an open-source project for playing the game King of Glory with an artificial intelligence model. The first phase of the project has been completed, and future upgrades will be built upon this foundation. The second phase of the project has started, and progress is expected to proceed according to plan. For any questions, feel free to join the QQ exchange group: 687853827. The project aims to learn artificial intelligence and strictly prohibits cheating. Detailed installation instructions are available in the doc/README.md file. Environment installation video: (bilibili) Welcome to follow, like, tip, comment, and provide your suggestions.
morgana-form
MorGana Form is a full-stack form builder project developed using Next.js, React, TypeScript, Ant Design, PostgreSQL, and other technologies. It allows users to quickly create and collect data through survey forms. The project structure includes components, hooks, utilities, pages, constants, Redux store, themes, types, server-side code, and component packages. Environment variables are required for database settings, NextAuth login configuration, and file upload services. Additionally, the project integrates an AI model for form generation using the Ali Qianwen model API.
For similar tasks
AiEditor
AiEditor is a next-generation rich text editor for AI, based on Web Component and supporting various front-end frameworks. It offers two themes, light and dark, along with flexible configuration for developing text editing applications. The editor includes features for basic text formatting, enhancements like undo/redo and format painter, support for attachments like images and videos, code-related functionalities, table manipulation, Markdown support, AI-related features such as continuation and optimization, and more. Planned improvements include collaboration, automated testing, AI picture insertion and drawing, enhanced paste features, WORD and PDF export, Notion-like operations, and integration with ChatGPT.
gpt-subtrans
GPT-Subtrans is an open-source subtitle translator that utilizes large language models (LLMs) as translation services. It supports translation between any language pairs that the language model supports. Note that GPT-Subtrans requires an active internet connection, as subtitles are sent to the provider's servers for translation, and their privacy policy applies.
chatgpt-subtitle-translator
This tool utilizes the OpenAI ChatGPT API to translate text, with a focus on line-based translation, particularly for SRT subtitles. It optimizes token usage by removing SRT overhead and grouping text into batches, allowing for arbitrary length translations without excessive token consumption while maintaining a one-to-one match between line input and output.
TeroSubtitler
Tero Subtitler is an open source, cross-platform, and free subtitle editing software with a user-friendly interface. It offers fully fledged editing with SMPTE and MEDIA modes, support for various subtitle formats, multi-level undo/redo, search and replace, auto-backup, source and transcription modes, translation memory, audiovisual preview, timeline with waveform visualizer, manipulation tools, formatting options, quality control features, translation and transcription capabilities, validation tools, automation for correcting errors, and more. It also includes features like exporting subtitles to MP3, importing/exporting Blu-ray SUP format, generating blank video, generating video with hardcoded subtitles, video dubbing, and more. The tool utilizes powerful multimedia playback engines like mpv, advanced audio/video manipulation tools like FFmpeg, tools for automatic transcription like whisper.cpp/Faster-Whisper, auto-translation API like Google Translate, and ElevenLabs TTS for video dubbing.
AiNiee
AiNiee is a tool focused on AI translation, capable of automatically translating RPG SLG games, Epub TXT novels, Srt Lrc subtitles, and more. It provides features for configuring AI platforms, proxies, and translation settings. Users can utilize this tool for translating game scripts, novels, and subtitles efficiently. The tool supports multiple AI platforms and offers tutorials for beginners. It also includes functionalities for extracting and translating game text, with options for customizing translation projects and managing translation tasks effectively.
auto-subs
Auto-subs is a tool designed to automatically transcribe editing timelines using OpenAI Whisper and Stable-TS for extreme accuracy. It generates subtitles in a custom style, is completely free, and runs locally within Davinci Resolve. It works on Mac, Linux, and Windows, supporting both Free and Studio versions of Resolve. Users can jump to positions on the timeline using the Subtitle Navigator and translate from any language to English. The tool provides a user-friendly interface for creating and customizing subtitles for video content.
Srt-AI-Voice-Assistant
Srt-AI-Voice-Assistant is a convenient tool that generates audio from uploaded .srt subtitle files by calling APIs such as Bert-VITS2 (HiyoriUI), GPT-SoVITS, and Microsoft TTS (online). The code is currently not perfect, and feedback on bugs or suggestions can be provided at https://github.com/YYuX-1145/Srt-AI-Voice-Assistant/issues. Recent updates include adding custom API functionality with a focus on security, support for Microsoft online TTS (requires key configuration), error handling improvements, automatic project path detection, compatibility with API-v1 for limited functionality, and significant feature updates supporting card synthesis.
For similar jobs
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
obsidian-textgenerator-plugin
Text Generator is an open-source AI Assistant Tool that leverages Generative Artificial Intelligence to enhance knowledge creation and organization in Obsidian. It allows users to generate ideas, titles, summaries, outlines, and paragraphs based on their knowledge database, offering endless possibilities. The plugin is free and open source, compatible with Obsidian for a powerful Personal Knowledge Management system. It provides flexible prompts, template engine for repetitive tasks, community templates for shared use cases, and highly flexible configuration with services like Google Generative AI, OpenAI, and HuggingFace.
obsidian-weaver
Obsidian Weaver is a plugin that integrates ChatGPT/GPT-3 into the note-taking workflow of Obsidian. It allows users to easily access AI-generated suggestions and insights within Obsidian, enhancing the writing and brainstorming process. The plugin respects Obsidian's philosophy of storing notes locally, ensuring data security and privacy. Weaver offers features like creating new chat sessions with the AI assistant and receiving instant responses, all within the Obsidian environment. It provides a seamless integration with Obsidian's interface, making the writing process efficient and helping users stay focused. The plugin is constantly being improved with new features and updates to enhance the note-taking experience.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).