cine-flow
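An intelligent video remix tool built with Python and PyQt6, designed for short-drama creators. It uses multiple large AI models for automatic subtitle recognition, intelligent remixing, and effects, and integrates deeply with mainstream editors such as JianYing (剪映).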
Stars: 231
CineFlow AI is an AI-driven desktop tool for video creation. It covers AI video narration, AI video remixing, first-person monologue, short-drama clipping, and product promotion. It is built with Python + PyQt6 and supports macOS and Windows. Its capabilities include video understanding, subtitle extraction, audio-video synchronization, and AI voiceover. Projects can be exported as JianYing (剪映) .json drafts, Adobe Premiere Pro XML, FCPXML for Final Cut and DaVinci Resolve, SRT and ASS subtitles, and MP4 video files. Text generation is supported through OpenAI (GPT-4o), 通义千问, Gemini, Kimi, GLM-5, Claude, and Ollama; visual analysis through OpenAI, 通义千问, and Gemini; and voiceover through OpenAI and Edge TTS. The project is organized into core components, AI services, audio and video services, export, an MVVM ViewModel layer, a PyQt6 UI, and a plugin system. The technology stack comprises PyQt6, OpenAI, 通义千问, Gemini, Kimi, GLM, OpenCV, FFmpeg with GPU acceleration, librosa, soundfile, Edge TTS, OpenAI TTS, and Whisper.
README:
An AI-driven desktop tool for video creation, from scene understanding to final export
CineFlow AI is an AI video creation client built with Python + PyQt6. It supports macOS and Windows.
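| Feature | Description |
|---|---|
| 🎙️ AI video narration | Scene analysis → narration script generation → AI voiceover → dynamic subtitles |
| 🎵 AI video remix | Multiple clips → beat matching → automatic transitions → audio-video sync |
| 🎭 AI first-person monologue | Scene emotion analysis → emotional monologue → cinematic subtitles |
| 📺 Short-drama clipping | Detect highlight moments → automatic slicing → add subtitles |
| 🛍️ Product promotion | Scene analysis → selling-point extraction → promotional copy → voiceover |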
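- Video-level understanding: direct video upload to Gemini plus continuous multi-frame analysis
- Multi-model visual analysis: OpenAI GPT-4o / 通义千问 VL / Gemini Pro Vision, with automatic fallback
- Narrative structure recognition: storylines, characters, emotional arcs, climax markers
- Speech-to-text: Whisper API / local Whisper model
- OCR: extracts hardcoded (burned-in) subtitles from frames via a Vision API
- Dual-mode merge: speech as the primary source, with OCR supplying on-screen text
- Beat detection: BPM / beat / energy analysis based on librosa
- 4 sync strategies: beat-matched cuts / phrase sections / energy matching / hybrid mode
- Smart transitions: hard cuts on strong beats, fades on weak beats, speed curves that follow energy
- Built-in generation: Edge TTS (free) / OpenAI TTS
- External import: supports mp3 / wav / m4a and other formats
- Multiple voices: Chinese voices such as 晓晓 (Xiaoxiao), 云扬 (Yunyang), 晓墨 (Xiaomo), and 云希 (Yunxi)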
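The beat-matched cut and smart-transition rules above can be illustrated with a small sketch. It is not CineFlow's actual implementation: the function names are hypothetical, the beat times are hard-coded (in practice they would come from a beat detector such as librosa's beat tracker), and it assumes 4/4 time with the first detected beat treated as the strong beat.

```python
from bisect import bisect_left


def snap_to_beat(t, beat_times):
    """Return the beat time closest to t. beat_times must be sorted ascending."""
    if not beat_times:
        return t
    i = bisect_left(beat_times, t)
    # Only the beats just before and just after t can be the nearest one.
    candidates = beat_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda b: abs(b - t))


def choose_transition(beat_index, beats_per_bar=4):
    """Hard cut on the first beat of a bar (assumed strong), fade on the others."""
    return "cut" if beat_index % beats_per_bar == 0 else "fade"


def plan_cuts(rough_cuts, beat_times, beats_per_bar=4):
    """Snap each rough cut point to a beat and pick a transition for it."""
    plan = []
    for t in rough_cuts:
        snapped = snap_to_beat(t, beat_times)
        index = beat_times.index(snapped) if beat_times else 0
        plan.append((snapped, choose_transition(index, beats_per_bar)))
    return plan


# Beats at 120 BPM (one every 0.5 s), hard-coded for illustration.
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
for time, transition in plan_cuts([0.6, 1.9, 3.1], beats):
    print(f"{time:.1f}s -> {transition}")
```

A real pipeline would likely also weigh energy when choosing transitions, as the energy-matching and hybrid strategies suggest; this sketch covers only the beat-position rule.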
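| Format | Description |
|---|---|
| JianYing (剪映) | .json draft, imports directly into the JianYing desktop app |
| Premiere | Adobe Premiere Pro XML |
| Final Cut | FCPXML format |
| DaVinci Resolve | DaVinci Resolve (FCPXML) |
| SRT subtitles | General-purpose subtitle format |
| ASS subtitles | Subtitles with advanced styling |
| Video file | Direct MP4 export (GPU-accelerated) |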
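As a rough illustration of the SRT export row, and of the speech-first, OCR-supplement merge described earlier, here is a minimal sketch. The `Cue` type and all function names are hypothetical and not taken from the project. The merge rule is also an assumption: keep every speech cue, and keep an OCR cue only when it does not overlap any speech cue in time.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def overlaps(a, b):
    """True if the two cues share any stretch of time."""
    return a.start < b.end and b.start < a.end


def merge_cues(speech, ocr):
    """Keep all speech cues; add OCR cues only where no speech cue overlaps."""
    extra = [o for o in ocr if not any(overlaps(o, s) for s in speech)]
    return sorted(speech + extra, key=lambda c: c.start)


def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Render cues as SRT: index, time range, text, separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


speech = [Cue(0.0, 1.5, "Hello"), Cue(2.0, 3.0, "World")]
ocr = [Cue(1.0, 1.4, "HELLO"), Cue(3.5, 4.0, "Episode 1")]
print(to_srt(merge_cues(speech, ocr)))
```

The overlapping OCR cue is dropped because speech already covers that interval, while the on-screen "Episode 1" text survives because nothing is spoken at that time.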
| Provider | Text | Vision | Voiceover |
|---|---|---|---|
| OpenAI (GPT-4o) | ✅ | ✅ | ✅ |
| 通义千问 | ✅ | ✅ | ❌ |
| Gemini | ✅ | ✅ | ❌ |
| Kimi (Moonshot AI) | ✅ | ❌ | ❌ |
| GLM-5 (Zhipu) | ✅ | ❌ | ❌ |
| Claude | ✅ | ❌ | ❌ |
| Local (Ollama) | ✅ | ❌ | ❌ |
| Edge TTS | ❌ | ❌ | ✅ (free) |
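The capability table, combined with the automatic fallback mentioned for visual analysis, suggests a simple selection loop. The sketch below is only an assumption about how such a fallback could work. The provider keys, `ProviderError`, and `run_with_fallback` are hypothetical names, and the `call` argument stands in for whatever real API client the application uses.

```python
# Capability matrix transcribed from the table above (keys are hypothetical identifiers).
CAPABILITIES = {
    "openai": {"text", "vision", "tts"},
    "qwen": {"text", "vision"},
    "gemini": {"text", "vision"},
    "kimi": {"text"},
    "glm": {"text"},
    "claude": {"text"},
    "ollama": {"text"},
    "edge_tts": {"tts"},
}


class ProviderError(Exception):
    """Raised by a provider call that fails (quota, network, bad key, ...)."""


def run_with_fallback(capability, call, order=None):
    """Try each provider that supports `capability`, in order.

    `call(name)` performs the request for one provider and may raise
    ProviderError. Returns (provider_name, result) from the first success.
    """
    order = order or list(CAPABILITIES)
    errors = {}
    for name in order:
        if capability not in CAPABILITIES.get(name, set()):
            continue  # this provider cannot handle the task at all
        try:
            return name, call(name)
        except ProviderError as exc:
            errors[name] = str(exc)  # remember the failure, try the next one
    raise ProviderError(f"no provider succeeded for {capability!r}: {errors}")


def fake_call(name):
    """Stand-in for a real API call: pretend OpenAI is out of quota."""
    if name == "openai":
        raise ProviderError("quota exceeded")
    return f"{name}: ok"


print(run_with_fallback("vision", fake_call))
print(run_with_fallback("tts", fake_call))
```

With the simulated OpenAI failure, the vision request falls through to the next vision-capable provider and the voiceover request lands on Edge TTS, which matches the note that the app stays usable without an API key.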
- Python 3.9+
- FFmpeg
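git clone [email protected]:Agions/cine-flow.git
cd cine-flow
pip install -r requirements.txt
# Option 1: .env file
echo 'OPENAI_API_KEY=sk-xxx' > .env
# Option 2: configure the key on the Settings page after launch
💡 It also works without an API key: voiceover uses the free Edge TTS, and scripts can be entered manually.
python app/main.py
cine-flow/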
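├── app/
│ ├── core/ # Core (DI container, configuration, event bus, task queue)
│ ├── services/
│ │ ├── ai/ # AI services (LLM, visual analysis, subtitle extraction, voiceover)
│ │ ├── audio/ # Audio analysis (beat detection, audio-video sync)
│ │ ├── video/ # Video production (narration, remix, monologue)
│ │ ├── video_service/ # Low-level video (GPU rendering, batch processing)
│ │ └── export/ # Export (JianYing / Premiere / DaVinci Resolve / subtitles)
│ ├── viewmodels/ # MVVM ViewModel layer
│ ├── ui/ # PyQt6 UI
│ └── plugins/ # Plugin system
├── config/ # Configuration files
├── docs/ # Documentation
│ ├── guides/ # User guides
│ ├── api/ # API reference
│ └── dev/ # Developer documentation
└── tests/ # Tests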
- UI: PyQt6 + PyQt6-Fluent-Widgets
- AI: OpenAI / 通义千问 / Gemini / Kimi / GLM
- Video: OpenCV + FFmpeg + GPU acceleration
- Audio: librosa + soundfile
- Speech: Edge TTS + OpenAI TTS + Whisper
MIT License
Issues and PRs are welcome! See the contributing guide for details.
Alternative AI tools for cine-flow
Similar Open Source Tools
ppt-master
PPT Master is an AI-driven intelligent visual content generation system that converts source documents into high-quality SVG content through multi-role collaboration, supporting various formats such as presentation slides, social media posts, and marketing posters. It provides tools for PDF conversion, SVG post-processing, and PPTX export. Users can interact with AI editors to create content by describing their ideas. The system offers various AI roles for different tasks and provides a comprehensive documentation guide for workflow, design guidelines, canvas formats, image embedding best practices, chart templates, quick references, role definitions, tool usage instructions, example projects, and project workspace structure. Users can contribute to the project by enhancing design templates, chart components, documentation, bug reports, and feature suggestions. The project is open-source under the MIT License.
LunaBox
LunaBox is a lightweight, fast, and feature-rich tool for managing and tracking visual novels, with the ability to customize game categories, automatically track playtime, generate personalized reports through AI analysis, import data from other platforms, backup data locally or on cloud services, and ensure privacy and security by storing sensitive data locally. The tool supports multi-dimensional statistics, offers a variety of customization options, and provides a user-friendly interface for easy navigation and usage.
Lim-Code
LimCode is a powerful VS Code AI programming assistant that supports multiple AI models, intelligent tool invocation, and modular architecture. It features support for various AI channels, a smart tool system for code manipulation, MCP protocol support for external tool extension, intelligent context management, session management, and more. Users can install LimCode from the plugin store or via VSIX, or build it from the source code. The tool offers a rich set of features for AI programming and code manipulation within the VS Code environment.
ai-toolbox
AI Toolbox is a cross-platform desktop application designed to efficiently manage various AI programming assistant configurations. It supports Windows, macOS, and Linux. The tool provides visual management of OpenCode, Oh-My-OpenCode, Slim plugin configurations, Claude Code API supplier configurations, Codex CLI configurations, MCP server management, Skills management, WSL synchronization, AI supplier management, system tray for quick configuration switching, data backup, theme switching, multilingual support, and automatic update checks.
Unity-Skills
UnitySkills is an AI-driven Unity editor automation engine based on REST API. It allows AI to directly control Unity scenes through Skills. The tool offers extreme efficiency with Result Truncation and SKILL.md slimming, a versatile tool library with 282 Skills supporting Batch operations, ensuring transactional safety with automatic rollback, multiple instance support for controlling multiple Unity projects simultaneously, deep integration with Antigravity Slash Commands for interactive experience, compatibility with popular AI terminals like Claude Code, Antigravity, Gemini CLI, and support for Cinemachine 2.x/3.x dual versions with advanced camera control features like MixingCamera, ClearShot, TargetGroup, and Spline.
torch-rechub
Torch-RecHub is a lightweight, efficient, and user-friendly PyTorch recommendation system framework. It provides easy-to-use solutions for industrial-level recommendation systems, with features such as generative recommendation models, modular design for adding new models and datasets, PyTorch-based implementation for GPU acceleration, a rich library of 30+ classic and cutting-edge recommendation algorithms, standardized data loading, training, and evaluation processes, easy configuration through files or command-line parameters, reproducibility of experimental results, ONNX model export for production deployment, cross-engine data processing with PySpark support, and experiment visualization and tracking with integrated tools like WandB, SwanLab, and TensorBoardX.
VideoCaptioner
VideoCaptioner is a video subtitle processing assistant based on a large language model (LLM), supporting speech recognition, subtitle segmentation, optimization, translation, and full-process handling. It is user-friendly and does not require high configuration, supporting both network calls and local offline (GPU-enabled) speech recognition. It utilizes a large language model for intelligent subtitle segmentation, correction, and translation, providing stunning subtitles for videos. The tool offers features such as accurate subtitle generation without GPU, intelligent segmentation and sentence splitting based on LLM, AI subtitle optimization and translation, batch video subtitle synthesis, intuitive subtitle editing interface with real-time preview and quick editing, and low model token consumption with built-in basic LLM model for easy use.
WenShape
WenShape is a context engineering system for creating long novels. It addresses the challenge of narrative consistency over thousands of words by using an orchestrated writing process, dynamic fact tracking, and precise token budget management. All project data is stored in YAML/Markdown/JSONL text format, naturally supporting Git version control.
prisma-ai
Prisma-AI is an open-source tool designed to assist users in their job search process by addressing common challenges such as lack of project highlights, mismatched resumes, difficulty in learning, and lack of answers in interview experiences. The tool utilizes AI to analyze user experiences, generate actionable project highlights, customize resumes for specific job positions, provide study materials for efficient learning, and offer structured interview answers. It also features a user-friendly interface for easy deployment and supports continuous improvement through user feedback and collaboration.
nndeploy
nndeploy is a tool that allows you to quickly build your visual AI workflow without the need for frontend technology. It provides ready-to-use algorithm nodes for non-AI programmers, including large language models, Stable Diffusion, object detection, image segmentation, etc. The workflow can be exported as a JSON configuration file, supporting Python/C++ API for direct loading and running, deployment on cloud servers, desktops, mobile devices, edge devices, and more. The framework includes mainstream high-performance inference engines and deep optimization strategies to help you transform your workflow into enterprise-level production applications.
lingti-bot
lingti-bot is an AI Bot platform that integrates MCP Server, multi-platform message gateway, rich toolset, intelligent conversation, and voice interaction. It offers core advantages like zero-dependency deployment with a single 30MB binary file, cloud relay support for quick integration with enterprise WeChat/WeChat Official Account, built-in browser automation with CDP protocol control, 75+ MCP tools covering various scenarios, native support for Chinese platforms like DingTalk, Feishu, enterprise WeChat, WeChat Official Account, and more. It is embeddable, supports multiple AI backends like Claude, DeepSeek, Kimi, MiniMax, and Gemini, and allows access from platforms like DingTalk, Feishu, enterprise WeChat, WeChat Official Account, Slack, Telegram, and Discord. The bot is designed with simplicity as the highest design principle, focusing on zero-dependency deployment, embeddability, plain text output, code restraint, and cloud relay support.
torra-community
Torra Community Edition is a modern AI workflow and intelligent agent visualization editor based on Nuxt 4. It offers a lightweight but production-ready architecture with frontend VueFlow + Tailwind v4 + shadcn/ui, backend FeathersJS, and built-in LangChain.js runtime. It supports multiple databases (SQLite/MySQL/MongoDB) and local ↔ cloud hot switching. The tool covers various tasks such as visual workflow editing, modern UI, native integration of LangChain.js, pluggable storage options, full-stack TypeScript implementation, and more. It is designed for enterprises looking for an easy-to-deploy and scalable solution for AI workflows.
minimind-notes
MiniMind is a modular training guide for Large Language Models (LLMs), aiming to help developers deeply understand the training mechanism of modern large language models such as Llama and GPT through concise code and comparative experiments. It prioritizes principles over operations, provides experiments for each design choice, consists of 6 independent modules from basic components to complete architecture, and offers low entry barriers for learning. Suitable for individuals preparing for jobs in large model fields, students/researchers in machine learning/deep learning, developers, and learners with basic PyTorch knowledge who seek a deep understanding of LLMs. Not suitable for complete beginners, users only interested in deploying models quickly without caring about principles, and those looking for production-level code and best practices.
tradecat
TradeCat is a comprehensive data analysis and trading platform designed for cryptocurrency, stock, and macroeconomic data. It offers a wide range of features including multi-market data collection, technical indicator modules, AI analysis, signal detection engine, Telegram bot integration, and more. The platform utilizes technologies like Python, TimescaleDB, TA-Lib, Pandas, NumPy, and various APIs to provide users with valuable insights and tools for trading decisions. With a modular architecture and detailed documentation, TradeCat aims to empower users in making informed trading decisions across different markets.
hello-agents
Hello-Agents is a comprehensive tutorial on building intelligent agent systems, covering both theoretical foundations and practical applications. The tutorial aims to guide users in understanding and building AI-native agents, diving deep into core principles, architectures, and paradigms of intelligent agents. Users will learn to develop their own multi-agent applications from scratch, gaining hands-on experience with popular low-code platforms and agent frameworks. The tutorial also covers advanced topics such as memory systems, context engineering, communication protocols, and model training. By the end of the tutorial, users will have the skills to develop real-world projects like intelligent travel assistants and cyber towns.
For similar tasks
wunjo.wladradchenko.ru
Wunjo AI is a comprehensive tool that empowers users to explore the realm of speech synthesis, deepfake animations, video-to-video transformations, and more. Its user-friendly interface and privacy-first approach make it accessible to both beginners and professionals alike. With Wunjo AI, you can effortlessly convert text into human-like speech, clone voices from audio files, create multi-dialogues with distinct voice profiles, and perform real-time speech recognition. Additionally, you can animate faces using just one photo combined with audio, swap faces in videos, GIFs, and photos, and even remove unwanted objects or enhance the quality of your deepfakes using the AI Retouch Tool. Wunjo AI is an all-in-one solution for your voice and visual AI needs, offering endless possibilities for creativity and expression.
Open-Sora-Plan
Open-Sora-Plan is a project that aims to create a simple and scalable repo to reproduce Sora (OpenAI, but we prefer to call it "ClosedAI"). The project is still in its early stages, but the team is working hard to improve it and make it more accessible to the open-source community. The project is currently focused on training an unconditional model on a landscape dataset, but the team plans to expand the scope of the project in the future to include text2video experiments, training on video2text datasets, and controlling the model with more conditions.
Rewind-AI-Main
Rewind AI is a free and open-source AI-powered video editing tool that allows users to easily create and edit videos. It features a user-friendly interface, a wide range of editing tools, and support for a variety of video formats. Rewind AI is perfect for beginners and experienced video editors alike.
Whisper-WebUI
Whisper-WebUI is a Gradio-based browser interface for Whisper, serving as an Easy Subtitle Generator. It supports generating subtitles from various sources such as files, YouTube, and microphone. The tool also offers speech-to-text and text-to-text translation features, utilizing Facebook NLLB models and DeepL API. Users can translate subtitle files from other languages to English and vice versa. The project integrates faster-whisper for improved VRAM usage and transcription speed, providing efficiency metrics for optimized whisper models. Additionally, users can choose from different Whisper models based on size and language requirements.
CushyStudio
CushyStudio is a generative AI platform designed for creatives of any level to effortlessly create stunning images, videos, and 3D models. It offers CushyApps, a collection of visual tools tailored for different artistic tasks, and CushyKit, an extensive toolkit for custom apps development and task automation. Users can dive into the AI revolution, unleash their creativity, share projects, and connect with a vibrant community. The platform aims to simplify the AI art creation process and provide a user-friendly environment for designing interfaces, adding custom logic, and accessing various tools.
FunClip
FunClip is an open-source, locally deployable automated video editing tool that utilizes the FunASR Paraformer series models from Alibaba DAMO Academy for speech recognition in videos. Users can select text segments or speakers from the recognition results and click the clip button to obtain the corresponding video segments. FunClip integrates advanced features such as the Paraformer-Large model for accurate Chinese ASR, SeACo-Paraformer for customized hotword recognition, CAM++ speaker recognition model, Gradio interactive interface for easy usage, support for multiple free edits with automatic SRT subtitles generation, and segment-specific SRT subtitles.
ChopperBot
ChopperBot is a multifunctional, intelligent, personalized, scalable, easy-to-build, and fully automated multi-platform robot for editing and publishing live-stream video. It automatically analyzes and slices the most interesting clips from popular live streaming platforms, generates and publishes content, and manages accounts. It supports DIY plugin development and hot swapping, making it easy to customize and expand. With ChopperBot, users can quickly build their own live video editing platform without the need to install any software, thanks to its visual management interface.
For similar jobs
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
ChatGPT-On-CS
This project is an intelligent customer service dialogue tool based on a large model. It supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, and Zhihu. You can choose GPT3.5, GPT4.0, or Lazy Treasure Box (more platforms will be supported in the future). It can process text, voice, and pictures, access external resources such as operating systems and the Internet through plug-ins, and supports enterprise AI applications customized on a company's own knowledge base.
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.