BigBanana-AI-Director

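BigBanana AI Director is a one-stop AI short drama, AI motion comic, and AI director platform for creators, enabling efficient production from inspiration to finished film. It abandons traditional "gacha-style" generation (rerolling until something usable appears) in favor of an industrial "Script-to-Asset-to-Keyframe" workflow. The goal is to "generate a complete short drama from one sentence, fully automated from script to finished film", while precisely controlling character consistency, scene continuity, and camera movement.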

Stars: 532

BigBanana AI Director is an industrial AI motion comic and video workbench platform that provides a one-stop solution for creating short dramas and comics. It utilizes a 'Script-to-Asset-to-Keyframe' workflow with advanced AI models to automate the process from script to final production, ensuring precise control over character consistency, scene continuity, and camera movements. The tool is designed to streamline the production process for creators, enabling efficient production from idea to finished product.

README:

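BigBanana AI Director (AI Motion Comic Studio)

One-stop AI short drama / motion comic generation platform (Industrial AI Motion Comic & Video Workbench)

BigBanana AI Director is a one-stop AI short drama / motion comic platform for creators, enabling efficient production from inspiration to finished film.

It abandons traditional "gacha-style" generation in favor of an industrial "Script-to-Asset-to-Keyframe" workflow. Through deep integration with the advanced AI models available on the AntSK API, it aims to "generate a complete short drama from one sentence, fully automated from script to finished film", while precisely controlling character consistency, scene continuity, and camera movement.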

Interface Showcase

Project Management

Phase 01: Script & Storyboard

Phase 02: Character & Scene Assets

Phase 03: Director Workbench

Phase 04: Final Export

Prompt Management

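Core Concept: Keyframe-Driven

Traditional Text-to-Video often struggles to control specific camera moves and the opening and closing frames of a shot. BigBanana borrows the keyframe concept from animation production:

Draw first, animate second: generate a precise start frame (Start) and end frame (End) first.
Interpolated generation: use the Veo model to generate a smooth video transition between the two frames.
Asset constraints: every generated image is strongly constrained by the character reference sheet and the scene concept art, to prevent character distortion.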
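The asset constraint can be pictured as a rule that no frame request is built without its reference images. The TypeScript sketch below is only an illustration: `AssetLibrary`, `FrameSpec`, `FrameRequest`, and `buildFrameRequest` are hypothetical names, not taken from the project's source, and the real request format may differ.

```typescript
// Hypothetical types for illustration; not the project's actual data model.
interface AssetLibrary {
  characterRefs: Map<string, string>; // character id -> reference image URL
  sceneRefs: Map<string, string>;     // scene id -> concept art URL
}

interface FrameSpec {
  prompt: string;         // visual description of the keyframe
  characterIds: string[]; // characters that appear in the frame
  sceneId: string;        // scene the frame belongs to
}

interface FrameRequest {
  prompt: string;
  referenceImages: string[]; // scene art first, then one image per character
}

// Refuse to build a request unless every referenced asset already exists.
function buildFrameRequest(spec: FrameSpec, assets: AssetLibrary): FrameRequest {
  const sceneRef = assets.sceneRefs.get(spec.sceneId);
  if (sceneRef === undefined) {
    throw new Error(`missing scene concept art: ${spec.sceneId}`);
  }
  const characterRefs = spec.characterIds.map((id) => {
    const ref = assets.characterRefs.get(id);
    if (ref === undefined) {
      throw new Error(`missing character reference: ${id}`);
    }
    return ref;
  });
  return { prompt: spec.prompt, referenceImages: [sceneRef, ...characterRefs] };
}
```

Failing early on a missing reference, rather than generating an unconstrained image, is one way to keep every frame tied to the approved assets.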

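Core Feature Modules

Phase 01: Script & Storyboard

Intelligent script breakdown: enter a novel or story outline, and the AI automatically breaks it down into a standard script structure with scene numbers, times, and moods.
Visual translation: automatically converts text descriptions into professional Midjourney/Stable Diffusion prompts.
Pacing control: set a target duration (for example, a 30s trailer or a 3min short drama), and the AI plans shot density automatically.
✨ Manual editing (NEW):
- Edit character visual descriptions and storyboard image prompts
- Edit each shot's character list (add/remove characters)
- Edit each shot's action description and dialogue
- Make sure results match expectations, with precise control over every detail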
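As a rough illustration of pacing control, a target duration can be turned into a shot budget with simple arithmetic. In the project the AI does this planning; the sketch below only shows the idea, and `planShotCount` and the 5-second average shot length used in the usage note are assumptions, not values from the project.

```typescript
// Hypothetical helper: estimate how many shots fit a target duration.
function planShotCount(targetSeconds: number, avgShotSeconds: number): number {
  if (targetSeconds <= 0 || avgShotSeconds <= 0) {
    throw new RangeError("durations must be positive");
  }
  // Always plan at least one shot, even for very short targets.
  return Math.max(1, Math.round(targetSeconds / avgShotSeconds));
}
```

With an assumed average of 5 seconds per shot, `planShotCount(30, 5)` gives 6 shots for a 30s trailer and `planShotCount(180, 5)` gives 36 shots for a 3min short drama.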

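Phase 02: Assets & Casting

Character Consistency:
- Generate a standard reference image for each character.
- Wardrobe System: supports multiple looks (for example, everyday, combat, injured) that keep facial features consistent with the Base Look.
Set Design: generate environment reference images so that different shots in the same scene share consistent lighting.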
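One way to picture the Wardrobe System is a character sheet whose looks all reuse the same Base Look description, so only the outfit text changes. The sketch below is a minimal illustration under that assumption; `CharacterSheet` and `buildLookPrompt` are hypothetical names and do not come from the project's code.

```typescript
// Hypothetical character sheet: one stable base look plus named outfits.
interface CharacterSheet {
  name: string;
  baseLook: string;             // facial features shared by every look
  outfits: Map<string, string>; // look name -> outfit description
}

// Combine the base look with the requested outfit; fall back to the base
// look alone when the look name is unknown.
function buildLookPrompt(character: CharacterSheet, lookName: string): string {
  const outfit = character.outfits.get(lookName);
  return outfit === undefined
    ? character.baseLook
    : `${character.baseLook}, wearing ${outfit}`;
}
```

Because every prompt starts from the same `baseLook` text, switching between looks changes clothing without touching the facial description.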

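Phase 03: Director Workbench

Grid storyboard: manage all shots in a single panoramic view.
Precise control:
- Start Frame: generates the shot's opening image (strong consistency).
- End Frame: (optional) defines the state at the end of the shot (for example, a character turning around or a lighting change).
Nine-grid storyboard preview (NEW):
- Split one shot into 9 viewpoints in one click; confirm the descriptions first, then generate the nine-grid image.
- Use the whole image as the first frame, or crop a single cell as the first frame, to settle on a composition quickly.
Context awareness: when generating a shot, the AI automatically reads the Context (current scene image + the current character's specific outfit image), to resolve continuity errors between shots.
Dual video generation modes: supports single-image Image-to-Video as well as start/end-frame Keyframe Interpolation.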
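Cropping a single cell out of the nine-grid image comes down to computing one rectangle. The sketch below assumes an evenly divided 3x3 grid with cells numbered 0 to 8 in row-major order; that layout, and the names `CropRect` and `gridCellRect`, are assumptions for illustration rather than details confirmed by the project.

```typescript
// Pixel rectangle of one cell inside the full nine-grid image.
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumes an even 3x3 layout, cells indexed 0..8 left-to-right, top-to-bottom.
function gridCellRect(imageWidth: number, imageHeight: number, cellIndex: number): CropRect {
  if (!Number.isInteger(cellIndex) || cellIndex < 0 || cellIndex > 8) {
    throw new RangeError("cellIndex must be an integer from 0 to 8");
  }
  const width = Math.floor(imageWidth / 3);
  const height = Math.floor(imageHeight / 3);
  const column = cellIndex % 3;
  const row = Math.floor(cellIndex / 3);
  return { x: column * width, y: row * height, width, height };
}
```

For a 1920x1080 grid image, the center cell (index 4) starts at x=640, y=360 and measures 640x360.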

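Phase 04: Final Cut & Export

Real-time preview: preview generated motion comic clips on a timeline.
Render tracking: monitor API rendering progress in real time.
Asset export: export all high-resolution keyframes and MP4 clips for post-production editing in Premiere/After Effects.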

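Technical Architecture

Frontend: React 19, Tailwind CSS (Sony Industrial Design Style)
AI Models:
- Logic/Text: gpt-5.1 (high-intelligence script analysis)
- Vision: gemini-3-pro-image-preview (fast image generation)
- Video: veo_3_1_i2v_s_fast_fl_landscape / sora-2 (start/end-frame video interpolation)
Storage: IndexedDB (local browser database; data stays private, with no backend dependency)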

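Why AntSK API?

This project is deeply integrated with the AntSK API platform, giving creators highly cost-effective AI capabilities:

🎯 Full model coverage

Text models: GPT-5.2, GPT-5.1, Claude 3.5 Sonnet
Vision models: Gemini 3 Pro, Nano Banana Pro
Video models: Sora-2, Veo-3.1 (supports keyframe interpolation)
One-stop access: a unified API, with no switching between platforms

💰 Value pricing

Under 20% of official pricing: every model is priced more than 80% below the official channels
Pay as you go: no minimum spend; pay only for what you use
Enterprise-grade stability: 99.9% SLA, 7x24 technical support

🚀 Developer friendly

OpenAI-compatible protocol: zero-code migration cost
Detailed documentation: complete API docs and sample code
Real-time monitoring: visual usage statistics and cost tracking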
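An OpenAI-compatible protocol generally means a client can reuse the OpenAI Chat Completions request shape and only swap the base URL and key. The sketch below builds such a request without sending it. The `/v1/chat/completions` path and `Bearer` header follow the OpenAI convention; the base URL is a placeholder, and whether AntSK uses exactly this path is an assumption to check against its documentation. `buildChatRequest` is a hypothetical helper name.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) an OpenAI-style chat completion request.
function buildChatRequest(
  baseUrl: string, // placeholder; use the provider's documented base URL
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): HttpRequest {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${trimmedBase}/v1/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}
```

The returned object can be passed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.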

Register now to claim free credits →

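⚠️ Open Source and "Free" Notice (please read)

Model usage: the default workflow of this open-source project requires a combination of models with the corresponding capabilities, for example a large language model (such as GPT-5.2), an image model (such as Nano Banana Pro), and a video model (such as Sora-2 / Veo-3.1). To connect other channels or models, you can modify and adapt the project yourself.
Why we open-source: we want to lower the barrier to entry so that more creators can get started and integrate quickly. The project code is open source, and the model configuration is open and replaceable.
About the API service: the API we provide is mainly meant to help people try out and integrate the project quickly; we do not depend on this revenue for profit.
About freedom of choice: if you are not satisfied with our API, you are free to use the official OpenAI or Google services directly (even if they cost more). That is a normal choice and one we respect.
About "free forever" expectations: if your core requirement is that the project must stay free in the long term, and you judge projects by "free is the only standard", this project may not suit you. We would suggest trying products such as Qwen (千问), Yuanbao (元宝), and Doubao (豆包) instead (and don't miss their milk-tea red-envelope giveaways 😄).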

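💬 Join the Community Group

Scan the QR code to join the 【大香蕉】 (BigBanana) product experience group, share experience with other creators, and get the latest feature updates:

Scan with WeChat to join the group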

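🎨 Lightweight Creation Tools

If you need to finish a one-off creative task quickly, try our online tool platform:

BigBanana Creation Studio (BigBanana 创作工坊) offers:

📷 AI Drawing: text to image, with multiple styles
📊 AI PPT: generate presentations in one click
🎬 AI Video: intelligent video content generation
📱 Xiaohongshu copywriting: viral title and content generation
📖 AI novel writing: intelligent novel generation and continuation
🎨 AI anime generation: anime-style image creation
🎭 No installation: runs directly in the browser, ready to use

Best for: everyday creation, rapid prototyping, validating ideas
This project is better suited to: systematic short drama production, batch video production, industrial workflows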

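Client Download

Download the installer and use it out of the box, with no development environment to configure:

📥 Download the BigBanana AI Director client (Windows)

💡 After downloading, double-click to install. Windows is supported.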

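Running the Project

Option 1: Local development

# 1. Clone the project
git clone https://github.com/shuyu-labs/BigBanana-AI-Director.git
cd BigBanana-AI-Director

# 2. Install dependencies
npm install

# 3. Start the development server
npm run dev

# 4. Open the app
# Open http://localhost:3000 in your browser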

Option 2: Docker deployment (recommended)

# 1. Clone the project
git clone https://github.com/shuyu-labs/BigBanana-AI-Director.git
cd BigBanana-AI-Director

# 2. Build and start with Docker Compose
docker-compose up -d --build

# 3. Open the app
# Open http://localhost:3005 in your browser

# View logs
docker-compose logs -f

# Stop the containers
docker-compose down

Option 3: Plain Docker commands

# 1. Clone the project
git clone https://github.com/shuyu-labs/BigBanana-AI-Director.git
cd BigBanana-AI-Director

# 2. Build the image
docker build -t bigbanana-ai .

# 3. Run the container (host port 3005 maps to container port 80)
docker run -d -p 3005:80 --name bigbanana-ai-app bigbanana-ai

# 4. Open the app
# Open http://localhost:3005 in your browser

# View logs
docker logs -f bigbanana-ai-app

# Stop the container
docker stop bigbanana-ai-app

Other Commands

# Build the production version
npm run build

# Preview the production build
npm run preview

# Force a no-cache rebuild of the Docker image
docker-compose build --no-cache
docker-compose up -d --force-recreate

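Quick Start

Configure the key: launch the app and enter your AntSK API Key. (Buy now)
Story input: in Phase 01, enter your story idea and click "生成分镜脚本" (Generate Storyboard Script).
Art direction: go to Phase 02 and generate reference sheets for the lead characters and images for the core scenes.
Storyboarding: go to Phase 03 and generate the first frame. For more control, add an end frame, or use the nine-grid storyboard preview to choose the first-frame composition.
Motion generation: select a video model and generate the clip. A first frame alone is enough for single-image generation; a start and end frame together give a more stable shot transition.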
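The choice between the two video modes in the last step can be sketched as a small decision function: a start frame alone leads to Image-to-Video, and a start frame plus an end frame leads to Keyframe Interpolation. `ShotFrames`, `VideoGenMode`, and `pickVideoGenMode` are hypothetical names used only for illustration; the project's own logic may differ.

```typescript
type VideoGenMode = "image-to-video" | "keyframe-interpolation";

interface ShotFrames {
  startFrameUrl?: string; // required before any video can be generated
  endFrameUrl?: string;   // optional; enables interpolation when present
}

// Pick the generation mode from the frames that exist for a shot.
function pickVideoGenMode(frames: ShotFrames): VideoGenMode {
  if (!frames.startFrameUrl) {
    throw new Error("a start frame is required before generating video");
  }
  return frames.endFrameUrl ? "keyframe-interpolation" : "image-to-video";
}
```

Treating a missing start frame as an error mirrors the "draw first, animate second" order: frames come before motion.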

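Project Origin

This project is a derivative of CineGen-AI, with feature enhancements and optimizations added on top of the original project.

Thanks to the original author for the open-source contribution!

License

This project is licensed under CC BY-NC-SA 4.0.

✅ Personal learning and non-commercial use are allowed
✅ Modification and derivative works are allowed (under the same license)
❌ Commercial use is prohibited (a commercial license is required)

For commercial licensing, contact: [email protected]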

Built for Creators, by BigBanana.

For Tasks:

create storyboards, generate character assets, control camera movements, export final videos, manage production assets

For Jobs:

content creator, video editor, graphic designer, creative director, storyboard artist

Alternative AI tools for BigBanana-AI-Director

Similar Open Source Tools

aio-hub

AIO Hub is a cross-platform AI hub built on Tauri + Vue 3 + TypeScript, aiming to provide developers and creators with precise LLM control experience and efficient toolchain. It features a chat function designed for complex tasks and deep exploration, a unified context pipeline for controlling every token sent to the model, interactive AI buttons, dual-view management for non-linear conversation mapping, open ecosystem compatibility with various AI models, and a rich text renderer for LLM output. The tool also includes features for media workstation, developer productivity, system and asset management, regex applier, collaboration enhancement between developers and AI, and more.

GitHub stars: 89

chatwiki

ChatWiki is an open-source knowledge base AI question-answering system. It is built on large language models (LLM) and retrieval-augmented generation (RAG) technologies, providing out-of-the-box data processing, model invocation capabilities, and helping enterprises quickly build their own knowledge base AI question-answering systems. It offers exclusive AI question-answering system, easy integration of models, data preprocessing, simple user interface design, and adaptability to different business scenarios.

GitHub stars: 415

manga-translator-ui

This repository is a manga image translator tool that allows users to translate text in manga images automatically. It supports various types of manga, including Japanese, Korean, and American, in both black and white and color formats. The tool can detect, translate, and embed text, supporting multiple languages such as Japanese, Chinese, and English. It also includes a visual editor for adjusting text boxes. Users can interact with the tool through a Qt interface or command-line mode for batch processing. The tool offers features like intelligent text detection, multi-language OCR, multiple translation engines, high-quality translation using AI models, automatic term extraction, AI sentence segmentation, intelligent typesetting, PSD export, and batch processing. Additionally, it provides a visual editor for region editing, text editing, mask editing, undo/redo functionality, shortcut key support, and mouse wheel shortcuts.

GitHub stars: 879

bk-lite

Blueking Lite is an AI First lightweight operation product with low deployment resource requirements, low usage costs, and progressive experience, providing essential tools for operation administrators.

GitHub stars: 119

Snap-Solver

Snap-Solver is a revolutionary AI tool for online exam solving, designed for students, test-takers, and self-learners. With just a keystroke, it automatically captures any question on the screen, analyzes it using AI, and provides detailed answers. Whether it's complex math formulas, physics problems, coding issues, or challenges from other disciplines, Snap-Solver offers clear, accurate, and structured solutions to help you better understand and master the subject matter.

GitHub stars: 74

AutoGLM-GUI

AutoGLM-GUI is an AI-driven Android automation productivity tool that supports scheduled tasks, remote deployment, and 24/7 AI assistance. It features core functionalities such as deploying to servers, scheduling tasks, and creating an AI automation assistant. The tool enhances productivity by automating repetitive tasks, managing multiple devices, and providing a layered agent mode for complex task planning and execution. It also supports real-time screen preview, direct device control, and zero-configuration deployment. Users can easily download the tool for Windows, macOS, and Linux systems, and can also install it via Python package. The tool is suitable for various use cases such as server automation, batch device management, development testing, and personal productivity enhancement.

GitHub stars: 856

bella-openapi

Bella OpenAPI is an API gateway that provides rich AI capabilities, similar to OpenRouter. Beyond chat completion, it offers text embedding, ASR, TTS, image-to-image, and text-to-image capabilities, and integrates billing, rate limiting, resource management, metadata management, and a unified login service. All integrated capabilities have been validated in large-scale production environments. It uses a Java-friendly technology stack, offers a cloud-based experience service, and supports Dockerized deployment.

GitHub stars: 120

Flux-AI-Pro

Flux AI Pro - NanoBanana Edition is a high-performance, single-file AI image generation solution built on Cloudflare Workers. It integrates top AI providers like Pollinations.ai, Infip/Ghostbot, Aqua Server, Kinai API, and Airforce API to offer a serverless, fast, and feature-rich creative experience. It provides seamless interface for generating high-quality AI art without complex server setups. The tool supports multiple languages, smart language detection, RTL support, AI prompt generator, high-definition image generation, and local history storage with export/import functionality.

GitHub stars: 66

AI-Drug-Discovery-Design

AI-Drug-Discovery-Design is a repository focused on Artificial Intelligence-assisted Drug Discovery and Design. It explores the use of AI technology to accelerate and optimize the drug development process. The advantages of AI in drug design include speeding up research cycles, improving accuracy through data-driven models, reducing costs by minimizing experimental redundancies, and enabling personalized drug design for specific patients or disease characteristics.

GitHub stars: 77

Daily-DeepLearning

Daily-DeepLearning is a repository that covers various computer science topics such as data structures, operating systems, computer networks, Python programming, data science packages like numpy, pandas, matplotlib, machine learning theories, deep learning theories, NLP concepts, machine learning practical applications, deep learning practical applications, and big data technologies like Hadoop and Hive. It also includes coding exercises related to '剑指offer'. The repository provides detailed explanations and examples for each topic, making it a comprehensive resource for learning and practicing different aspects of computer science and data-related fields.

GitHub stars: 666

LabelQuick

LabelQuick_V2.0 is a fast image annotation tool designed and developed by the AI Horizon team. This version has been optimized and improved based on the previous version. It provides an intuitive interface and powerful annotation and segmentation functions to efficiently complete dataset annotation work. The tool supports video object tracking annotation, quick annotation by clicking, and various video operations. It introduces the SAM2 model for accurate and efficient object detection in video frames, reducing manual intervention and improving annotation quality. The tool is designed for Windows systems and requires a minimum of 6GB of memory.

GitHub stars: 70

get_jobs

Get Jobs is a tool designed to help users find and apply for job positions on various recruitment platforms in China. It features AI job matching, automatic cover letter generation, multi-platform job application, automated filtering of inactive HR and headhunter positions, real-time WeChat message notifications, blacklisted company updates, driver adaptation for Win11, centralized configuration, long-lasting cookie login, XPathHelper plugin, global logging, and more. The tool supports platforms like Boss直聘, 猎聘, 拉勾, 51job, and 智联招聘. Users can configure the tool for customized job searches and applications.

GitHub stars: 3.9k

douyin-chatgpt-bot

Douyin ChatGPT Bot is an AI-driven system for automatic replies on Douyin, including comment and private message replies. It offers features such as comment filtering, customizable robot responses, and automated account management. The system aims to enhance user engagement and brand image on the Douyin platform, providing a seamless experience for managing interactions with followers and potential customers.

GitHub stars: 166

TypeTale

TypeTale is an AIGC creation software designed specifically for content creators, primarily used for novel promotion. It offers a wide range of AI capabilities such as image, video, and audio generation, as well as text processing and story extraction. The tool also provides workflow customization, AI assistant support, and a vast library of creative materials. With a user-friendly interface and system requirements compatible with Windows operating systems, TypeTale aims to streamline the content creation process for writers and creators.

GitHub stars: 202

InterPilot

InterPilot is an AI-based assistant tool that captures audio from Windows input/output devices, transcribes it into text, and then calls the Large Language Model (LLM) API to provide answers. The project includes recording, transcription, and AI response modules, aiming to provide support for personal legitimate learning, work, and research. It may assist in scenarios like interviews, meetings, and learning, but it is strictly for learning and communication purposes only. The tool can hide its interface using third-party tools to prevent screen recording or screen sharing, but it does not have this feature built-in. Users bear the risk of using third-party tools independently.

GitHub stars: 88

For similar tasks

clapper

Clapper is an open-source AI story visualization tool that can interpret screenplays and render them into storyboards, videos, voice, sound, and music. It is currently in early development stages and not recommended for general use due to some non-functional features and lack of tutorials. A public alpha version is available on Hugging Face's platform. Users can sponsor specific features through bounties and developers can contribute to the project under the GPL v3 license. The tool lacks automated tests and code conventions like Prettier or a Linter.

GitHub stars: 2.0k

For similar jobs

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

daily-poetry-image

Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

GitHub stars: 492

exif-photo-blog

EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

GitHub stars: 1.4k

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

GitHub stars: 23.1k

Twitter-Insight-LLM

This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).

GitHub stars: 401

AISuperDomain

Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.

GitHub stars: 1.2k

ChatGPT-On-CS

This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.

GitHub stars: 768

obs-localvocal

LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.

GitHub stars: 248