ai-comic-factory

Generate comic panels using an LLM + SDXL. Powered by Hugging Face 🤗

Stars: 1012


The AI Comic Factory is a tool that allows you to create your own AI comics with a single prompt. It uses a large language model (LLM) to generate the story and dialogue, and a rendering API to generate the panel images. The AI Comic Factory is open-source and can be run on your own website or computer. It is a great tool for anyone who wants to create their own comics, or for anyone who is interested in the potential of AI for storytelling.

README:

title: AI Comic Factory
emoji: 👩‍🎨
colorFrom: red
colorTo: yellow
sdk: docker
pinned: true
app_port: 3000
disable_embedding: false
short_description: Create your own AI comic with a single prompt
hf_oauth: true
hf_oauth_expiration_minutes: 43200
hf_oauth_scopes: [inference-api]

AI Comic Factory

Last release: AI Comic Factory 1.2

The AI Comic Factory will soon have an official website: aicomicfactory.app

For more information about my other projects please check linktr.ee/FLNGR.

Running the project at home

First, I would like to highlight that everything is open-source (see here, here, here, here).

However, the project isn't a monolithic Space that can be duplicated and run immediately: it requires several components (frontend, backend, LLM, SDXL, etc.).

If you duplicate the project and open the .env file, you will see that it requires some variables.

Provider config:

LLM_ENGINE: can be one of INFERENCE_API, INFERENCE_ENDPOINT, OPENAI, GROQ, ANTHROPIC
RENDERING_ENGINE: can be one of: "INFERENCE_API", "INFERENCE_ENDPOINT", "REPLICATE", "VIDEOCHAIN", "OPENAI" for now, unless you code your custom solution
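As a rough illustration of the two provider settings above, the sketch below validates LLM_ENGINE and RENDERING_ENGINE against the values listed. The helper names (parseEngine, readProviderConfig) are hypothetical and not taken from the project's code; only the variable names and allowed values come from this document.

```typescript
// Allowed values, copied from the provider config list above.
const LLM_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "OPENAI", "GROQ", "ANTHROPIC"] as const;
const RENDERING_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "REPLICATE", "VIDEOCHAIN", "OPENAI"] as const;

type LLMEngine = (typeof LLM_ENGINES)[number];
type RenderingEngine = (typeof RENDERING_ENGINES)[number];

// Hypothetical helper: returns the value if it is in the allowed list, otherwise throws.
function parseEngine<T extends string>(name: string, value: string | undefined, allowed: readonly T[]): T {
  const match = allowed.find((candidate) => candidate === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of ${allowed.join(", ")} (got ${String(value)})`);
  }
  return match;
}

// Hypothetical helper: reads both engine settings from an env-like object.
function readProviderConfig(env: Record<string, string | undefined>): { llm: LLMEngine; rendering: RenderingEngine } {
  return {
    llm: parseEngine("LLM_ENGINE", env["LLM_ENGINE"], LLM_ENGINES),
    rendering: parseEngine("RENDERING_ENGINE", env["RENDERING_ENGINE"], RENDERING_ENGINES),
  };
}
```

Failing early on an unknown engine name tends to be easier to debug than a request that silently goes to the wrong provider.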

Auth config:

AUTH_HF_API_TOKEN: if you decide to use Hugging Face for the LLM engine (inference api model or a custom inference endpoint)
AUTH_OPENAI_API_KEY: to use OpenAI for the LLM engine
AUTH_GROQ_API_KEY: to use Groq for the LLM engine
AUTH_ANTHROPIC_API_KEY: to use Anthropic (Claude) for the LLM engine
AUTH_VIDEOCHAIN_API_TOKEN: secret token to access the VideoChain API server
AUTH_REPLICATE_API_TOKEN: in case you want to use Replicate.com
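The pairing between engines and auth variables described above can be sketched as a lookup table. This is only an illustration: the function name missingAuthVariables is hypothetical, and the rendering table covers only the two engines whose auth variable is stated in this document (REPLICATE and VIDEOCHAIN).

```typescript
type Env = Record<string, string | undefined>;

// LLM engine -> auth variable, as described in the auth config list above.
const LLM_AUTH_VARIABLE: Record<string, string> = {
  INFERENCE_API: "AUTH_HF_API_TOKEN",
  INFERENCE_ENDPOINT: "AUTH_HF_API_TOKEN",
  OPENAI: "AUTH_OPENAI_API_KEY",
  GROQ: "AUTH_GROQ_API_KEY",
  ANTHROPIC: "AUTH_ANTHROPIC_API_KEY",
};

// Rendering engine -> auth variable. Other rendering engines are left out
// because this document does not say which variable they use.
const RENDERING_AUTH_VARIABLE: Record<string, string> = {
  REPLICATE: "AUTH_REPLICATE_API_TOKEN",
  VIDEOCHAIN: "AUTH_VIDEOCHAIN_API_TOKEN",
};

// Hypothetical helper: lists the auth variables that the selected engines
// need but that are empty or missing in the environment.
function missingAuthVariables(env: Env): string[] {
  const required: string[] = [];
  const llmVariable = LLM_AUTH_VARIABLE[env["LLM_ENGINE"] ?? ""];
  if (llmVariable) required.push(llmVariable);
  const renderingVariable = RENDERING_AUTH_VARIABLE[env["RENDERING_ENGINE"] ?? ""];
  if (renderingVariable) required.push(renderingVariable);
  return required.filter((name) => !env[name]);
}
```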

Rendering config:

RENDERING_HF_INFERENCE_ENDPOINT_URL: necessary if you decide to use a custom inference endpoint (optional otherwise, defaults to nothing)
RENDERING_VIDEOCHAIN_API_URL: url to the VideoChain API server
RENDERING_HF_INFERENCE_API_BASE_MODEL: optional, defaults to "stabilityai/stable-diffusion-xl-base-1.0"
RENDERING_HF_INFERENCE_API_REFINER_MODEL: optional, defaults to "stabilityai/stable-diffusion-xl-refiner-1.0"
RENDERING_REPLICATE_API_MODEL: optional, defaults to "stabilityai/sdxl"
RENDERING_REPLICATE_API_MODEL_VERSION: optional, in case you want to change the version

Language model config (depending on the LLM engine you decide to use):

LLM_HF_INFERENCE_ENDPOINT_URL: ""
LLM_HF_INFERENCE_API_MODEL: "HuggingFaceH4/zephyr-7b-beta"
LLM_OPENAI_API_BASE_URL: "https://api.openai.com/v1"
LLM_OPENAI_API_MODEL: "gpt-4-turbo"
LLM_GROQ_API_MODEL: "mixtral-8x7b-32768"
LLM_ANTHROPIC_API_MODEL: "claude-3-opus-20240229"

In addition, there are some community sharing variables that you can just ignore. Those variables are not required to run the AI Comic Factory on your own website or computer (they are meant to create a connection with the Hugging Face community, and thus only make sense for official Hugging Face apps):

NEXT_PUBLIC_ENABLE_COMMUNITY_SHARING: you don't need this
COMMUNITY_API_URL: you don't need this
COMMUNITY_API_TOKEN: you don't need this
COMMUNITY_API_ID: you don't need this

Please read the default .env config file for more information. To customise a variable locally, create a .env.local file (do not commit this file, as it will contain your secrets).
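To make the relationship between .env and .env.local concrete, here is a minimal sketch that assumes the usual Next.js behaviour, where values in .env.local take precedence over those in .env. The parser is deliberately simplified and the function names are hypothetical; the real loading is done by the framework, not by code like this.

```typescript
// Simplified parser: handles KEY=value and KEY="value" lines,
// and skips blank lines and # comments.
function parseEnvFile(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const separator = line.indexOf("=");
    if (separator < 1) continue;
    const key = line.slice(0, separator).trim();
    let value = line.slice(separator + 1).trim();
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}

// Values from the local file override the committed defaults.
function effectiveConfig(defaultsText: string, localText: string): Record<string, string> {
  return { ...parseEnvFile(defaultsText), ...parseEnvFile(localText) };
}
```

With this model, a .env.local containing only LLM_ENGINE="OPENAI" changes that one setting and leaves every other default untouched.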

-> If you intend to run it with local, cloud-hosted and/or proprietary models you are going to need to code 👨‍💻.

The LLM API (Large Language Model)

Currently the AI Comic Factory uses zephyr-7b-beta through an Inference Endpoint.

You have multiple options:

Option 1: Use an Inference API model

This recently added option lets you use one of the models from the Hugging Face Hub. By default we suggest using zephyr-7b-beta.

To activate it, create a .env.local configuration file:

LLM_ENGINE="INFERENCE_API"

AUTH_HF_API_TOKEN="Your Hugging Face token"

# "HuggingFaceH4/zephyr-7b-beta" is used by default, but you can change this
# note: You should use a model able to generate JSON responses,
# so it is strongly suggested to use at least the 34b model
LLM_HF_INFERENCE_API_MODEL="HuggingFaceH4/zephyr-7b-beta"
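The note about JSON responses matters because smaller models often wrap their JSON in extra text. The sketch below shows one defensive way to pull a JSON array out of raw model output. It is an illustration only: extractJsonArray is a hypothetical helper, and the project's own parsing may work differently.

```typescript
// Hypothetical helper: extracts the first JSON array found in raw model output.
// Throws if no array is present or if the extracted span is not valid JSON.
function extractJsonArray(text: string): unknown[] {
  const start = text.indexOf("[");
  const end = text.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("no JSON array found in model output");
  }
  const parsed: unknown = JSON.parse(text.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array");
  }
  return parsed;
}
```

A model that reliably emits clean JSON makes this step trivial, while a weaker one will hit the error paths more often, which is the practical reason for preferring a more capable model.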

Option 2: Use an Inference Endpoint URL

If you would like to run the AI Comic Factory on a private LLM running on the Hugging Face Inference Endpoint service, create a .env.local configuration file:

LLM_ENGINE="INFERENCE_ENDPOINT"

AUTH_HF_API_TOKEN="Your Hugging Face token"

LLM_HF_INFERENCE_ENDPOINT_URL="path to your inference endpoint url"

To run this kind of LLM locally, you can use TGI (Please read this post for more information about the licensing).

Option 3: Use an OpenAI API Key

This recently added option lets you use the OpenAI API with your own OpenAI API key.

To activate it, create a .env.local configuration file:

LLM_ENGINE="OPENAI"

# default openai api base url is: https://api.openai.com/v1
LLM_OPENAI_API_BASE_URL="A custom OpenAI API Base URL if you have some special privileges"

LLM_OPENAI_API_MODEL="gpt-4-turbo"

AUTH_OPENAI_API_KEY="Your own OpenAI API Key"
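To show how these three variables could fit together, here is a minimal sketch that builds (but does not send) a request for the OpenAI chat completions endpoint. The function name and the ChatRequest shape are hypothetical; the fallback values are the defaults listed in the language model config above.

```typescript
type Env = Record<string, string | undefined>;

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: { role: "system" | "user"; content: string }[] };
}

// Hypothetical helper: assembles the request from the documented variables.
function buildOpenAIChatRequest(env: Env, systemPrompt: string, userPrompt: string): ChatRequest {
  const apiKey = env["AUTH_OPENAI_API_KEY"];
  if (!apiKey) throw new Error("AUTH_OPENAI_API_KEY is not set");
  // Fall back to the documented default base URL and strip trailing slashes.
  const baseUrl = (env["LLM_OPENAI_API_BASE_URL"] || "https://api.openai.com/v1").replace(/\/+$/, "");
  return {
    url: `${baseUrl}/chat/completions`,
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: {
      model: env["LLM_OPENAI_API_MODEL"] || "gpt-4-turbo",
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: userPrompt },
      ],
    },
  };
}
```

Because the base URL is configurable, the same request shape should also work against an OpenAI-compatible server, provided that server implements the same endpoint.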

Option 4: (new, experimental) use Groq

LLM_ENGINE="GROQ"

LLM_GROQ_API_MODEL="mixtral-8x7b-32768"

AUTH_GROQ_API_KEY="Your own GROQ API Key"

Option 5: (new, experimental) use Anthropic (Claude)

LLM_ENGINE="ANTHROPIC"

LLM_ANTHROPIC_API_MODEL="claude-3-opus-20240229"

AUTH_ANTHROPIC_API_KEY="Your own ANTHROPIC API Key"

Option 6: Fork and modify the code to use a different LLM system

Another option is to disable the LLM completely and replace it with another LLM protocol and/or provider (e.g. Claude, Replicate), or with a human-generated story (by returning mock or static data).
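The "mock or static data" route can be as small as the sketch below. The Panel shape and the function name are hypothetical, chosen only for illustration; a fork would need to match whatever structure the project's LLM layer actually returns.

```typescript
// Hypothetical panel shape: a drawing instruction for the renderer plus a caption.
interface Panel {
  instructions: string;
  caption: string;
}

// A hand-written story used in place of LLM output.
const STATIC_STORY: Panel[] = [
  { instructions: "wide shot of a quiet harbor at dawn", caption: "The town was still asleep." },
  { instructions: "close-up of a cat on a fishing boat", caption: "Only Captain Whiskers was awake." },
  { instructions: "the boat leaving the harbor, seagulls overhead", caption: "Adventure was calling." },
  { instructions: "a giant fish leaping over the boat", caption: "And it answered loudly." },
];

// Drop-in replacement for an LLM call: returns at most panelCount static panels.
function getStaticStory(panelCount: number): Panel[] {
  return STATIC_STORY.slice(0, Math.max(0, panelCount));
}
```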

Notes

I may modify the AI Comic Factory to make this easier in the future (e.g. by adding support for Claude or Replicate).

The Rendering API

This API is used to generate the panel images. This is an API I created for my various projects at Hugging Face.

I haven't written documentation for it yet, but basically it is "just a wrapper ™" around other existing APIs:

The hysts/SD-XL Space by @hysts
And other APIs for making videos, adding audio, etc., but you won't need them for the AI Comic Factory

Option 1: Deploy VideoChain yourself

You will have to clone the source code.

Unfortunately, I haven't had the time to write the documentation for VideoChain yet. (When I do I will update this document to point to the VideoChain's README)

Option 2: Use Replicate

To use Replicate, create a .env.local configuration file:

RENDERING_ENGINE="REPLICATE"

RENDERING_REPLICATE_API_MODEL="stabilityai/sdxl"

RENDERING_REPLICATE_API_MODEL_VERSION="da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf"

AUTH_REPLICATE_API_TOKEN="Your Replicate token"
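As a sketch of how the Replicate settings above could be turned into an API call, the function below builds (but does not send) a prediction request for Replicate's HTTP API. The function name is hypothetical, and the assumption that the model accepts a single "prompt" input is mine; check the model's input schema on Replicate before relying on it.

```typescript
type Env = Record<string, string | undefined>;

interface PredictionRequest {
  url: string;
  headers: Record<string, string>;
  body: { version: string; input: { prompt: string } };
}

// Hypothetical helper: assembles a "create prediction" request from the documented variables.
function buildReplicatePrediction(env: Env, prompt: string): PredictionRequest {
  const token = env["AUTH_REPLICATE_API_TOKEN"];
  if (!token) throw new Error("AUTH_REPLICATE_API_TOKEN is not set");
  const version = env["RENDERING_REPLICATE_API_MODEL_VERSION"];
  if (!version) throw new Error("RENDERING_REPLICATE_API_MODEL_VERSION is not set");
  return {
    url: "https://api.replicate.com/v1/predictions",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    // Assumption: the model takes a single text prompt as input.
    body: { version, input: { prompt } },
  };
}
```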

Option 3: Use another SDXL API

If you fork the project you will be able to modify the code to use the Stable Diffusion technology of your choice (local, open-source, proprietary, your custom HF Space etc).

It could even be something else, such as DALL-E.
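One way a fork might keep the image backend swappable is to hide it behind a small interface, as in the sketch below. Everything here (PanelRenderer, placeholderRenderer, pickRenderer) is hypothetical and is not the project's actual abstraction.

```typescript
// Hypothetical interface for any image backend (SDXL, DALL-E, a custom Space, ...).
interface PanelRenderer {
  // Resolves to something the frontend can display, e.g. an image URL or a data URI.
  render(prompt: string): Promise<string>;
}

// Stand-in backend that simply echoes the prompt as a text data URI,
// so the rest of the flow can be exercised offline.
const placeholderRenderer: PanelRenderer = {
  render(prompt: string): Promise<string> {
    return Promise.resolve(`data:text/plain,${encodeURIComponent(prompt)}`);
  },
};

// Looks up the backend registered for the configured engine name.
function pickRenderer(engine: string, registry: Record<string, PanelRenderer>): PanelRenderer {
  const renderer = registry[engine];
  if (!renderer) {
    throw new Error(`no renderer registered for RENDERING_ENGINE=${engine}`);
  }
  return renderer;
}
```

A real backend would implement render by calling its own API and returning the resulting image location, and the rest of the application would only ever see the interface.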

For Tasks:


create comics, generate dialogue, generate images, tell stories, write scripts

For Jobs:

comic book writer, graphic designer, illustrator, storyboard artist, visual artist

Alternative AI tools for ai-comic-factory

Similar Open Source Tools

ai-comic-factory

github: 1.0k

litlyx

Litlyx is a single-line code analytics solution that integrates with every JavaScript/TypeScript framework. It allows you to track 10+ KPIs and custom events for your website or web app. The tool comes with an AI Data Analyst Assistant that can analyze your data, compare data, query metadata, visualize charts, and more. Litlyx is open-source, allowing users to self-host it and create their own version of the dashboard. The tool is user-friendly and supports various JavaScript/TypeScript frameworks, making it versatile for different projects.

github: 1.1k

vectara-answer

Vectara Answer is a sample app for Vectara-powered Summarized Semantic Search (or question-answering) with advanced configuration options. For examples of what you can build with Vectara Answer, check out Ask News, LegalAid, or any of the other demo applications.

github: 249

dir-assistant

Dir-assistant is a tool that allows users to interact with their current directory's files using local or API Language Models (LLMs). It supports various platforms and provides API support for major LLM APIs. Users can configure and customize their local LLMs and API LLMs using the tool. Dir-assistant also supports model downloads and configurations for efficient usage. It is designed to enhance file interaction and retrieval using advanced language models.

github: 324

opencommit

OpenCommit is a tool that auto-generates meaningful commits using AI, allowing users to quickly create commit messages for their staged changes. It provides a CLI interface for easy usage and supports customization of commit descriptions, emojis, and AI models. Users can configure local and global settings, switch between different AI providers, and set up Git hooks for integration with IDE Source Control. Additionally, OpenCommit can be used as a GitHub Action to automatically improve commit messages on push events, ensuring all commits are meaningful and not generic. Payments for OpenAI API requests are handled by the user, with the tool storing API keys locally.

github: 5.9k

llm-ollama

LLM-ollama is a plugin that provides access to models running on an Ollama server. It allows users to query the Ollama server for a list of models, register them with LLM, and use them for prompting, chatting, and embedding. The plugin supports image attachments, embeddings, JSON schemas, async models, model aliases, and model options. Users can interact with Ollama models through the plugin in a seamless and efficient manner.

github: 247

codespin

CodeSpin.AI is a set of open-source code generation tools that leverage large language models (LLMs) to automate coding tasks. With CodeSpin, you can generate code in various programming languages, including Python, JavaScript, Java, and C++, by providing natural language prompts. CodeSpin offers a range of features to enhance code generation, such as custom templates, inline prompting, and the ability to use ChatGPT as an alternative to API keys. Additionally, CodeSpin provides options for regenerating code, executing code in prompt files, and piping data into the LLM for processing. By utilizing CodeSpin, developers can save time and effort in coding tasks, improve code quality, and explore new possibilities in code generation.

github: 60

slack-bot

The Slack Bot is a tool designed to enhance the workflow of development teams by integrating with Jenkins, GitHub, GitLab, and Jira. It allows for custom commands, macros, crons, and project-specific commands to be implemented easily. Users can interact with the bot through Slack messages, execute commands, and monitor job progress. The bot supports features like starting and monitoring Jenkins jobs, tracking pull requests, querying Jira information, creating buttons for interactions, generating images with DALL-E, playing quiz games, checking weather, defining custom commands, and more. Configuration is managed via YAML files, allowing users to set up credentials for external services, define custom commands, schedule cron jobs, and configure VCS systems like Bitbucket for automated branch lookup in Jenkins triggers.

github: 188

aider.nvim

Aider.nvim is a Neovim plugin that integrates the Aider AI coding assistant, allowing users to open a terminal window within Neovim to run Aider. It provides functions like AiderOpen to open the terminal window, AiderAddModifiedFiles to add git-modified files to the Aider chat, and customizable keybindings. Users can configure the plugin using the setup function to manage context, keybindings, debug logging, and ignore specific buffer names.

github: 241

supabase-mcp

github: 299

hordelib

horde-engine is a wrapper around ComfyUI designed to run inference pipelines visually designed in the ComfyUI GUI. It enables users to design inference pipelines in ComfyUI and then call them programmatically, maintaining compatibility with the existing horde implementation. The library provides features for processing Horde payloads, initializing the library, downloading and validating models, and generating images based on input data. It also includes custom nodes for preprocessing and tasks such as face restoration and QR code generation. The project depends on various open source projects and bundles some dependencies within the library itself. Users can design ComfyUI pipelines, convert them to the backend format, and run them using the run_image_pipeline() method in hordelib.comfy.Comfy(). The project is actively developed and tested using git, tox, and a specific model directory structure.

github: 56

basehub

JavaScript / TypeScript SDK for BaseHub, the first AI-native content hub. Features: it infers types from your BaseHub repository, so IDE autocompletion works well; it has no dependency on graphql, so your bundle is more lightweight; and it works everywhere `fetch` is supported, so you can use it anywhere.

github: 183

tiledesk-dashboard

Tiledesk is an open-source live chat platform with integrated chatbots written in Node.js and Express. It is designed to be a multi-channel platform for web, Android, and iOS, and it can be used to increase sales or provide post-sales customer service. Tiledesk's chatbot technology allows for automation of conversations, and it also provides APIs and webhooks for connecting external applications. Additionally, it offers a marketplace for apps and features such as CRM, ticketing, and data export.

github: 258

mlx-lm

MLX LM is a Python package designed for generating text and fine-tuning large language models on Apple silicon using MLX. It offers integration with the Hugging Face Hub for easy access to thousands of LLMs, support for quantizing and uploading models to the Hub, low-rank and full model fine-tuning capabilities, and distributed inference and fine-tuning with `mx.distributed`. Users can interact with the package through command line options or the Python API, enabling tasks such as text generation, chatting with language models, model conversion, streaming generation, and sampling. MLX LM supports various Hugging Face models and provides tools for efficient scaling to long prompts and generations, including a rotating key-value cache and prompt caching. It requires macOS 15.0 or higher for optimal performance.

github: 339

mcp-server-qdrant

github: 386

autoscraper

AutoScraper is a smart, automatic, fast, and lightweight web scraping tool for Python. It simplifies the process of web scraping by learning scraping rules based on sample data provided by the user. The tool can extract text, URLs, or HTML tag values from web pages and return similar elements. Users can utilize the learned object to scrape similar content or exact elements from new pages. AutoScraper is compatible with Python 3 and offers easy installation from various sources. It provides functionalities for fetching similar and exact results from web pages, such as extracting post titles from Stack Overflow or live stock prices from Yahoo Finance. The tool allows customization with custom requests module parameters like proxies or headers. Users can save and load models for future use and explore advanced usages through tutorials and examples.

github: 6.2k

For similar tasks

ai-comic-factory

github: 1.0k

agnai

Agnaistic is an AI roleplay chat tool that allows users to interact with personalized characters using their favorite AI services. It supports multiple AI services, persona schema formats, and features such as group conversations, user authentication, and memory/lore books. Agnaistic can be self-hosted or run using Docker, and it provides a range of customization options through its settings.json file. The tool is designed to be user-friendly and accessible, making it suitable for both casual users and developers.

github: 576

LLaMa2lang

This repository contains convenience scripts to finetune LLaMa3-8B (or any other foundation model) for chat towards any language (that isn't English). The rationale behind this is that LLaMa3 is trained on primarily English data and while it works to some extent for other languages, its performance is poor compared to English.

github: 210

lollms-webui

LoLLMs WebUI (Lord of Large Language Multimodal Systems: One tool to rule them all) is a user-friendly interface to access and utilize various LLM (Large Language Models) and other AI models for a wide range of tasks. With over 500 AI expert conditionings across diverse domains and more than 2500 fine tuned models over multiple domains, LoLLMs WebUI provides an immediate resource for any problem, from car repair to coding assistance, legal matters, medical diagnosis, entertainment, and more. The easy-to-use UI with light and dark mode options, integration with GitHub repository, support for different personalities, and features like thumb up/down rating, copy, edit, and remove messages, local database storage, search, export, and delete multiple discussions, make LoLLMs WebUI a powerful and versatile tool.

github: 4.6k

daily-poetry-image

Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

github: 492

InvokeAI

InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products.

github: 24.8k

LocalAI

LocalAI is a free and open-source OpenAI alternative that acts as a drop-in replacement REST API compatible with OpenAI (Elevenlabs, Anthropic, etc.) API specifications for local AI inferencing. It allows users to run LLMs, generate images, audio, and more locally or on-premises with consumer-grade hardware, supporting multiple model families and not requiring a GPU. LocalAI offers features such as text generation with GPTs, text-to-audio, audio-to-text transcription, image generation with stable diffusion, OpenAI functions, embeddings generation for vector databases, constrained grammars, downloading models directly from Huggingface, and a Vision API. It provides a detailed step-by-step introduction in its Getting Started guide and supports community integrations such as custom containers, WebUIs, model galleries, and various bots for Discord, Slack, and Telegram. LocalAI also offers resources like an LLM fine-tuning guide, instructions for local building and Kubernetes installation, projects integrating LocalAI, and a how-tos section curated by the community. It encourages users to cite the repository when utilizing it in downstream projects and acknowledges the contributions of various software from the community.

github: 31.5k

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github: 620

For similar jobs

ai-comic-factory

github: 1.0k

daily-poetry-image

github: 492

InvokeAI

github: 24.8k

ap-plugin

AP-PLUGIN is an AI drawing plugin for the Yunzai series robot framework, allowing you to have a convenient AI drawing experience in the input box. It uses the open source Stable Diffusion web UI as the backend, deploys it for free, and generates a variety of images with richer functions.

github: 103

photoprism

PhotoPrism is an AI-powered photos app for the decentralized web. It uses the latest technologies to tag and find pictures automatically without getting in your way. You can run it at home, on a private server, or in the cloud.

github: 36.9k

facefusion

FaceFusion is a next-generation face swapper and enhancer that allows users to seamlessly swap faces in images and videos, as well as enhance facial features for a more polished and refined look. With its advanced deep learning models, FaceFusion provides users with a wide range of options for customizing their face swaps and enhancements, making it an ideal tool for content creators, artists, and anyone looking to explore their creativity with facial manipulation.

github: 21.9k

99AI

99AI is a commercializable AI web application based on NineAI 2.4.2 (no authorization, no backdoors, no piracy, integrated front-end and back-end integration packages, supports Docker rapid deployment). The uncompiled source code is temporarily closed. Compared with the stable version, the development version is faster.

github: 736

wunjo.wladradchenko.ru

Wunjo AI is a comprehensive tool that empowers users to explore the realm of speech synthesis, deepfake animations, video-to-video transformations, and more. Its user-friendly interface and privacy-first approach make it accessible to both beginners and professionals alike. With Wunjo AI, you can effortlessly convert text into human-like speech, clone voices from audio files, create multi-dialogues with distinct voice profiles, and perform real-time speech recognition. Additionally, you can animate faces using just one photo combined with audio, swap faces in videos, GIFs, and photos, and even remove unwanted objects or enhance the quality of your deepfakes using the AI Retouch Tool. Wunjo AI is an all-in-one solution for your voice and visual AI needs, offering endless possibilities for creativity and expression.

github: 820