ComfyUI-OllamaGemini

AI-api text generation

Stars: 120


ComfyUI GeminiOllama Extension integrates Google's Gemini API, OpenAI (ChatGPT), Anthropic's Claude, Ollama, Qwen, and image processing tools into ComfyUI for leveraging powerful models and features directly within workflows. Features include multiple AI API integrations, advanced prompt engineering, Gemini image generation, background removal, SVG conversion, FLUX resolutions, ComfyUI Styler, smart prompt generator, and more. The extension offers comprehensive API integration, advanced prompt engineering with researched templates, high-quality tools like Smart Prompt Generator and BRIA RMBG, and supports video & audio processing. It provides a single interface to access powerful AI models, transform prompts into detailed instructions, and use various tools for image processing, styling, and content generation.

README:

🚀 ComfyUI GeminiOllama Extension

Supercharge your ComfyUI workflows with AI superpowers

This extension integrates Google's Gemini API, OpenAI (ChatGPT), Anthropic's Claude, Ollama, Qwen, and various image processing tools into ComfyUI, allowing users to leverage these powerful models and features directly within their ComfyUI workflows.

Features

1️⃣ Multiple AI API Integrations

https://github.com/user-attachments/assets/6ffba8bc-47e9-42c5-be98-5849ffb03547

Google Gemini: Access gemini-2.0-pro, gemini-2.0-flash, gemini-1.5-pro and more with dynamic model list updates
OpenAI: Use gpt-4o, gpt-4-turbo, gpt-3.5-turbo, and DeepSeek models with automatic model discovery
Anthropic Claude: Leverage claude-3.7-sonnet, claude-3.5-sonnet, claude-3-opus and more
Alibaba Qwen: Access qwen-max, qwen-plus, qwen-turbo models
Ollama: Run local models with customizable parameters
Video & Audio Support: Process video frames and audio inputs with Gemini and Ollama

2️⃣ Advanced Prompt Engineering

Transform simple prompts into detailed, model-specific instructions
Extensively researched prompt templates optimized for different models:
- SDXL: Premium tag-based prompts with precise artistic control, structured in order of importance
- FLUX.1-dev: Hyper-detailed cinematographic prompts with technical precision and artistic vision
- VideoGen: Professional video generation prompts with subject, context, action, cinematography, and style
- Imagen4: Structured, layered prompts optimized for Google's Imagen 4 model
- GeminiNanaBananaEdit: Conversational, mask-free image editing prompts for intuitive and precise modifications
AI-powered prompt enhancement with expert-level guidance
Returns only the enhanced prompt without additional commentary

3️⃣ Gemini Image Generation

Generate images directly with Google's Gemini 2.0 Flash model
Customize with prompts and negative prompts
Automatic saving to ComfyUI's output directory

4️⃣ Background Removal (BRIA RMBG)

High-quality background removal with fine detail preservation
Preserves complex edges, hair, thin stems, and transparent elements
Generates both transparent images and alpha masks

5️⃣ SVG Conversion

Convert raster images to high-quality vector graphics
Multiple vectorization parameters for precise control
Save and preview SVG files directly in ComfyUI

6️⃣ FLUX Resolutions

Precise image sizing with predefined and custom options
Multiple resolution presets for various use cases
Custom sizing parameters for complete control
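To make custom sizing concrete, here is a small sketch that derives a width and height from an aspect ratio and a pixel budget. It is not the node's code. The helper name, the definition of one megapixel as 1024 × 1024 pixels, and the rounding to multiples of 64 are our assumptions. FLUX.1 generally needs sides divisible by 16, and 64 is a common, stricter choice.

```python
import math


def flux_size(aspect_w, aspect_h, megapixels=1.0, multiple=64):
    """Hypothetical helper: width/height near a target aspect ratio and pixel budget.

    Assumptions (not taken from the extension): 1 MP = 1024 * 1024 pixels,
    and both sides are rounded to the nearest multiple of `multiple`.
    """
    target = megapixels * 1024 * 1024
    ratio = aspect_w / aspect_h
    width = math.sqrt(target * ratio)
    height = width / ratio
    # Snap each side to the nearest allowed multiple, never below one step.
    width = max(multiple, round(width / multiple) * multiple)
    height = max(multiple, round(height / multiple) * multiple)
    return int(width), int(height)
```

For example, flux_size(16, 9) gives 1344 × 768, a familiar widescreen size. Because both sides are rounded independently, the result only approximates the requested aspect ratio.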

7️⃣ ComfyUI Styler

Hundreds of artistic styles for creative control
Categories include art styles, camera settings, moods, and more
Easily combine multiple style elements
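Combining style elements usually means wrapping the base prompt in each style's template and merging the negative prompts. The sketch below assumes the "{prompt}" placeholder convention used by common SDXL styler JSON files. The style entries and the apply_styles helper are hypothetical, and the extension's own style data and merge order may differ.

```python
# Hypothetical style entries in the "{prompt}" template format used by
# common SDXL styler files; not the extension's actual style data.
STYLES = {
    "cinematic": {
        "prompt": "cinematic still of {prompt}, shallow depth of field, film grain",
        "negative_prompt": "cartoon, illustration",
    },
    "watercolor": {
        "prompt": "watercolor painting of {prompt}, soft washes, paper texture",
        "negative_prompt": "photo, 3d render",
    },
}


def apply_styles(prompt, style_names, styles=STYLES):
    """Hypothetical helper: wrap the prompt with each style in order, merge negatives."""
    negatives = []
    for name in style_names:
        style = styles[name]
        # str.replace (not str.format) so other braces in a prompt are left alone.
        prompt = style["prompt"].replace("{prompt}", prompt)
        if style["negative_prompt"]:
            negatives.append(style["negative_prompt"])
    return prompt, ", ".join(negatives)
```

Applying styles in sequence nests the templates, so the first style ends up innermost. For example, apply_styles("a fox", ["watercolor", "cinematic"]) describes a cinematic still of a watercolor painting.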

8️⃣ Smart Prompt Generator

Create highly detailed, creative prompts by combining multiple style categories
AI-powered enhancement using Gemini API to refine and expand prompts
Four generation modes, ranging from manual style selection to completely random prompts
Automatic random seed generation for unique results on every run
Control creativity level and focus areas for targeted results
Auto-generate appropriate negative prompts
Seamlessly combines styles from artists, movies, art styles, and more
Supports reproducible results with manual seed setting

💻 Installation & Setup

📦 Installation

Method 1: ComfyUI Manager (Recommended)

Install ComfyUI Manager if you don't have it already
In ComfyUI, go to the Manager tab and search for "OllamaGemini"
Click Install

Method 2: Manual Installation

Clone this repository into your ComfyUI's custom_nodes directory:

cd /path/to/ComfyUI/custom_nodes
git clone https://github.com/al-swaiti/ComfyUI-OllamaGemini.git

Install the required dependencies into the same Python environment that runs ComfyUI:

pip install google-genai google-generativeai "openai>=1.3.0" "anthropic>=0.8.0" "requests>=2.31.0" "vtracer>=0.6.0" "dashscope>=1.13.6" "Pillow>=10.0.0" "scipy>=1.10.0" opencv-python "transformers>=4.30.0" torch torchaudio

Restart ComfyUI
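If a node fails to load after restarting, a quick check can confirm that the packages are visible to the interpreter ComfyUI actually uses. This is a minimal sketch, not part of the extension. The module list is our own mapping of the pip names above to their import names.

```python
import importlib.util

# Our mapping of the pip packages above to import names (an assumption,
# not a list shipped with the extension). Several differ from the pip name:
# google-genai -> google.genai, Pillow -> PIL, opencv-python -> cv2.
REQUIRED_MODULES = [
    "google.genai",
    "google.generativeai",
    "openai",
    "anthropic",
    "requests",
    "vtracer",
    "dashscope",
    "PIL",
    "scipy",
    "cv2",
    "transformers",
    "torch",
    "torchaudio",
]


def check_modules(names):
    """Return {module_name: True/False} depending on whether the module can be found."""
    status = {}
    for name in names:
        try:
            # For dotted names find_spec imports the parent package first,
            # so a missing parent raises ModuleNotFoundError instead of returning None.
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(("OK      " if ok else "MISSING ") + name)
```

Run it with ComfyUI's own Python executable (for example the embedded Python of a portable install), since a different interpreter may see a different set of packages.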

🔑 API Key Setup

Obtaining API Keys

Provider	Where to Get	Free Tier
Google Gemini	Google AI Studio	✅ Yes
OpenAI	OpenAI Platform	❌ No
Anthropic Claude	Anthropic Console	✅ Limited
Ollama	Ollama (runs locally)	✅ Yes
Alibaba Qwen	DashScope Console	✅ Limited

Using the Config File

Edit the config.json to add your API keys:

{
  "GEMINI_API_KEY": "your_gemini_api_key",
  "OPENAI_API_KEY": "your_openai_api_key",
  "ANTHROPIC_API_KEY": "your_claude_api_key",
  "OLLAMA_URL": "http://localhost:11434",
  "QWEN_API_KEY": "your_qwen_api_key"
}
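Placeholder values are easy to leave behind. The following minimal sketch flags key entries that are still empty or unchanged. It assumes config.json has exactly the flat key/value layout shown above. load_config and unfilled_keys are hypothetical helpers, not part of the extension.

```python
import json
from pathlib import Path

# Placeholder strings copied from the example config.json above.
PLACEHOLDERS = {
    "your_gemini_api_key",
    "your_openai_api_key",
    "your_claude_api_key",
    "your_qwen_api_key",
}


def load_config(path):
    """Hypothetical helper: read config.json as a flat dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def unfilled_keys(config):
    """Hypothetical helper: names of *_API_KEY entries that are empty or placeholders."""
    missing = []
    for name, value in config.items():
        # OLLAMA_URL is an address, not a secret, so only *_API_KEY entries are checked.
        if name.endswith("_API_KEY") and (not value or value in PLACEHOLDERS):
            missing.append(name)
    return missing
```

Usage, from the folder that contains config.json: print(unfilled_keys(load_config("config.json"))).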

🔹 Quick Start Guide

💬 Using AI API Services

Add the appropriate API node to your workflow (Gemini API, OpenAI API, Claude API, etc.)
Enter your prompt in the text field
Select the desired model from the dropdown
Adjust parameters like temperature and max tokens as needed
For enhanced prompts, enable "structure_output" and select a prompt structure template
Connect the output to other nodes in your workflow
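Temperature and max tokens correspond to each provider's own request parameters. As an illustration for the local Ollama backend (not this extension's implementation), the sketch below builds a request for Ollama's /api/generate endpoint, where the token limit is called num_predict. The model name is only an example and must already be pulled with ollama pull.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # same value as OLLAMA_URL in config.json


def build_generate_request(model, prompt, temperature=0.7, max_tokens=256):
    """Illustrative helper: build (but do not send) a POST to Ollama's /api/generate."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # request one JSON object instead of a streamed response
        "options": {
            "temperature": temperature,
            "num_predict": max_tokens,  # Ollama's name for the max-token limit
        },
    }
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a running Ollama server with the model pulled:
# with urllib.request.urlopen(build_generate_request("llama3.2", "Describe a foggy harbor")) as resp:
#     print(json.loads(resp.read())["response"])
```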

🖼️ Generating Images with Gemini

Add the "Gemini Image Generator" node to your workflow
Enter your prompt describing the desired image
Optionally add a negative prompt to exclude unwanted elements
Connect the output to a preview node to see the generated image

🪄 Removing Backgrounds

Add the "BRIA RMBG" node to your workflow
Connect an image source to the input
Set model_version to 2.0 for best results
Connect the image output to see the transparent result
Connect the mask output to see the generated mask
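The two outputs are related: the transparent image is the input with the mask used as its alpha channel. The minimal Pillow sketch below shows that relationship, which is useful, for example, when you want to re-apply an edited mask outside ComfyUI. It works on PIL images rather than ComfyUI tensors, assumes a same-size grayscale mask, and is not the node's implementation.

```python
from PIL import Image


def apply_mask(image, mask):
    """Illustrative helper: combine an RGB image and a grayscale mask into an RGBA cutout.

    White (255) mask pixels stay opaque; black (0) pixels become fully transparent.
    """
    if mask.size != image.size:
        raise ValueError("image and mask must have the same size")
    cutout = image.convert("RGBA")
    cutout.putalpha(mask.convert("L"))  # replace the alpha band with the mask
    return cutout
```

Usage: apply_mask(Image.open("photo.png").convert("RGB"), Image.open("mask.png")).save("cutout.png"). Both file names are placeholders.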

✨ Using the Smart Prompt Generator

Add the "Smart Prompt Generator" node to your workflow
Choose your preferred randomization mode:
- Disabled: Use your own prompt and manually select styles
- Random Styles Only: Keep your base prompt but apply random styles
- Random Base+Styles: Generate a random base prompt with random styles
- Fully Random: Let the AI create a completely random prompt from scratch
Set the number of random styles to apply and, optionally, a randomize_seed value
Set your preferred "creativity_level" (Low, Medium, High, Extreme)
Choose a "focus_on" option to guide the AI enhancement:
- Realism: Focuses on photorealistic details
- Fantasy: Emphasizes fantastical and imaginative elements
- Abstract: Highlights abstract artistic concepts
- Artistic: Prioritizes artistic techniques and expression
- Cinematic: Adds film-like qualities and composition
Connect the output to a Text node or directly to image generation nodes

The Smart Prompt Generator works in four modes:

Manual Mode: Combine styles you manually select with your own base prompt
Random Styles Mode: Apply random style combinations to your base prompt
Random Base+Styles Mode: Generate a random prompt and apply random styles
Fully Random Mode: Let the AI create a completely new prompt from scratch

Using a randomize_seed of 0 will generate different results every time you run the node, while setting a specific seed will produce consistent results that can be reproduced.
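The seed rule can be illustrated with Python's random module. The function and style names below are hypothetical. The sketch only mirrors the documented behaviour: 0 gives a new result on every run, and any other value gives a repeatable one.

```python
import random


def pick_styles(styles, count, randomize_seed=0):
    """Hypothetical helper: pick `count` distinct styles following the seed rule above."""
    if randomize_seed == 0:
        rng = random.Random()  # seeded from system entropy: differs from run to run
    else:
        rng = random.Random(randomize_seed)  # fixed seed: same picks every time
    return rng.sample(styles, min(count, len(styles)))
```

With styles = ["watercolor", "film noir", "cyberpunk", "ukiyo-e"], pick_styles(styles, 2, 42) returns the same pair on every call, while pick_styles(styles, 2) usually does not.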

✒️ Converting Images to SVG

Add the "Convert Image to SVG" node to your workflow
Connect an image source to the input
Configure the vectorization parameters
Connect the output to the "Save SVG File" node
Set a filename prefix and enable preview

🎬 Using Video and Audio Inputs

Add the "GeminiAPI" or "OllamaAPI" node to your workflow
Set "input_type" to "video" or "audio" depending on your media
Connect a video tensor (sequence of frames) to the "video" input or an audio file to the "audio" input
Enter your prompt describing what you want to analyze about the media
Select the desired model from the dropdown
The AI will analyze the video frames or audio and provide a detailed response

For video inputs:

The system automatically samples frames from the video for analysis
Works best with models that support multimodal inputs
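The README does not say how many frames are sampled or how they are chosen. A common approach is evenly spaced sampling, sketched below. The helper name and the max_frames limit are our assumptions, not the extension's settings.

```python
def sample_frame_indices(num_frames, max_frames):
    """Hypothetical helper: up to `max_frames` evenly spaced indices, first and last included."""
    if num_frames <= 0 or max_frames <= 0:
        return []
    if num_frames <= max_frames:
        return list(range(num_frames))  # short clip: keep every frame
    if max_frames == 1:
        return [0]
    step = (num_frames - 1) / (max_frames - 1)
    return [round(i * step) for i in range(max_frames)]
```

A ComfyUI IMAGE batch is a tensor shaped [frames, height, width, channels], so video[sample_frame_indices(video.shape[0], 8)] would select the sampled frames.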

🌟 Why Choose This Extension?

Comprehensive API Integration

Access the most powerful AI models through a single interface:

Google Gemini: gemini-2.0-pro, gemini-2.0-flash, gemini-1.5-pro, and more with dynamic model list updates
OpenAI: gpt-4o, gpt-4-turbo, gpt-3.5-turbo, and DeepSeek models with automatic model discovery
Anthropic Claude: claude-3.7-sonnet, claude-3.5-sonnet, claude-3-opus, and more
Alibaba Qwen: qwen-max, qwen-plus, qwen-turbo, qwen-max-longcontext
Ollama: Run any local model with customizable parameters
Multimodal Support: Process text, images, video frames, and audio inputs

Advanced Prompt Engineering

Transform simple prompts into detailed, model-specific instructions with extensively researched templates:

SDXL: Premium tag-based prompts with precise artistic control, structured in order of importance with professional terminology
FLUX.1-dev: Hyper-detailed cinematographic prompts with technical precision, artistic vision, and professional lighting/camera specifications
VideoGen: Professional video generation prompts with subject, context, action, cinematography, and style elements optimized for modern video models
Custom: Create your own prompt structure for specific needs

Each template is the result of deep research into model-specific optimization techniques and professional terminology from photography, cinematography, and visual arts.

High-Quality Tools

Smart Prompt Generator: Advanced prompt creation with automatic random seed generation for unique results every time
BRIA RMBG: Best-in-class background removal with fine detail preservation
SVG Conversion: High-quality vectorization with vtracer
FLUX Resolutions: Precise image sizing with predefined and custom options
ComfyUI Styler: Hundreds of artistic styles for creative control
Video & Audio Processing: Analyze and extract insights from video frames and audio files

👨‍💻 Contributing

Contributions are welcome! Here's how you can help:

Bug Reports: Open an issue describing the bug and how to reproduce it
Feature Requests: Suggest new features or improvements
Pull Requests: Submit PRs for bug fixes or new features
Documentation: Help improve or translate the documentation

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

⭐ If you find this extension useful, please consider giving it a star! ⭐

💖 Support This Project

If you enjoy using this extension and would like to support continued development, please consider buying me a coffee. Every contribution helps keep this project going and enables new features!

🔗 Connect With Me

Models & LoRAs: Civitai | Hugging Face
Image Gallery: DeviantArt
Professional Profile: LinkedIn (Open for work and collaborations)

