
comfyui_LLM_Polymath
An advanced chat node integrating LLMs, real-time web search, image handling, and image scraping. Supports APIs from OpenAI, Google, Anthropic, Grok, DeepSeek, and local Ollama. Includes a custom node finder, smart assistant tools, and a growing set of subnodes such as text masking and concept eraser.

README:
An advanced Chat Node for ComfyUI that integrates large language models to build text-driven applications and automate data processes (RAG), enhancing prompt responses by optionally incorporating real-time web search, linked content extraction, and custom agent instructions. It supports both OpenAI’s GPT-like models and alternative models served via a local Ollama API. At its core, two essential instructions drive its context-aware functionality: the Comfy Node Finder, which retrieves relevant custom nodes from the ComfyUI-Manager custom-node JSON database based on your queries, and the Smart Assistant, which ingests your workflow JSON to deliver tailored, actionable recommendations. A range of other agents, such as the Flux Prompter, several custom instructors, and a Python debugger and scripter, further extend its capabilities.
----
- Placeholder Replacement: If your prompt contains the placeholder `{additional_text}`, the node replaces it with the provided additional text; otherwise, any additional text is appended.
- Dynamic Augmentation: Depending on the settings, the node can automatically augment your prompt with web-fetched content or search results.
- URL Extraction: Scans the input prompt for URLs and uses BeautifulSoup to extract text from the linked pages (see the sketch after this feature list).
- Google Search Results: If enabled, performs a Google search for your query, retrieves a specified number of results (controlled via `num_search_results`), and appends the extracted content to the prompt.
- Source Listing: Optionally appends all fetched sources to the prompt so that the language model’s response can reference them.
- Model Loading: Loads model configurations from a local `config.json` file and fetches additional models from an Ollama API endpoint (`http://127.0.0.1:11434/api/tags`).
- API Selection:
  - The node automatically selects the API based on which model is chosen from the list. The Polymath currently supports Grok, Gemini, Gemma, DeepSeek, and Claude.
  - If Ollama is installed and running, the node uses the Ollama API.
- Chat History: Optionally retains context from previous interactions to allow for more natural, continuous conversations.
- Instruction Files: The node scans a `custom_instructions` directory for `.txt` files and makes them available as options.
- Node Finder Specialization: If the custom instruction named “node finder” is selected, the node loads and appends information from a JSON file (`custom-node-list.json`) to aid in finding specific nodes.
- Image Conversion: Converts provided image tensors into PIL images and encodes them as base64 strings. These are then included in the payload sent to the language model, enabling multimodal interactions.
- Console Logging: When enabled (`Console_log`), detailed information about the prompt augmentation process and API calls is printed to the console.
- Seed Control: Uses a user-provided seed to help manage randomness and reproducibility.
- Compression Options: If compression is enabled, the node appends a directive to the prompt that instructs the model to produce concise output. Three levels are available:
  - Soft: maximum output size of ~250 characters.
  - Medium: maximum output size of ~150 characters.
  - Hard: maximum output size of ~75 characters.
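As a rough illustration of the placeholder replacement and URL extraction described above, here is a minimal sketch; the function name, truncation limit, and error handling are illustrative, not the node’s actual internals:

```python
import re

import requests
from bs4 import BeautifulSoup

def augment_prompt(prompt: str, additional_text: str = "") -> str:
    # Replace the {additional_text} placeholder, or append the extra text.
    if "{additional_text}" in prompt:
        prompt = prompt.replace("{additional_text}", additional_text)
    elif additional_text:
        prompt = f"{prompt}\n{additional_text}"

    # Scan the prompt for URLs and append the extracted page text.
    for url in re.findall(r"https?://\S+", prompt):
        try:
            html = requests.get(url, timeout=10).text
            page_text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
            prompt += f"\n\n[Source: {url}]\n{page_text[:2000]}"  # truncate for context size
        except requests.RequestException:
            pass  # unreachable pages are skipped
    return prompt
```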
ComfyUI Node Assistant
An advanced agent that analyzes your specific use case and strictly uses the provided ../ComfyUI-Manager/custom-node-list.json reference to deliver consistent, structured, ranked recommendations featuring node names, detailed descriptions, categories, inputs/outputs, and usage notes. It dynamically refines suggestions based on your requirements, surfacing both top-performing and underrated nodes in categories such as Best Image Processing Nodes, Top Text-to-Image Nodes, Essential Utility Nodes, Best Inpainting Nodes, Advanced Control Nodes, Performance Optimization Nodes, Hidden Gems, Latent Processing Nodes, Mathematical Nodes, Noise Processing Nodes, Randomization Nodes, and Display & Show Nodes, for optimal functionality, efficiency, and compatibility.
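For a rough idea of what querying that JSON reference can look like, here is a keyword filter over the database; the `custom_nodes` key and entry fields follow ComfyUI-Manager’s database layout, while the query string and result limit are examples:

```python
import json

# Path as referenced above; adjust to your install.
with open("../ComfyUI-Manager/custom-node-list.json", encoding="utf-8") as f:
    nodes = json.load(f)["custom_nodes"]

query = "inpainting"  # example use case keyword
matches = [
    n for n in nodes
    if query in (n.get("title", "") + " " + n.get("description", "")).lower()
]
for n in matches[:5]:
    print(n.get("title"), "-", n.get("reference", ""))
```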
ComfyUI Smart Assistant
An advanced, context-aware assistant that ingests your workflow JSON to analyze your unique use case and deliver tailored, high-impact recommendations as structured, ranked insights. Each recommendation includes names, detailed descriptions, categorical breakdowns, input/output specifications, and usage notes, and the assistant adapts to evolving requirements through in-depth comparisons, alternative methodologies, and layered workflow enhancements. Its capabilities extend to wildcard searches, comprehensive error-handling strategies, real-time monitoring insights, and integration guidance, organized into sections such as "Best Workflow Enhancements," "Essential Automation Tools," "Performance Optimization Strategies," "Advanced Customization Tips," "Hidden Gems & Lesser-Known Features," "Troubleshooting & Debugging," "Integration & Compatibility Advice," "Wildcard & Exploratory Searches," "Security & Compliance Measures," and "Real-Time Feedback & Monitoring," ensuring peak functionality, efficiency, and compatibility.
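The workflow ingestion this describes might start with something like the following sketch, which tallies node types in an exported workflow; it assumes the API-format JSON, where each entry carries a `class_type`, and the file name is hypothetical:

```python
import json
from collections import Counter

with open("workflow_api.json", encoding="utf-8") as f:
    workflow = json.load(f)

# Count how often each node type appears in the workflow graph.
type_counts = Counter(node["class_type"] for node in workflow.values())
for class_type, count in type_counts.most_common():
    print(f"{count:3d} x {class_type}")
```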
Polymath Scraper
An automated web scraper node designed for seamless gallery extraction: input a gallery website URL and retrieve image data efficiently. Built on gallery-dl, it supports all websites listed in the official repository. Key features include:
- URL-Based Extraction: Simply input a gallery URL to fetch images.
- Wide Website Support: Compatible with all sites supported by gallery-dl.
- Output-Ready for Training: Provides structured outputs:
  - List of Image Files: Downloaded images ready for use.
  - List of Filenames: Organized for captioning and dataset creation.
- Modular Integration: Stack with the LLM Polymath Node for automated captioning, enabling end-to-end dataset preparation.
Ideal for creating large, labeled datasets for AI model training, reducing manual effort and streamlining workflow efficiency.
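Since the node builds on gallery-dl, a minimal stand-in for its download step could invoke the CLI directly; `-D` (exact destination directory) is a standard gallery-dl option, while the URL and directory here are placeholders:

```python
import subprocess
from pathlib import Path

dest = Path("downloads")
dest.mkdir(exist_ok=True)

# Download every image from the gallery URL into ./downloads.
subprocess.run(
    ["gallery-dl", "-D", str(dest), "https://example.com/gallery"],
    check=True,
)

# Structured outputs, as the node exposes them conceptually.
image_files = sorted(dest.glob("*"))
filenames = [p.name for p in image_files]  # ready for captioning
```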
A versatile configuration node providing essential settings for language models (LLMs) and image generation workflows. Designed for maximum flexibility and control, it allows fine-tuning of generative behavior across multiple engines including text and image generation APIs.
- Comprehensive LLM Controls: Fine-tune generative text outputs with key parameters (see the request sketch after this settings list):
  - Temperature: Adjusts randomness in output (0.0–2.0, default 0.8).
  - Top-p (Nucleus Sampling): Controls diversity via probability mass (0.0–1.0, default 0.95).
  - Top-k: Limits sampling to the top-k most likely tokens (0–100, default 40).
  - Max Output Tokens: Sets the maximum output length (-1 to 65536, default 1024).
  - Response Format JSON: Toggles structured JSON output (default: False).
  - Ollama Keep Alive: Controls idle timeout for Ollama connections (1–10, default 5).
  - Request Timeout: Timeout for generation requests (0–600 sec, default 120).
- DALL·E Image Settings: Customize generation style and quality:
  - Quality: Choose between `"standard"` and `"hd"` (default: standard).
  - Style: Select image tone, either `"vivid"` or `"natural"` (default: vivid).
  - Size: Specify dimensions (1024x1024, 1792x1024, or 1024x1792; default: 1024x1024).
  - Number of Images: Set the number of outputs per request (1–4, default: 1).
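To make the LLM parameters concrete, here is a sketch of how those defaults could map onto an Ollama generation request; `options`, `format`, and `keep_alive` are fields of Ollama’s `/api/generate` API, but the mapping itself is illustrative (for instance, the 1–10 keep-alive setting is assumed to be minutes):

```python
import requests

payload = {
    "model": "gemma:2b",
    "prompt": "Describe a foggy harbor at dawn.",
    "stream": False,
    "keep_alive": "5m",         # Ollama Keep Alive (default 5, assumed minutes)
    "options": {
        "temperature": 0.8,     # Temperature default
        "top_p": 0.95,          # Top-p default
        "top_k": 40,            # Top-k default
        "num_predict": 1024,    # Max Output Tokens default
    },
}
# payload["format"] = "json"    # Response Format JSON toggle

response = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json=payload,
    timeout=120,                # Request Timeout default
)
print(response.json()["response"])
```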
The node exposes a range of configurable inputs:
- Prompt: Main query text. Supports `{additional_text}` placeholders.
- Additional Text: Extra text that supplements or replaces the placeholder in the prompt.
- Seed: Integer seed for reproducibility.
- Model: Dropdown of available models (merged from `config.json` and the Ollama API).
- Custom Instruction: Choose from available instruction files.
- Enable Web Search: Toggle for fetching web content.
- List Sources: Whether to append the fetched sources to the prompt.
- Number of Search Results: Determines how many search results to process.
- Keep Context: Whether to retain chat history across interactions.
- Compress & Compression Level: Enable output compression and choose the level.
- Console Log: Toggle detailed logging.
- Image: Optionally pass an image tensor for multimodal input.
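For orientation, ComfyUI custom nodes declare inputs like these through an `INPUT_TYPES` classmethod. The sketch below is a condensed, hypothetical declaration following that convention, not the Polymath node’s actual source:

```python
class PolymathChatSketch:
    """Hypothetical, trimmed-down input declaration in ComfyUI style."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True}),
                "seed": ("INT", {"default": 0, "min": 0}),
                "model": (["gpt-4o", "gemma:2b"],),  # merged model list
                "enable_web_search": ("BOOLEAN", {"default": False}),
                "num_search_results": ("INT", {"default": 3, "min": 1}),
            },
            "optional": {
                "additional_text": ("STRING", {"default": ""}),
                "image": ("IMAGE",),  # multimodal input
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate"
    CATEGORY = "Polymath"
```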
- Clone the Repository:
  `git clone https://github.com/lum3on/comfyui_LLM_Polymath.git`
  `cd comfyui_LLM_Polymath`
- Install Dependencies: The node automatically attempts to install missing Python packages (such as `googlesearch`, `requests`, and `bs4`). You can also install them manually with `pip install -r requirements.txt`.
- Set your API keys as environment variables: create a `.env` file in your Comfy root folder and add your keys like this:
  `OPENAI_API_KEY="your_api_key_here"`
  `ANTHROPIC_API_KEY="your_anthropic_api_key_here"`
  `XAI_API_KEY="your_xai_api_key_here"`
  `DEEPSEEK_API_KEY="your_deepseek_api_key_here"`
  `GEMINI_API_KEY="your_gemini_api_key_here"`
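If you want to verify the keys are picked up, a small check using the python-dotenv package works; this is an assumption for illustration, as the node may read the environment differently:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory (the Comfy root)

openai_key = os.getenv("OPENAI_API_KEY")
if not openai_key:
    raise RuntimeError("OPENAI_API_KEY is not set")
```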
Ollama enables you to run large language models locally with a few simple commands. Follow these instructions to install Ollama and download models.
macOS: Download the installer from the official website or install via Homebrew:
`brew install ollama`
Linux: Run the installation script directly from your terminal:
`curl -fsSL https://ollama.com/install.sh | sh`
Windows: Visit the Ollama Download Page and run the provided installer.
Once Ollama is installed, you can easily pull and run models. For example, to download the lightweight Gemma 2B model:
ollama pull gemma:2b
After downloading, you can start interacting with the model using:
ollama run gemma:2b
For a full list of available models (including various sizes and specialized variants), please visit the official Ollama Model Library.
After you download a model via Ollama, it will automatically be listed in the model dropdown in Comfy after you restart it. This seamless integration means you don’t need to perform any additional configuration—the model is ready for use immediately within Comfy.
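The model discovery behind this integration queries Ollama’s `/api/tags` endpoint, which lists the locally installed models; a minimal version of that query:

```python
import requests

# Ask the local Ollama server which models are installed.
resp = requests.get("http://127.0.0.1:11434/api/tags", timeout=5)
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. "gemma:2b"
```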
- Install Ollama on your system using the method appropriate for your operating system.
- Download a model with the `ollama pull` command, or use the run command and the model is downloaded automatically.
- Run the model with `ollama run <model-name>` to start a REPL and interact with it.
- Keep the CLI open so that Comfy can access the local Ollama API.
- Restart Comfy to have the downloaded model automatically appear in the model dropdown for easy selection.
By following these steps, you can quickly set up Ollama on your machine and begin experimenting with different large language models locally.
For further details on model customization and advanced usage, refer to the official documentation at Ollama Docs.
The following features are planned for the next update:
- [ ] Node Finder Implementation in ComfyUI Manager: Integrate a full-featured node finder in the Comfy Manager
- [X] Advanced Parameter Node: Introduce an enhanced parameter node offering additional customization and control.
- [ ] Speed Improvements: Optimize processing and API response times for a more fluid user experience.
This project is licensed under the MIT License. See the LICENSE file for details.
This node integrates several libraries and APIs to deliver an advanced multimodal, web-augmented chat experience. Special thanks to all contributors and open source projects that made this work possible.
For any questions or further assistance, please open an issue on GitHub or contact the maintainer.
Similar Open Source Tools

eole
EOLE is an open language modeling toolkit based on PyTorch. It aims to provide a research-friendly approach with a comprehensive yet compact and modular codebase for experimenting with various types of language models. The toolkit includes features such as versatile training and inference, dynamic data transforms, comprehensive large language model support, advanced quantization, efficient finetuning, flexible inference, and tensor parallelism. EOLE is a work in progress with ongoing enhancements in configuration management, command line entry points, reproducible recipes, core API simplification, and plans for further simplification, refactoring, inference server development, additional recipes, documentation enhancement, test coverage improvement, logging enhancements, and broader model support.

kollektiv
Kollektiv is a Retrieval-Augmented Generation (RAG) system designed to enable users to chat with their favorite documentation easily. It aims to provide LLMs with access to the most up-to-date knowledge, reducing inaccuracies and improving productivity. The system utilizes intelligent web crawling, advanced document processing, vector search, multi-query expansion, smart re-ranking, AI-powered responses, and dynamic system prompts. The technical stack includes Python/FastAPI for backend, Supabase, ChromaDB, and Redis for storage, OpenAI and Anthropic Claude 3.5 Sonnet for AI/ML, and Chainlit for UI. Kollektiv is licensed under a modified version of the Apache License 2.0, allowing free use for non-commercial purposes.

Director
Director is a framework to build video agents that can reason through complex video tasks like search, editing, compilation, generation, etc. It enables users to summarize videos, search for specific moments, create clips instantly, integrate GenAI projects and APIs, add overlays, generate thumbnails, and more. Built on VideoDB's 'video-as-data' infrastructure, Director is perfect for developers, creators, and teams looking to simplify media workflows and unlock new possibilities.

Simplifine
Simplifine is an open-source library designed for easy LLM finetuning, enabling users to perform tasks such as supervised fine tuning, question-answer finetuning, contrastive loss for embedding tasks, multi-label classification finetuning, and more. It provides features like WandB logging, in-built evaluation tools, automated finetuning parameters, and state-of-the-art optimization techniques. The library offers bug fixes, new features, and documentation updates in its latest version. Users can install Simplifine via pip or directly from GitHub. The project welcomes contributors and provides comprehensive documentation and support for users.

llm-answer-engine
This repository contains the code and instructions needed to build a sophisticated answer engine that leverages the capabilities of Groq, Mistral AI's Mixtral, Langchain.JS, Brave Search, Serper API, and OpenAI. Designed to efficiently return sources, answers, images, videos, and follow-up questions based on user queries, this project is an ideal starting point for developers interested in natural language processing and search technologies.

TaskingAI
TaskingAI brings Firebase's simplicity to **AI-native app development**. The platform enables the creation of GPTs-like multi-tenant applications using a wide range of LLMs from various providers. It features distinct, modular functions such as Inference, Retrieval, Assistant, and Tool, seamlessly integrated to enhance the development process. TaskingAI’s cohesive design ensures an efficient, intelligent, and user-friendly experience in AI application development.

voice-pro
Voice-Pro is an integrated solution for subtitles, translation, and TTS. It offers features like multilingual subtitles, live translation, vocal remover, and supports OpenAI Whisper and Open-Source Translator. The tool provides a Studio tab for various functions, Whisper Caption tab for subtitle creation, Translate tab for translation, TTS tab for text-to-speech, Live Translation tab for real-time voice recognition, and Batch tab for processing multiple files. Users can download YouTube videos, improve voice recognition accuracy, create automatic subtitles, and produce multilingual videos with ease. The tool is easy to install with one-click and offers a Web-UI for user convenience.

MARBLE
MARBLE (Multi-Agent Coordination Backbone with LLM Engine) is a modular framework for developing, testing, and evaluating multi-agent systems leveraging Large Language Models. It provides a structured environment for agents to interact in simulated environments, utilizing cognitive abilities and communication mechanisms for collaborative or competitive tasks. The framework features modular design, multi-agent support, LLM integration, shared memory, flexible environments, metrics and evaluation, industrial coding standards, and Docker support.

restai
RestAI is an AIaaS (AI as a Service) platform that allows users to create and consume AI agents (projects) using a simple REST API. It supports various types of agents, including RAG (Retrieval-Augmented Generation), RAGSQL (RAG for SQL), inference, vision, and router. RestAI features automatic VRAM management, support for any public LLM supported by LlamaIndex or any local LLM supported by Ollama, a user-friendly API with Swagger documentation, and a frontend for easy access. It also provides evaluation capabilities for RAG agents using deepeval.

open-webui-tools
Open WebUI Tools Collection is a set of tools for structured planning, arXiv paper search, Hugging Face text-to-image generation, prompt enhancement, and multi-model conversations. It enhances LLM interactions with academic research, image generation, and conversation management. Tools include arXiv Search Tool and Hugging Face Image Generator. Function Pipes like Planner Agent offer autonomous plan generation and execution. Filters like Prompt Enhancer improve prompt quality. Installation and configuration instructions are provided for each tool and pipe.

clearml-server
ClearML Server is a backend service infrastructure for ClearML, facilitating collaboration and experiment management. It includes a web app, RESTful API, and file server for storing images and models. Users can deploy ClearML Server using Docker, AWS EC2 AMI, or Kubernetes. The system design supports single IP or sub-domain configurations with specific open ports. ClearML-Agent Services container allows launching long-lasting jobs and various use cases like auto-scaler service, controllers, optimizer, and applications. Advanced functionality includes web login authentication and non-responsive experiments watchdog. Upgrading ClearML Server involves stopping containers, backing up data, downloading the latest docker-compose.yml file, configuring ClearML-Agent Services, and spinning up docker containers. Community support is available through ClearML FAQ, Stack Overflow, GitHub issues, and email contact.

omniscient
Omniscient is an advanced AI Platform offered as a SaaS, empowering projects with cutting-edge artificial intelligence capabilities. Seamlessly integrating with Next.js 14, React, Typescript, and APIs like OpenAI and Replicate, it provides solutions for code generation, conversation simulation, image creation, music composition, and video generation.

motia
Motia is an AI agent framework designed for software engineers to create, test, and deploy production-ready AI agents quickly. It provides a code-first approach, allowing developers to write agent logic in familiar languages and visualize execution in real-time. With Motia, developers can focus on business logic rather than infrastructure, offering zero infrastructure headaches, multi-language support, composable steps, built-in observability, instant APIs, and full control over AI logic. Ideal for building sophisticated agents and intelligent automations, Motia's event-driven architecture and modular steps enable the creation of GenAI-powered workflows, decision-making systems, and data processing pipelines.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

gemini-android
Gemini Android is a repository showcasing Google's Generative AI on Android using Stream Chat SDK for Compose. It demonstrates the Gemini API for Android, implements UI elements with Jetpack Compose, utilizes Android architecture components like Hilt and AppStartup, performs background tasks with Kotlin Coroutines, and integrates chat systems with Stream Chat Compose SDK for real-time event handling. The project also provides technical content, instructions on building the project, tech stack details, architecture overview, modularization strategies, and a contribution guideline. It follows Google's official architecture guidance and offers a real-world example of app architecture implementation.