ComfyUI-mnemic-nodes
Nodes: Get File Path, Save Text File, Download Image from URL, Tiktoken Tokenizer, String Cleaning, Groq LLM, VLM, ALM API
Stars: 53
ComfyUI-mnemic-nodes is a repository hosting a collection of nodes developed for ComfyUI, providing useful components to enhance project functionality. The nodes include features like returning file paths, saving text files, downloading images from URLs, tokenizing text, cleaning strings, querying Groq language models, generating negative prompts, and more. Some nodes are experimental and marked with a 'Caution' label. Installation instructions and setup details are provided for each node, along with examples and presets for different tasks.
README:
This repository hosts a collection of nodes developed for ComfyUI. It aims to share useful components that enhance the functionality of ComfyUI projects. Some nodes are forks or versions of nodes from other packs, some are bespoke and useful, and some are experimental and quite useless, so those have been marked with a [!CAUTION] label in this document.
📁 Get File Path - Returns the path to a file in your /input folder, in different formats.
💾 Save Text File With Path Node - Saves a text file and returns the saved file's path.
🖼️ Download Image from URL Node - Downloads an image from the web.
📊 Tiktoken Tokenizer Info - Returns token information about input text and lets you split it.
🧹 String Cleaning - Cleans up text strings.
✨💬 Groq LLM API Node - Query Groq large language models.
✨📷 Groq VLM API Node - Query Groq vision language models.
✨🎤 Groq ALM API Node - Query Groq audio language models.
❌ Generate Negative Prompt Node - Generates negative prompts automatically.
You may need to manually install the requirements listed in requirements.txt. You may need to install the following libraries using pip install <library>:
- configparser
- groq
- transformers
- torch
📁 Get File Path
This node returns the file path of a given file in the /input folder.
It is meant to have a browse button so you can browse to any file, but it doesn't yet.
If you know how to add this, please let me know or open a pull request.
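As a rough illustration of the kinds of formats such a node can return, here is a minimal Python sketch; the variable names are illustrative, not the node's actual output sockets:

```python
from pathlib import Path

# Hypothetical example file inside the ComfyUI input folder
file = Path("ComfyUI/input/example.png")

full_path = str(file.resolve())            # absolute path to the file
folder_path = str(file.resolve().parent)   # containing folder
file_name = file.name                      # "example.png"
file_stem = file.stem                      # "example", without the extension
```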
💾 Save Text File With Path
This node is adapted and enhanced from the Save Text File node found in the YMC ymc-node-suite-comfyui pack on GitHub.
The node can now give you a full file path output if you need it, as well as output the file name as a separate output, in case you need it for something else.
[!IMPORTANT]
The node was heavily updated, so existing workflows are going to break. I won't do another overhaul like this.
The new node is more consistent in functionality and more intentional with the inputs and outputs.
It now handles more edge cases and supports a prefix, a suffix, and dynamic counting with a customizable separator before/after the counter in the right circumstances (see the sketch below).
Sorry for any trouble caused.
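For intuition, a minimal sketch of how such prefix/suffix/counter filename assembly can work; the parameter names are hypothetical, not the node's actual inputs:

```python
import os

def next_save_path(folder, prefix="", suffix="", separator="_", ext=".txt"):
    """Find the first counter value whose filename doesn't exist yet."""
    counter = 1
    while True:
        name = f"{prefix}{separator}{counter:04d}{separator}{suffix}{ext}"
        path = os.path.join(folder, name)
        if not os.path.exists(path):
            return path
        counter += 1
```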
🖼️ Download Image from URL
This node downloads an image from a URL and lets you use it.
It also outputs the width/height of the image.
- By default, it will save the image to the /input directory.
- Clear the `save_path` line to prevent saving the image (it will still be saved in the TEMP folder).
- If you enter a name in the `save_file_name_override` field, the file will be saved with this name.
- You can enter or omit the file extension.
- If you enter one, it will rename the file to the chosen extension without converting the image.
- Supported image formats: JPG, JPEG, PNG, WEBP.
- Does not support saving with transparency.
[!IMPORTANT]
This node was renamed in the code to match the functionality. This may break existing nodes.
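Under the hood, a download node of this kind boils down to fetching bytes and decoding them with Pillow. A minimal sketch using requests and Pillow; the node's actual implementation may differ:

```python
import io
import requests
from PIL import Image

url = "https://example.com/picture.jpg"  # placeholder URL
resp = requests.get(url, timeout=30)
resp.raise_for_status()

image = Image.open(io.BytesIO(resp.content))
width, height = image.size  # the node exposes these as outputs

# No transparency support: flatten to RGB before saving
image.convert("RGB").save("ComfyUI/input/downloaded.jpg")
```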
📊 Tiktoken Tokenizer Info
This node takes text as input and returns a bunch of data from the tiktoken tokenizer.
It returns the following values:
- `token_count`: Total number of tokens
- `character_count`: Total number of characters
- `word_count`: Total number of words
- `split_string`: Tokenized list of strings
- `split_string_list`: Tokenized list of strings (output as list)
- `split_token_ids`: List of token IDs
- `split_token_ids_list`: List of token IDs (output as list)
- `text_hash`: Text hash
- `special_tokens_used`: Special tokens used
- `special_tokens_used_list`: Special tokens used (output as list)
- `token_chunk_by_size`: Returns the input text, split into different strings in a list by the `token_chunk_size` value.
- `token_chunk_by_size_to_word`: Same as above, but respects "words" by stripping backwards to the nearest space and splitting the chunk there.
- `token_chunk_by_size_to_section`: Same as above, but strips backwards to the nearest newline, period, or comma.
🧹 String Cleaning
This node helps you quickly clean up and format strings by letting you remove leading or trailing spaces, periods, commas, or custom text, as well as remove linebreaks or replace them with a period.
- `input_string`: Your input string. Use ComfyUI-Easy-Use for looping through a list of strings.
- `collapse_sequential_spaces`: Collapses sequential spaces (" ") in a string into one.
- `strip_leading_spaces`: Removes any leading spaces from each line of the input string.
- `strip_trailing_spaces`: Removes any trailing spaces from each line of the input string.
- `strip_leading_symbols`: Removes leading punctuation symbols (, . ! ? : ;) from each line of the input string.
- `strip_trailing_symbols`: Removes trailing punctuation symbols (, . ! ? : ;) from each line of the input string.
- `strip_inside_tags`: Removes any tags and the characters inside. <> would strip out anything like `<html>` or `</div>`, including the `<` and `>`.
- `strip_newlines`: Removes any linebreaks in the input string.
- `replace_newlines_with_period_space`: Replaces any linebreaks in the input string with ". ". If multiple linebreaks are found in a row, they will be replaced with a single ". ".
- `strip_leading_custom`: Removes any leading characters, words, or symbols from each line of the input string. One entry per line. Space (" ") is supported. Entries are processed in order, so you can combine multiple lines. Does not support linebreak removal.
- `strip_trailing_custom`: Removes any trailing characters, words, or symbols from each line of the input string. One entry per line. Space (" ") is supported. Entries are processed in order, so you can combine multiple lines. Does not support linebreak removal.
- `strip_all_custom`: Removes any characters, words, or symbols found anywhere in the text. One entry per line. Space (" ") is supported. Entries are processed in order, so you can combine multiple lines. Does not support linebreak removal.
- `multiline_find`: The find half of a multi-entry find-and-replace. Entries are processed in order.
- `multiline_replace`: The replace half of a multi-entry find-and-replace. Entries are processed in order.
✨💬 Groq LLM API
[!IMPORTANT]
This node was renamed to match the new VLM and ALM nodes added.
This node makes an API call to Groq and returns the response in text format.
You need to manually enter your Groq API key into the GroqConfig.ini file.
Currently, the Groq API can be used for free, with very friendly and generous rate limits.
- model: Choose one of the available models from a drop-down. The list needs to be manually updated when Groq adds additional models.
- preset: A dropdown with a few preset prompts, the user's own presets, or the option to use a fully custom prompt. See examples and presets below.
- system_message: The system message to send to the API. This is only used with the `Use [system_message] and [user_input]` option in the preset list. The other presets provide their own system message.
- user_input: This is used with `Use [system_message] and [user_input]`, but can also be used with presets. In the system message, just mention the USER to refer to this input field. See the presets for examples.
- temperature: Controls the randomness of the response. A higher temperature leads to more varied responses.
- max_tokens: The maximum number of tokens that the model can process in a single response. Limits can be found here.
- top_p: The nucleus-sampling threshold: only the most probable tokens whose cumulative probability reaches this value are considered. Lower values result in more predictable results.
- seed: Random seed. Change the `control_after_generate` option below if you want to re-use the seed or get a new generation each time.
- control_after_generate: Standard ComfyUI seed controls. Set it to `fixed` or `randomize` based on your needs.
- stop: Enter a word or stopping sequence which will terminate the AI's output. The string itself will not be returned.
  - Note: `stop` is not compatible with `json_mode`.
- json_mode: If enabled, the model will output the result in JSON format.
  - Note: You must include a description of the desired JSON format in the system message. See the examples below.
  - Note: `json_mode` is not compatible with `stop`.
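For reference, a direct call to the Groq API with the groq Python client looks roughly like this; the section/key names inside GroqConfig.ini and the model name are assumptions, not the node's exact values:

```python
import configparser
from groq import Groq

# Read the API key from GroqConfig.ini (section/key names assumed)
config = configparser.ConfigParser()
config.read("GroqConfig.ini")
client = Groq(api_key=config["groq"]["api_key"])

completion = client.chat.completions.create(
    model="llama3-70b-8192",  # example model name
    messages=[
        {"role": "system", "content": "You write Stable Diffusion prompts."},
        {"role": "user", "content": "A castle on a hill at sunset."},
    ],
    temperature=0.7,
    max_tokens=512,
    top_p=1.0,
    seed=42,
    stop=None,  # remember: stop and json_mode are mutually exclusive
)
print(completion.choices[0].message.content)
```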
The following presets can be found in the `\nodes\groq\DefaultPrompts.json` file. They can be edited, but it's better to copy the presets to the `UserPrompts.json` file.
This preset (the default) means that the next two fields are fully utilized. Manually enter the instruction to the AI in the `system_message` field, and add any specific requests in the `user_input` field. Combined, they make up the complete instruction to the LLM. Sometimes a system message is enough, and inside the system message you could even refer to the contents of the user input.
This is a tailored instruction that will return a randomized Stable Diffusion-like prompt. If you enter some text in the `user_input` area, you should get a prompt about this subject. You can also leave it empty and it will create its own examples based on the underlying prompt.
You should get better results from providing it with a short sentence to start it off.
This will return a negative prompt intended to be used together with the `user_input` string to complement it and enhance the resulting image.
This will return a list of 10 subjects for an image, described in a simple and short style. These work well as `user_input` for the `Generate a prompt about [user_input]` preset.
You should also manually turn on `json_mode` when using this prompt. You should get a stable JSON-formatted output from it, in a similar style to the `Generate a prompt about [user_input]` preset above.
Note: You can actually use the entire result (JSON and all) as your prompt. Stable Diffusion seems to handle it quite fine.
Edit the `\nodes\groq\UserPrompts.json` file to create your own presets.
Follow the existing structure and look at `DefaultPrompts.json` for examples.
✨📷 Groq VLM API
[!IMPORTANT]
Added new Llama 3.2 vision model to the list, but this model is not yet officially available. Once it is, this should automatically work.
This node makes an API call to Groq with an attached image and then uses vision language models to return a description of the image, or an answer to a question about the image, in text format.
You need to manually enter your Groq API key into the GroqConfig.ini file.
Currently, the Groq API can be used for free, with very friendly and generous rate limits.
Image Size Limit: The maximum allowed size for a request containing an image URL as input is 20MB. Requests larger than this limit will return a 400 error.
Request Size Limit (Base64 Encoded Images): The maximum allowed size for a request containing a base64-encoded image is 4MB. Requests larger than this limit will return a 413 error.
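A sketch of the kind of request the node makes, with a base64-encoded local image; the model name is an example, and the file must keep the encoded request under the 4MB limit:

```python
import base64
from groq import Groq

client = Groq(api_key="YOUR_GROQ_API_KEY")

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="llama-3.2-11b-vision-preview",  # example vision model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```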
✨🎤 Groq ALM API
This node makes an API call to Groq with an attached audio file and then uses audio language models to transcribe the audio and return the text in different output formats.
The model `distil-whisper-large-v3-en` only supports the language `en`.
The model `whisper-large-v3` supports the languages listed below. The language can also be left empty, but this provides worse results than running the model locally.
[!NOTE]
The presets / prompt do very little. They are meant to help you guide the output, but I don't get any relevant results.
You can convert the `file_path` widget to an input and use the Get File Path node to find your files.
Supported language codes (see https://www.wikiwand.com/en/articles/List_of_ISO_639_language_codes):
is tg uz zh ru tr hi la tk haw fr vi cs hu kk he cy bs sw ht mn gl si mg sa es ja pt lt mr fa sl kn uk ms ta hr bg pa yi fo th lv ln ca br sq jv sn gu ba te bn et sd tl ha de hy so oc nn az km yo ko pl da mi ml ka am tt su yue nl no ne mt my ur ps ar id fi el ro as en it sk be lo lb bo sv sr mk eu
You need to manually enter your Groq API key into the GroqConfig.ini file.
Currently, the Groq API can be used for free, with very friendly and generous rate limits.
You can use this node to generate files to use in a karaoke app.
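A transcription call against the Groq audio endpoint looks roughly like this; `verbose_json` returns segment timestamps, which is what makes karaoke-style output possible:

```python
from groq import Groq

client = Groq(api_key="YOUR_GROQ_API_KEY")

with open("speech.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        file=("speech.mp3", f.read()),
        model="whisper-large-v3",
        language="en",                   # optional; see the code list above
        response_format="verbose_json",  # includes segment timestamps
    )
print(transcription.text)
```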
❌ Generate Negative Prompt
[!CAUTION]
This node is highly experimental and does not produce any useful result right now. It also requires you to download a specially trained model for it. It's just not worth the effort. It's mostly here to share a work-in-progress project.
This node utilizes a GPT-2 text inference model to generate a negative prompt that is supposed to enhance the aspects of the positive prompt.
[!IMPORTANT]
Installation Step: Download the weights.pt file from the project's Hugging Face repository.
Place the `weights.pt` file in the following directory of your ComfyUI setup without renaming it: `\ComfyUI\custom_nodes\ComfyUI-mnemic-nodes\nodes\negativeprompt`
The directory should resemble the following structure:
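Based on the path above, the layout should look roughly like this:

```
ComfyUI/
└── custom_nodes/
    └── ComfyUI-mnemic-nodes/
        └── nodes/
            └── negativeprompt/
                └── weights.pt
```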
For additional information, please visit the project's GitHub page.