
open-webui-tools
A repository of Open WebUI tools to use with your favourite LLMs
Stars: 131

Open WebUI Tools Collection is a set of tools for structured planning, arXiv paper search, Hugging Face text-to-image generation, prompt enhancement, and multi-model conversations. It enhances LLM interactions with academic research, image generation, and conversation management. Tools include arXiv Search Tool and Hugging Face Image Generator. Function Pipes like Planner Agent offer autonomous plan generation and execution. Filters like Prompt Enhancer improve prompt quality. Installation and configuration instructions are provided for each tool and pipe.
README:
A collection of tools for Open WebUI that provides structured planning and execution capability, arXiv paper search capabilities, Hugging Face text-to-image generation functionality, prompt enhancement, and multi-model conversations. Perfect for enhancing your LLM interactions with academic research, image generation, and advanced conversation management!
Search arXiv.org for relevant academic papers on any topic. No API key required!
Features:
- Search across paper titles, abstracts, and full text
- Returns detailed paper information including:
  - Title
  - Authors
  - Publication date
  - URL
  - Abstract
- Automatically sorts by most recent submissions
- Returns up to 5 most relevant papers
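As an illustration, a keyless search like this can be built on arXiv's public Atom API; the function names below are hypothetical sketches, not the tool's actual code:

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

def build_query_url(topic: str, max_results: int = 5) -> str:
    """Build an arXiv API query sorted by most recent submission."""
    params = {
        "search_query": f"all:{topic}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"

def parse_entries(atom_xml: str) -> list[dict]:
    """Extract title, authors, date, URL, and abstract from an Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            "title": entry.findtext(f"{ATOM}title", "").strip(),
            "authors": [a.findtext(f"{ATOM}name", "")
                        for a in entry.findall(f"{ATOM}author")],
            "published": entry.findtext(f"{ATOM}published", ""),
            "url": entry.findtext(f"{ATOM}id", ""),
            "abstract": entry.findtext(f"{ATOM}summary", "").strip(),
        })
    return papers
```

Fetching `build_query_url(...)` with any HTTP client and passing the response body to `parse_entries` yields the fields listed above.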
Generate high-quality images from text descriptions using Hugging Face's Stable Diffusion models.
Features:
- Multiple image format options:
  - Default/Square (1024x1024)
  - Landscape (1024x768)
  - Landscape Large (1440x1024)
  - Portrait (768x1024)
  - Portrait Large (1024x1440)
- Customizable model endpoint
- High-resolution output
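A hedged sketch of how such a tool could assemble a Hugging Face Inference API request; the endpoint, model id, and function names here are illustrative assumptions, not the tool's actual code:

```python
# Format names and default endpoint are assumptions for illustration.
FORMATS = {
    "default":         (1024, 1024),
    "landscape":       (1024, 768),
    "landscape_large": (1440, 1024),
    "portrait":        (768, 1024),
    "portrait_large":  (1024, 1440),
}

def build_request(prompt: str, fmt: str = "default",
                  endpoint: str = "https://api-inference.huggingface.co/"
                                  "models/stabilityai/stable-diffusion-3.5-large-turbo"):
    """Return the (url, json_payload) pair for an HF text-to-image call."""
    width, height = FORMATS[fmt]
    payload = {
        "inputs": prompt,
        "parameters": {"width": width, "height": height},
    }
    return endpoint, payload
```

An actual call would POST `payload` to the returned URL with an `Authorization: Bearer <HF API key>` header and read the image bytes from the response.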
This powerful agent allows you to define a goal, and it will autonomously generate and execute a plan to achieve it. The Planner is a generalist agent, capable of handling any text-based task, making it ideal for complex, multi-step requests that would typically require multiple prompts and manual intervention.
It features advanced capabilities like:
- Automatic Plan Generation: Breaks down your goal into a sequence of actionable steps with defined dependencies.
- Adaptive Execution: Executes each step, dynamically adjusting to the results of previous actions.
- LLM-Powered Consolidation: Intelligently merges the outputs of different steps into a coherent final result.
- Reflection and Refinement: Analyzes the output of each step, identifies potential issues, and iteratively refines the output through multiple attempts.
- Robust Error Handling: Includes retries and fallback mechanisms to ensure successful execution even with occasional API errors.
- Detailed Execution Summary: Provides a comprehensive report of the plan execution, including timings and potential issues.
Features:
- General Purpose: Can handle a wide range of text-based tasks, from creative writing and code generation to research summarization and problem-solving.
- Multi-Step Task Management: Excels at managing complex tasks that require multiple steps and dependencies.
- Context Awareness: Maintains context throughout the execution process, ensuring that each step builds upon the previous ones.
- Output Optimization: Employs a reflection mechanism to analyze and improve the output of each step through multiple iterations.
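The dependency handling described above can be sketched as a topological ordering over plan steps; this is a simplified illustration under assumed names, not the Planner's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    id: str
    prompt: str
    depends_on: list[str] = field(default_factory=list)

def execution_order(steps: list[Step]) -> list[str]:
    """Topologically sort steps so each runs only after its dependencies."""
    done, order = set(), []
    pending = {s.id: s for s in steps}
    while pending:
        # A step is ready once every dependency has finished.
        ready = [s for s in pending.values()
                 if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("circular dependency in plan")
        for s in ready:
            order.append(s.id)
            done.add(s.id)
            del pending[s.id]
    return order
```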
Research a topic by searching arXiv.org and the web, then iteratively refine a research summary using Monte Carlo Tree Search.
Features:
- Comprehensive Search: Searches across paper titles, abstracts, and full text content from both arXiv and the web using Tavily.
- MCTS-Driven Refinement: Employs a Monte Carlo Tree Search (MCTS) approach to iteratively refine a research summary on a given topic.
- Adaptive Temperature Control: Offers both static and dynamic temperature decay settings. Static decay progressively reduces the LLM's temperature with each level of the search tree. Dynamic decay adjusts the temperature based on both depth and parent node scores, allowing the LLM to explore more diverse options when previous results are less promising. This fine-grained control balances exploration and exploitation for optimal refinement.
- Visual Tree Representation: Provides a visual representation of the search tree, offering intuitive feedback on the exploration process and the relationships between different research directions.
- Transparent Intermediate Steps: Shows intermediate steps of the search, allowing users to track the evolution of the research summary and understand the reasoning behind the refinements.
- Configurable Search Scope: Allows users to configure the breadth and depth of the search (tree width and depth) to control the exploration scope and computational resources used.
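The temperature schedule described above might look like the following sketch. For clarity this uses a simple linear decay between the configured maximum and minimum (the pipe itself describes an exponential schedule), with the dynamic adjustment raising temperature under low-scoring parents:

```python
def node_temperature(depth: int, max_depth: int,
                     t_max: float = 1.4, t_min: float = 0.5,
                     parent_score=None) -> float:
    """Interpolate temperature from t_max (root) down to t_min (max depth).

    With dynamic adjustment, a low-scoring parent (score in [0, 1])
    pushes the temperature back up to encourage more diverse children.
    """
    frac = depth / max_depth if max_depth else 1.0
    temp = t_max - (t_max - t_min) * frac          # static decay with depth
    if parent_score is not None:                   # dynamic adjustment
        temp += (1.0 - parent_score) * (t_max - temp)
    return min(max(temp, t_min), t_max)
```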
This pipe allows you to simulate conversations between multiple language models, each acting as a distinct character. You can configure up to 5 participants, each with their own model, alias, and character description (system message). This enables complex and dynamic interactions, perfect for storytelling, roleplaying, or exploring different perspectives on a topic.
Features:
- Multiple Participants: Simulate conversations with up to 5 different language models.
- Character Definition: Craft unique personas for each participant using system messages.
- Round-Robin Turns: Control the flow of conversation with configurable rounds per user message.
- Group-Chat-Manager: Optionally use an LLM to select the next participant in the conversation (toggleable in valves).
- Streaming Support: See the conversation unfold in real-time with streaming output.
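The round-robin flow can be sketched as follows; `ask_model` stands in for whatever LLM call the pipe actually makes, and the structure is an illustrative assumption:

```python
def run_round_robin(participants: list, rounds: int,
                    history: list, ask_model) -> list:
    """Simulate `rounds` of replies after a user message; each participant
    sees the shared history with its own system message prepended."""
    for _ in range(rounds):
        for p in participants:
            msgs = [{"role": "system", "content": p["system"]}] + history
            reply = ask_model(p["model"], msgs)
            history.append({"role": "assistant",
                            "content": f'{p["alias"]}: {reply}'})
    return history
```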
Analyze resumes and provide tags, first impressions, adversarial analysis, potential interview questions, and career advice.
Features:
- Resume Analysis: Breaks down a resume into relevant categories, highlighting strengths and weaknesses.
- Tags Generation: Identifies key skills and experience from the resume and assigns relevant tags.
- First Impression: Provides an initial assessment of the resume's effectiveness in showcasing the candidate's qualifications for a target role.
- Adversarial Analysis: Compares the analyzed resume to similar ones, offering actionable feedback on areas for improvement.
- Interview Questions: Suggests insightful questions tailored to the candidate's experience and the target role.
- Career Advisor Response: Offers personalized career advice based on the resume analysis and conversation history.
This filter uses an LLM to automatically improve the quality of your prompts before they are sent to the main language model. It analyzes your prompt and the conversation history to create a more detailed, specific, and effective prompt, leading to better responses.
Features:
- Context-Aware Enhancement: Considers the entire conversation history when refining the prompt.
- Customizable Template: Control the behavior of the prompt enhancer with a customizable template.
- Improved Response Quality: Get more relevant and insightful responses from the main LLM.
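An Open WebUI-style filter's inlet hook could perform this rewrite roughly as below; `enhance` is a stand-in for the LLM call the real filter makes, and the shape of `body` is an assumption for illustration:

```python
def inlet(body: dict, enhance) -> dict:
    """Rewrite the last user message before it reaches the main model."""
    messages = body.get("messages", [])
    for msg in reversed(messages):       # find the most recent user turn
        if msg["role"] == "user":
            msg["content"] = enhance(msg["content"], history=messages)
            break
    return body
```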
1. Installing from Haervwe's Open WebUI Hub (Recommended):
   - Visit https://openwebui.com/u/haervwe to access the collection of tools.
   - For Tools (arXiv Search Tool, Hugging Face Image Generator):
     - Locate the desired tool on the hub page.
     - Click the "Get" button next to the tool. This redirects you to your Open WebUI instance and automatically populates the installation code.
     - (Optional) Review the code and provide a name and description if needed.
     - Save the tool.
   - For Function Pipes (Planner Agent, arXiv Research MCTS Pipe, Multi Model Conversations) and Filters (Prompt Enhancer):
     - Locate the desired function pipe or filter on the hub page.
     - Click the "Get" button. This, again, redirects you to your Open WebUI instance with the installation code.
     - (Optional) Review the code and provide a different name and description.
     - Save the function.
2. Manual Installation from the Open WebUI Interface:
   - For Tools (arXiv Search Tool, Hugging Face Image Generator):
     - In your Open WebUI instance, navigate to the "Workspace" tab, then the "Tools" section.
     - Click the "+" button.
     - Copy the entire code of the respective .py file from this repository.
     - Paste the code into the text area in the Open WebUI interface.
     - Provide a name and description, and save the tool.
   - For Function Pipes (Planner Agent, arXiv Research MCTS Pipe, Multi Model Conversations) and Filters (Prompt Enhancer):
     - Navigate to the "Workspace" tab, then the "Functions" section.
     - Click the "+" button.
     - Copy and paste the code from the corresponding .py file.
     - Provide a name and description, and save.
Important Note for the Prompt Enhancer Filter:
- To use the Prompt Enhancer, you must create a new model configuration in Open WebUI.
- Go to "Workspace" -> "Models" -> "+".
- Select a base model.
- In the "Filters" section of the model configuration, enable the "Prompt Enhancer" filter.
- Model: The model ID from your LLM provider connected to Open WebUI.
- Action-Model: The model used for task execution; leave as default to use the same model throughout the process.
- Concurrency: Concurrency support is currently experimental. Due to resource limitations, comprehensive testing of concurrent LLM operations has not been possible, and users may experience unexpected behavior when running multiple LLM processes simultaneously. Further testing and optimization are planned.
- Max Retries: The number of times the reflection step and subsequent refinement can run per step.
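The reflection-and-retry behavior controlled by this valve might look like the following sketch (hypothetical function names, not the Planner's actual code):

```python
def run_step_with_reflection(execute, reflect, max_retries: int = 3):
    """Execute a step, then let a reflection pass critique and retry it."""
    output = execute(feedback=None)
    for _ in range(max_retries):
        issues = reflect(output)           # None means the output passed
        if issues is None:
            break
        output = execute(feedback=issues)  # refine using the critique
    return output
```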
No configuration required! The tool works out of the box.
- Model: The model ID from your LLM provider connected to Open WebUI.
- Tavily API Key: Required. Obtain your API key from tavily.com. This is used for web searches.
- Max Web Search Results: The number of web search results to fetch per query.
- Max arXiv Results: The number of results to fetch from the arXiv API per query.
- Tree Breadth: The number of child nodes explored during each iteration of the MCTS algorithm. This controls the width of the search tree.
- Tree Depth: The number of iterations of the MCTS algorithm. This controls the depth of the search tree.
- Exploration Weight: A constant (recommended range 0-2) controlling the balance between exploration and exploitation. Higher values encourage exploration of new branches, while lower values favor exploitation of promising paths.
- Temperature Decay: Exponentially decreases the LLM's temperature parameter with increasing tree depth. This focuses the LLM's output from creative exploration to refinement as the search progresses.
- Dynamic Temperature Adjustment: Provides finer-grained control over temperature decay based on parent node scores. If a parent node has a low score, the temperature is increased for its children, encouraging more diverse outputs and potentially uncovering better paths.
- Maximum Temperature: The initial temperature of the LLM (0-2, default 1.4). Higher temperatures encourage more diverse and creative outputs at the beginning of the search.
- Minimum Temperature: The final temperature of the LLM at maximum tree depth (0-2, default 0.5). Lower temperatures promote focused refinement of promising branches.
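The Exploration Weight plays the role of the constant in a standard UCT (Upper Confidence bound for Trees) selection rule, which can be sketched as:

```python
import math

def uct_score(node_value: float, node_visits: int,
              parent_visits: int, exploration_weight: float = 1.0) -> float:
    """Standard UCT: average node value plus an exploration bonus."""
    if node_visits == 0:
        return float("inf")   # unvisited nodes are explored first
    exploit = node_value / node_visits
    explore = exploration_weight * math.sqrt(
        math.log(parent_visits) / node_visits)
    return exploit + explore
```

A higher `exploration_weight` inflates the bonus for rarely visited branches, matching the behavior described for the valve.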
- Number of Participants: Set the number of participants (1-5).
- Rounds per User Message: Configure how many rounds of replies occur before the user can send another message.
- Participant [1-5] Model: Select the model for each participant.
- Participant [1-5] Alias: Set a display name for each participant.
- Participant [1-5] System Message: Define the persona and instructions for each participant.
- All Participants Appended Message: A global instruction appended to each participant's prompt.
- Temperature, Top_k, Top_p: Standard model parameters.
- Note: valves for participants that won't be used must be left at their defaults or set to valid parameters.
- Model: The model ID from your LLM provider connected to Open WebUI.
- Dataset Path: Local path to the resume dataset CSV file, which must include "Category" and "Resume" columns.
- RapidAPI Key (optional): Required for job search functionality. Obtain an API key from RapidAPI Jobs API.
- Web Search: Enable/disable web search for relevant job postings.
- Prompt Templates: Customizable templates for all analysis steps.
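Loading a dataset with the expected columns might look like this sketch (illustrative, not the pipe's actual code):

```python
import csv
import io

def load_resume_dataset(csv_text: str) -> list:
    """Read a resume dataset, enforcing the Category/Resume columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if "Category" not in row or "Resume" not in row:
            raise ValueError("dataset must have Category and Resume columns")
    return rows
```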
Required configuration in Open WebUI:
- API Key (Required): Obtain a Hugging Face API key from your HuggingFace account and set it in the tool's configuration in Open WebUI
- API URL (Optional): Uses Stability AI's SD 3.5 Turbo model by default; can be customized to use other HF text-to-image model endpoints such as FLUX.
- User Customizable Template: Allows you to tailor the instructions given to the prompt-enhancing LLM.
- Show Status: Displays status updates during the enhancement process.
- Show Enhanced Prompt: Outputs the enhanced prompt to the chat window for visibility.
- Model ID: Select the specific model to use for prompt enhancement.
Select the pipe with the corresponding model; it shows up as follows:
# Example usage in your prompt
"Create a fully-featured Single Page Application (SPA) for Conway's Game of Life, including a responsive UI. No frameworks, no preprocessor, no minifying, no back end. ONLY clean and CORRECT plain HTML, JS, and CSS."
Select the pipe with the corresponding model; it shows up as follows:
# Example usage in your prompt
"Do a research summary on 'DPO laser LLM training'"
1. Select the pipe in the Open WebUI interface.
2. Configure the valves (settings) for the desired conversation setup in the admin panel.
3. Start the conversation by sending a user message to the conversation pipe.
Usage:
- Select the Resume Analyzer Pipe in the Open WebUI interface.
- Configure the valves with the desired model, dataset path (optional), and other settings.
- Send the resume text as an attachment (make sure to use the "whole document" setting) and a message to start the analysis process.
- Review the first impression, adversarial analysis, interview questions, and then ask for career advice.
Example Usage:
# Example usage in your prompt
Analyze this resume:
[Insert resume or resume text here]
The Resume Analyzer Pipe offers a comprehensive analysis of resumes, providing valuable insights and actionable feedback to help candidates improve their job prospects.
(Make sure to turn on the tool in chat before requesting it)
# Example usage in your prompt
Search for recent papers about "tree of thought"
# Example usage in your prompt
Create an image of "a beautiful horse running free"
# Specify format
Create a landscape image of "a futuristic cityscape"
Use the custom Model template in the model selector. The filter will automatically process each user message before it's sent to the main LLM. Configure the valves to customize the enhancement process.
Both tools include comprehensive error handling for:
- Network issues
- API timeouts
- Invalid parameters
- Authentication errors (HF Image Generator)
Feel free to contribute to this project by:
- Forking the repository
- Creating your feature branch
- Committing your changes
- Opening a pull request
MIT License
-
Developed by Haervwe
-
Credit to the amazing teams behind:
And all model trainers out there providing these amazing tools.
For issues, questions, or suggestions, please open an issue on the GitHub repository.