
open-webui-tools
A repository of Open WebUI tools to use with your favourite LLMs
Stars: 131

Open WebUI Tools Collection is a set of tools for structured planning, arXiv paper search, Hugging Face text-to-image generation, prompt enhancement, and multi-model conversations. It enhances LLM interactions with academic research, image generation, and conversation management. Tools include arXiv Search Tool and Hugging Face Image Generator. Function Pipes like Planner Agent offer autonomous plan generation and execution. Filters like Prompt Enhancer improve prompt quality. Installation and configuration instructions are provided for each tool and pipe.
README:
A collection of tools for Open WebUI that provides structured planning and execution capability, arXiv paper search capabilities, Hugging Face text-to-image generation functionality, prompt enhancement, and multi-model conversations. Perfect for enhancing your LLM interactions with academic research, image generation, and advanced conversation management!
Search arXiv.org for relevant academic papers on any topic. No API key required!
Features:
- Search across paper titles, abstracts, and full text
- Returns detailed paper information including:
  - Title
  - Authors
  - Publication date
  - URL
  - Abstract
- Automatically sorts by most recent submissions
- Returns up to 5 most relevant papers
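As an illustration, a keyless search like this can be built on arXiv's public Atom API; the function names below are hypothetical sketches, not the tool's actual code:

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

def build_query_url(topic: str, max_results: int = 5) -> str:
    """Build an arXiv API query sorted by most recent submission."""
    params = {
        "search_query": f"all:{topic}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"

def parse_entries(atom_xml: str) -> list[dict]:
    """Extract title, authors, date, URL, and abstract from an Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            "title": entry.findtext(f"{ATOM}title", "").strip(),
            "authors": [a.findtext(f"{ATOM}name", "")
                        for a in entry.findall(f"{ATOM}author")],
            "published": entry.findtext(f"{ATOM}published", ""),
            "url": entry.findtext(f"{ATOM}id", ""),
            "abstract": entry.findtext(f"{ATOM}summary", "").strip(),
        })
    return papers
```

Fetching `build_query_url(...)` with any HTTP client and passing the response body to `parse_entries` yields the fields listed above.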
Generate high-quality images from text descriptions using Hugging Face's Stable Diffusion models.
Features:
- Multiple image format options:
  - Default/Square (1024x1024)
  - Landscape (1024x768)
  - Landscape Large (1440x1024)
  - Portrait (768x1024)
  - Portrait Large (1024x1440)
- Customizable model endpoint
- High-resolution output
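A hedged sketch of how such a tool could assemble a Hugging Face Inference API request; the endpoint, model id, and function names here are illustrative assumptions, not the tool's actual code:

```python
# Format names and default endpoint are assumptions for illustration.
FORMATS = {
    "default":         (1024, 1024),
    "landscape":       (1024, 768),
    "landscape_large": (1440, 1024),
    "portrait":        (768, 1024),
    "portrait_large":  (1024, 1440),
}

def build_request(prompt: str, fmt: str = "default",
                  endpoint: str = "https://api-inference.huggingface.co/"
                                  "models/stabilityai/stable-diffusion-3.5-large-turbo"):
    """Return the (url, json_payload) pair for an HF text-to-image call."""
    width, height = FORMATS[fmt]
    payload = {
        "inputs": prompt,
        "parameters": {"width": width, "height": height},
    }
    return endpoint, payload
```

An actual call would POST `payload` to the returned URL with an `Authorization: Bearer <HF API key>` header and read the image bytes from the response.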
This powerful agent allows you to define a goal, and it will autonomously generate and execute a plan to achieve it. The Planner is a generalist agent, capable of handling any text-based task, making it ideal for complex, multi-step requests that would typically require multiple prompts and manual intervention.
It features advanced capabilities like:
- Automatic Plan Generation: Breaks down your goal into a sequence of actionable steps with defined dependencies.
- Adaptive Execution: Executes each step, dynamically adjusting to the results of previous actions.
- LLM-Powered Consolidation: Intelligently merges the outputs of different steps into a coherent final result.
- Reflection and Refinement: Analyzes the output of each step, identifies potential issues, and iteratively refines the output through multiple attempts.
- Robust Error Handling: Includes retries and fallback mechanisms to ensure successful execution even with occasional API errors.
- Detailed Execution Summary: Provides a comprehensive report of the plan execution, including timings and potential issues.
Features:
- General Purpose: Can handle a wide range of text-based tasks, from creative writing and code generation to research summarization and problem-solving.
- Multi-Step Task Management: Excels at managing complex tasks that require multiple steps and dependencies.
- Context Awareness: Maintains context throughout the execution process, ensuring that each step builds upon the previous ones.
- Output Optimization: Employs a reflection mechanism to analyze and improve the output of each step through multiple iterations.
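The dependency handling described above can be sketched as a topological ordering over plan steps; this is a simplified illustration under assumed names, not the Planner's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    id: str
    prompt: str
    depends_on: list[str] = field(default_factory=list)

def execution_order(steps: list[Step]) -> list[str]:
    """Topologically sort steps so each runs only after its dependencies."""
    done, order = set(), []
    pending = {s.id: s for s in steps}
    while pending:
        # A step is ready once every dependency has finished.
        ready = [s for s in pending.values()
                 if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("circular dependency in plan")
        for s in ready:
            order.append(s.id)
            done.add(s.id)
            del pending[s.id]
    return order
```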
Research a topic by searching arXiv.org and the web, then iteratively refine a research summary using Monte Carlo Tree Search.
Features:
- Comprehensive Search: Searches across paper titles, abstracts, and full text content from both arXiv and the web using Tavily.
- MCTS-Driven Refinement: Employs a Monte Carlo Tree Search (MCTS) approach to iteratively refine a research summary on a given topic.
- Adaptive Temperature Control: Offers both static and dynamic temperature decay settings. Static decay progressively reduces the LLM's temperature with each level of the search tree. Dynamic decay adjusts the temperature based on both depth and parent node scores, allowing the LLM to explore more diverse options when previous results are less promising. This fine-grained control balances exploration and exploitation for optimal refinement.
- Visual Tree Representation: Provides a visual representation of the search tree, offering intuitive feedback on the exploration process and the relationships between different research directions.
- Transparent Intermediate Steps: Shows intermediate steps of the search, allowing users to track the evolution of the research summary and understand the reasoning behind the refinements.
- Configurable Search Scope: Allows users to configure the breadth and depth of the search (tree width and depth) to control the exploration scope and computational resources used.
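The temperature schedule described above might look like the following sketch. For clarity this uses a simple linear decay between the configured maximum and minimum (the pipe itself describes an exponential schedule), with the dynamic adjustment raising temperature under low-scoring parents:

```python
def node_temperature(depth: int, max_depth: int,
                     t_max: float = 1.4, t_min: float = 0.5,
                     parent_score=None) -> float:
    """Interpolate temperature from t_max (root) down to t_min (max depth).

    With dynamic adjustment, a low-scoring parent (score in [0, 1])
    pushes the temperature back up to encourage more diverse children.
    """
    frac = depth / max_depth if max_depth else 1.0
    temp = t_max - (t_max - t_min) * frac          # static decay with depth
    if parent_score is not None:                   # dynamic adjustment
        temp += (1.0 - parent_score) * (t_max - temp)
    return min(max(temp, t_min), t_max)
```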
This pipe allows you to simulate conversations between multiple language models, each acting as a distinct character. You can configure up to 5 participants, each with their own model, alias, and character description (system message). This enables complex and dynamic interactions, perfect for storytelling, roleplaying, or exploring different perspectives on a topic.
Features:
- Multiple Participants: Simulate conversations with up to 5 different language models.
- Character Definition: Craft unique personas for each participant using system messages.
- Round-Robin Turns: Control the flow of conversation with configurable rounds per user message.
- Group-Chat-Manager: Optionally use an LLM to select the next participant in the conversation (toggleable in valves).
- Streaming Support: See the conversation unfold in real-time with streaming output.
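The round-robin flow can be sketched as follows; `ask_model` stands in for whatever LLM call the pipe actually makes, and the structure is an illustrative assumption:

```python
def run_round_robin(participants: list, rounds: int,
                    history: list, ask_model) -> list:
    """Simulate `rounds` of replies after a user message; each participant
    sees the shared history with its own system message prepended."""
    for _ in range(rounds):
        for p in participants:
            msgs = [{"role": "system", "content": p["system"]}] + history
            reply = ask_model(p["model"], msgs)
            history.append({"role": "assistant",
                            "content": f'{p["alias"]}: {reply}'})
    return history
```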
Analyze resumes and provide tags, first impressions, adversarial analysis, potential interview questions, and career advice.
Features:
- Resume Analysis: Breaks down a resume into relevant categories, highlighting strengths and weaknesses.
- Tags Generation: Identifies key skills and experience from the resume and assigns relevant tags.
- First Impression: Provides an initial assessment of the resume's effectiveness in showcasing the candidate's qualifications for a target role.
- Adversarial Analysis: Compares the analyzed resume to similar ones, offering actionable feedback on areas for improvement.
- Interview Questions: Suggests insightful questions tailored to the candidate's experience and the target role.
- Career Advisor Response: Offers personalized career advice based on the resume analysis and conversation history.
This filter uses an LLM to automatically improve the quality of your prompts before they are sent to the main language model. It analyzes your prompt and the conversation history to create a more detailed, specific, and effective prompt, leading to better responses.
Features:
- Context-Aware Enhancement: Considers the entire conversation history when refining the prompt.
- Customizable Template: Control the behavior of the prompt enhancer with a customizable template.
- Improved Response Quality: Get more relevant and insightful responses from the main LLM.
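An Open WebUI-style filter's inlet hook could perform this rewrite roughly as below; `enhance` is a stand-in for the LLM call the real filter makes, and the shape of `body` is an assumption for illustration:

```python
def inlet(body: dict, enhance) -> dict:
    """Rewrite the last user message before it reaches the main model."""
    messages = body.get("messages", [])
    for msg in reversed(messages):       # find the most recent user turn
        if msg["role"] == "user":
            msg["content"] = enhance(msg["content"], history=messages)
            break
    return body
```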
1. Installing from Haervwe's Open WebUI Hub (Recommended):
   - Visit https://openwebui.com/u/haervwe to access the collection of tools.
   - For Tools (arXiv Search Tool, Hugging Face Image Generator):
     - Locate the desired tool on the hub page.
     - Click the "Get" button next to the tool. This redirects you to your Open WebUI instance and automatically populates the installation code.
     - (Optional) Review the code and provide a name and description if needed.
     - Save the tool.
   - For Function Pipes (Planner Agent, arXiv Research MCTS Pipe, Multi Model Conversations) and Filters (Prompt Enhancer):
     - Locate the desired function pipe or filter on the hub page.
     - Click the "Get" button. This, again, redirects you to your Open WebUI instance with the installation code.
     - (Optional) Review the code and provide a different name and description.
     - Save the function.
2. Manual Installation from the Open WebUI Interface:
   - For Tools (arXiv Search Tool, Hugging Face Image Generator):
     - In your Open WebUI instance, navigate to the "Workspace" tab, then the "Tools" section.
     - Click the "+" button.
     - Copy the entire code of the respective .py file from this repository.
     - Paste the code into the text area in the Open WebUI interface.
     - Provide a name and description, and save the tool.
   - For Function Pipes (Planner Agent, arXiv Research MCTS Pipe, Multi Model Conversations) and Filters (Prompt Enhancer):
     - Navigate to the "Workspace" tab, then the "Functions" section.
     - Click the "+" button.
     - Copy and paste the code from the corresponding .py file.
     - Provide a name and description, and save.
Important Note for the Prompt Enhancer Filter:
- To use the Prompt Enhancer, you must create a new model configuration in Open WebUI.
- Go to "Workspace" -> "Models" -> "+".
- Select a base model.
- In the "Filters" section of the model configuration, enable the "Prompt Enhancer" filter.
- Model: The model ID from your LLM provider connected to Open WebUI.
- Action-Model: The model used for task execution; leave as default to use the same model throughout the process.
- Concurrency: Concurrency support is currently experimental. Due to resource limitations, comprehensive testing of concurrent LLM operations has not been possible, and users may experience unexpected behavior when running multiple LLM processes simultaneously. Further testing and optimization are planned.
- Max Retries: The number of times the reflection step and subsequent refinement can run per step.
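The reflection-and-retry behavior controlled by this valve might look like the following sketch (hypothetical function names, not the Planner's actual code):

```python
def run_step_with_reflection(execute, reflect, max_retries: int = 3):
    """Execute a step, then let a reflection pass critique and retry it."""
    output = execute(feedback=None)
    for _ in range(max_retries):
        issues = reflect(output)           # None means the output passed
        if issues is None:
            break
        output = execute(feedback=issues)  # refine using the critique
    return output
```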
No configuration required! The tool works out of the box.
- Model: The model ID from your LLM provider connected to Open WebUI.
- Tavily API Key: Required. Obtain your API key from tavily.com. This is used for web searches.
- Max Web Search Results: The number of web search results to fetch per query.
- Max arXiv Results: The number of results to fetch from the arXiv API per query.
- Tree Breadth: The number of child nodes explored during each iteration of the MCTS algorithm. This controls the width of the search tree.
- Tree Depth: The number of iterations of the MCTS algorithm. This controls the depth of the search tree.
- Exploration Weight: A constant (recommended range 0-2) controlling the balance between exploration and exploitation. Higher values encourage exploration of new branches, while lower values favor exploitation of promising paths.
- Temperature Decay: Exponentially decreases the LLM's temperature parameter with increasing tree depth. This focuses the LLM's output from creative exploration to refinement as the search progresses.
- Dynamic Temperature Adjustment: Provides finer-grained control over temperature decay based on parent node scores. If a parent node has a low score, the temperature is increased for its children, encouraging more diverse outputs and potentially uncovering better paths.
- Maximum Temperature: The initial temperature of the LLM (0-2, default 1.4). Higher temperatures encourage more diverse and creative outputs at the beginning of the search.
- Minimum Temperature: The final temperature of the LLM at maximum tree depth (0-2, default 0.5). Lower temperatures promote focused refinement of promising branches.
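The Exploration Weight plays the role of the constant in a standard UCT (Upper Confidence bound for Trees) selection rule, which can be sketched as:

```python
import math

def uct_score(node_value: float, node_visits: int,
              parent_visits: int, exploration_weight: float = 1.0) -> float:
    """Standard UCT: average node value plus an exploration bonus."""
    if node_visits == 0:
        return float("inf")   # unvisited nodes are explored first
    exploit = node_value / node_visits
    explore = exploration_weight * math.sqrt(
        math.log(parent_visits) / node_visits)
    return exploit + explore
```

A higher `exploration_weight` inflates the bonus for rarely visited branches, matching the behavior described for the valve.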
- Number of Participants: Set the number of participants (1-5).
- Rounds per User Message: Configure how many rounds of replies occur before the user can send another message.
- Participant [1-5] Model: Select the model for each participant.
- Participant [1-5] Alias: Set a display name for each participant.
- Participant [1-5] System Message: Define the persona and instructions for each participant.
- All Participants Appended Message: A global instruction appended to each participant's prompt.
- Temperature, Top_k, Top_p: Standard model parameters.
- Note: valves for participants that won't be used must be left at their defaults or set to valid parameters.
- Model: The model ID from your LLM provider connected to Open WebUI.
- Dataset Path: Local path to the resume dataset CSV file, which must include "Category" and "Resume" columns.
- RapidAPI Key (optional): Required for job search functionality. Obtain an API key from RapidAPI Jobs API.
- Web Search: Enable/disable web search for relevant job postings.
- Prompt Templates: Customizable templates for all analysis steps.
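Loading a dataset with the expected columns might look like this sketch (illustrative, not the pipe's actual code):

```python
import csv
import io

def load_resume_dataset(csv_text: str) -> list:
    """Read a resume dataset, enforcing the Category/Resume columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if "Category" not in row or "Resume" not in row:
            raise ValueError("dataset must have Category and Resume columns")
    return rows
```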
Required configuration in Open WebUI:
- API Key (Required): Obtain a Hugging Face API key from your HuggingFace account and set it in the tool's configuration in Open WebUI
- API URL (Optional): Uses Stability AI's SD 3.5 Turbo model by default; can be customized to use other HF text-to-image model endpoints such as FLUX.
- User Customizable Template: Allows you to tailor the instructions given to the prompt-enhancing LLM.
- Show Status: Displays status updates during the enhancement process.
- Show Enhanced Prompt: Outputs the enhanced prompt to the chat window for visibility.
- Model ID: Select the specific model to use for prompt enhancement.
Select the pipe with the corresponding model; it shows up as follows:
# Example usage in your prompt
"Create a fully-featured Single Page Application (SPA) for Conway's Game of Life, including a responsive UI. No frameworks, no preprocessor, no minifying, no back end. ONLY clean and CORRECT plain HTML, JS, and CSS."
Select the pipe with the corresponding model; it shows up as follows:
# Example usage in your prompt
"Do a research summary on 'DPO laser LLM training'"
1. Select the pipe in the Open WebUI interface.
2. Configure the valves (settings) for the desired conversation setup in the admin panel.
3. Start the conversation by sending a user message to the conversation pipe.
Usage:
- Select the Resume Analyzer Pipe in the Open WebUI interface.
- Configure the valves with the desired model, dataset path (optional), and other settings.
- Send the resume text as an attachment (make sure to use the "whole document" setting) and a message to start the analysis process.
- Review the first impression, adversarial analysis, interview questions, and then ask for career advice.
Example Usage:
# Example usage in your prompt
Analyze this resume:
[Insert resume or resume text here]
The Resume Analyzer Pipe offers a comprehensive analysis of resumes, providing valuable insights and actionable feedback to help candidates improve their job prospects.
(Make sure to turn on the tool in chat before requesting it)
# Example usage in your prompt
Search for recent papers about "tree of thought"
# Example usage in your prompt
Create an image of "a beautiful horse running free"
# Specify format
Create a landscape image of "a futuristic cityscape"
Use the custom Model template in the model selector. The filter will automatically process each user message before it's sent to the main LLM. Configure the valves to customize the enhancement process.
Both tools include comprehensive error handling for:
- Network issues
- API timeouts
- Invalid parameters
- Authentication errors (HF Image Generator)
Feel free to contribute to this project by:
- Forking the repository
- Creating your feature branch
- Committing your changes
- Opening a pull request
MIT License
-
Developed by Haervwe
-
Credit to the amazing teams behind:
And all model trainers out there providing these amazing tools.
For issues, questions, or suggestions, please open an issue on the GitHub repository.