
mcp-omnisearch
🔍 A Model Context Protocol (MCP) server providing unified access to multiple search engines (Tavily, Brave, Kagi), AI tools (Perplexity, FastGPT), and content processing services (Jina AI, Kagi). Combines search, AI responses, content processing, and enhancement features through a single interface.
Stars: 189

mcp-omnisearch is a Model Context Protocol (MCP) server that acts as a unified gateway to multiple search providers and AI tools. It integrates Tavily, Perplexity, Kagi, Jina AI, Brave, Exa AI, and Firecrawl to offer a wide range of search, AI response, content processing, and enhancement features through a single interface. The server provides powerful search capabilities, AI response generation, content extraction, summarization, web scraping, structured data extraction, and more. It is designed to work flexibly with the API keys available, enabling users to activate only the providers they have keys for and easily add more as needed.
README:
A Model Context Protocol (MCP) server that provides unified access to multiple search providers and AI tools. This server combines the capabilities of Tavily, Perplexity, Kagi, Jina AI, Brave, Exa AI, and Firecrawl to offer comprehensive search, AI responses, content processing, and enhancement features through a single interface.
- Tavily Search: Optimized for factual information with strong citation support. Supports domain filtering through API parameters (include_domains/exclude_domains).
- Brave Search: Privacy-focused search with good technical content coverage. Features native support for search operators (site:, -site:, filetype:, intitle:, inurl:, before:, after:, and exact phrases).
- Kagi Search: High-quality search results with minimal advertising influence, focused on authoritative sources. Supports search operators in query string (site:, -site:, filetype:, intitle:, inurl:, before:, after:, and exact phrases).
- Exa Search: AI-powered web search using neural and keyword search. Optimized for AI applications with semantic understanding, content extraction, and research capabilities.
- GitHub Search: Comprehensive code search across public GitHub repositories with three specialized tools:
  - Code Search: Find code examples, function definitions, and files using advanced syntax (filename:, path:, repo:, user:, language:, in:file)
  - Repository Search: Discover repositories with sorting by stars, forks, or recent updates
  - User Search: Find GitHub users and organizations
MCP Omnisearch provides powerful search capabilities through operators and parameters:
- Domain filtering: Available across all providers
  - Tavily: Through API parameters (include_domains/exclude_domains)
  - Brave & Kagi: Through site: and -site: operators
- File type filtering: Available in Brave and Kagi (filetype:)
- Title and URL filtering: Available in Brave and Kagi (intitle:, inurl:)
- Date filtering: Available in Brave and Kagi (before:, after:)
- Exact phrase matching: Available in Brave and Kagi ("phrase")
// Using Brave or Kagi with query string operators
{
  "query": "filetype:pdf site:microsoft.com typescript guide"
}

// Using Tavily with API parameters
{
  "query": "typescript guide",
  "include_domains": ["microsoft.com"],
  "exclude_domains": ["github.com"]
}
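For illustration, here is a minimal TypeScript sketch (not the actual mcp-omnisearch internals; all names are illustrative) of how the same domain-filtered search maps onto the two provider styles shown above, query-string operators for Brave/Kagi versus API parameters for Tavily/Exa:

// Illustrative sketch, not mcp-omnisearch source code.
interface UnifiedSearch {
  query: string;
  includeDomains?: string[];
  excludeDomains?: string[];
}

// Operator-based providers (Brave, Kagi): fold filters into the query string.
function toOperatorQuery({ query, includeDomains = [], excludeDomains = [] }: UnifiedSearch): string {
  return [
    query,
    ...includeDomains.map((d) => `site:${d}`),
    ...excludeDomains.map((d) => `-site:${d}`),
  ].join(' ');
}

// Parameter-based providers (Tavily, Exa): pass filters as API parameters.
function toApiParams({ query, includeDomains, excludeDomains }: UnifiedSearch) {
  return {
    query,
    include_domains: includeDomains,
    exclude_domains: excludeDomains,
  };
}

// toOperatorQuery({ query: 'typescript guide', includeDomains: ['microsoft.com'] })
// => "typescript guide site:microsoft.com"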
- Brave Search: Full native operator support in query string
- Kagi Search: Complete operator support in query string
- Tavily Search: Domain filtering through API parameters
- Exa Search: Domain filtering through API parameters, semantic search with neural understanding
- GitHub Search: Advanced code search syntax with qualifiers:
  - filename:remote.ts - Search for specific files
  - path:src/lib - Search within specific directories
  - repo:user/repo - Search within specific repositories
  - user:username - Search within a user's repositories
  - language:typescript - Filter by programming language
  - in:file "export function" - Search for text within files
- Perplexity AI: Advanced response generation combining real-time web search with GPT-4 Omni and Claude 3
- Kagi FastGPT: Quick AI-generated answers with citations (900ms typical response time)
- Exa Answer: Get direct AI-generated answers to questions using Exa Answer API
- Jina AI Reader: Clean content extraction with image captioning and PDF support
- Kagi Universal Summarizer: Content summarization for pages, videos, and podcasts
- Tavily Extract: Extract raw content from single or multiple web pages with configurable extraction depth ('basic' or 'advanced'). Returns both combined content and individual URL content, with metadata including word count and extraction statistics
- Firecrawl Scrape: Extract clean, LLM-ready data from single URLs with enhanced formatting options
- Firecrawl Crawl: Deep crawling of all accessible subpages on a website with configurable depth limits
- Firecrawl Map: Fast URL collection from websites for comprehensive site mapping
- Firecrawl Extract: Structured data extraction with AI using natural language prompts
- Firecrawl Actions: Support for page interactions (clicking, scrolling, etc.) before extraction for dynamic content
- Exa Contents: Extract full content from Exa search result IDs
- Exa Similar: Find web pages semantically similar to a given URL using Exa
- Kagi Enrichment API: Supplementary content from specialized indexes (Teclis, TinyGem)
- Jina AI Grounding: Real-time fact verification against web knowledge
MCP Omnisearch is designed to work with the API keys you have available. You don't need to have keys for all providers - the server will automatically detect which API keys are available and only enable those providers.
For example:
- If you only have a Tavily and Perplexity API key, only those providers will be available
- If you don't have a Kagi API key, Kagi-based services won't be available, but all other providers will work normally
- The server will log which providers are available based on the API keys you've configured
This flexibility makes it easy to get started with just one or two providers and add more as needed.
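As a rough sketch of what this key-based activation can look like (the names below are illustrative and not taken from the mcp-omnisearch source):

// Hypothetical sketch of key-based provider activation.
const PROVIDER_KEYS: Record<string, string> = {
  tavily: 'TAVILY_API_KEY',
  perplexity: 'PERPLEXITY_API_KEY',
  kagi: 'KAGI_API_KEY',
  jina: 'JINA_AI_API_KEY',
  brave: 'BRAVE_API_KEY',
  github: 'GITHUB_API_KEY',
  exa: 'EXA_API_KEY',
  firecrawl: 'FIRECRAWL_API_KEY',
};

// Only providers whose API key is present in the environment get enabled.
const available = Object.entries(PROVIDER_KEYS)
  .filter(([, envVar]) => Boolean(process.env[envVar]))
  .map(([provider]) => provider);

console.error(`Available providers: ${available.join(', ') || 'none'}`);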
This server requires configuration through your MCP client. Here are examples for different environments:
Add this to your Cline MCP settings:
{
  "mcpServers": {
    "mcp-omnisearch": {
      "command": "node",
      "args": ["/path/to/mcp-omnisearch/dist/index.js"],
      "env": {
        "TAVILY_API_KEY": "your-tavily-key",
        "PERPLEXITY_API_KEY": "your-perplexity-key",
        "KAGI_API_KEY": "your-kagi-key",
        "JINA_AI_API_KEY": "your-jina-key",
        "BRAVE_API_KEY": "your-brave-key",
        "GITHUB_API_KEY": "your-github-key",
        "EXA_API_KEY": "your-exa-key",
        "FIRECRAWL_API_KEY": "your-firecrawl-key",
        "FIRECRAWL_BASE_URL": "http://localhost:3002"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
For WSL environments, add this to your Claude Desktop configuration:
{
  "mcpServers": {
    "mcp-omnisearch": {
      "command": "wsl.exe",
      "args": [
        "bash",
        "-c",
        "TAVILY_API_KEY=key1 PERPLEXITY_API_KEY=key2 KAGI_API_KEY=key3 JINA_AI_API_KEY=key4 BRAVE_API_KEY=key5 GITHUB_API_KEY=key6 EXA_API_KEY=key7 FIRECRAWL_API_KEY=key8 FIRECRAWL_BASE_URL=http://localhost:3002 node /path/to/mcp-omnisearch/dist/index.js"
      ]
    }
  }
}
The server uses API keys for each provider. You don't need keys for all providers - only the providers corresponding to your available API keys will be activated:
- TAVILY_API_KEY: For Tavily Search
- PERPLEXITY_API_KEY: For Perplexity AI
- KAGI_API_KEY: For Kagi services (FastGPT, Summarizer, Enrichment)
- JINA_AI_API_KEY: For Jina AI services (Reader, Grounding)
- BRAVE_API_KEY: For Brave Search
- GITHUB_API_KEY: For GitHub search services (Code, Repository, User search)
- EXA_API_KEY: For Exa AI services (Search, Answer, Contents, Similar)
- FIRECRAWL_API_KEY: For Firecrawl services (Scrape, Crawl, Map, Extract, Actions)
- FIRECRAWL_BASE_URL: For self-hosted Firecrawl instances (optional, defaults to the Firecrawl cloud service)
You can start with just one or two API keys and add more later as needed. The server will log which providers are available on startup.
To use GitHub search features, you'll need a GitHub personal access token. For security, use a token with public repository access only:
- Go to GitHub Settings: Navigate to GitHub Settings > Developer settings > Personal access tokens
- Create a new token: Click "Generate new token" → "Generate new token (classic)"
- Configure token settings:
  - Name: MCP Omnisearch - Public Search
  - Expiration: Choose your preferred expiration (90 days recommended)
  - Scopes: Leave all checkboxes UNCHECKED
    ⚠️ Important: Do not select any scopes. A token with no scopes can only access public repositories and user profiles, which is exactly what we want for search functionality.
- Generate and copy: Click "Generate token" and copy the token immediately
- Add to environment: Set GITHUB_API_KEY=your_token_here
Security Notes:
- This token configuration ensures no access to private repositories
- Only public code search, repository discovery, and user profiles are accessible
- Rate limits: 5,000 requests/hour for the GitHub API overall; code search specifically is limited to 10 requests/minute
- You can revoke the token anytime from GitHub settings if needed
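If you want to sanity-check the token independently of MCP Omnisearch, the documented GitHub REST endpoint GET /search/code accepts an unscoped token (Node 18+ TypeScript, using the built-in fetch):

// Verify that an unscoped token can hit the public code search endpoint.
// The 10 requests/minute code-search limit applies to this route.
const token = process.env.GITHUB_API_KEY;
const query = encodeURIComponent('filename:remote.ts @sveltejs/kit');

const res = await fetch(`https://api.github.com/search/code?q=${query}&per_page=5`, {
  headers: {
    Authorization: `Bearer ${token}`,
    Accept: 'application/vnd.github+json',
    'X-GitHub-Api-Version': '2022-11-28',
  },
});

if (!res.ok) throw new Error(`GitHub search failed: ${res.status}`);
const data = await res.json();
console.log(data.total_count, data.items?.[0]?.path);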
If you're running a self-hosted instance of Firecrawl, you can configure MCP Omnisearch to use it by setting the FIRECRAWL_BASE_URL environment variable. This allows you to maintain complete control over your data processing pipeline.
Self-hosted Firecrawl setup:
- Follow the Firecrawl self-hosting guide
- Set up your Firecrawl instance (default runs on http://localhost:3002)
- Configure MCP Omnisearch with your self-hosted URL:
FIRECRAWL_BASE_URL=http://localhost:3002
# or for a remote self-hosted instance:
FIRECRAWL_BASE_URL=https://your-firecrawl-domain.com
Important notes:
- If FIRECRAWL_BASE_URL is not set, MCP Omnisearch will default to the Firecrawl cloud service
- Self-hosted instances support the same API endpoints (/v1/scrape, /v1/crawl, etc.)
- You'll still need a FIRECRAWL_API_KEY even for self-hosted instances
- Self-hosted Firecrawl provides enhanced security and customization options
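Conceptually, the override just swaps the base of every Firecrawl request. A minimal sketch, assuming the real Firecrawl /v1/scrape route and an illustrative helper name (this is not the mcp-omnisearch implementation):

// Route Firecrawl calls to a self-hosted instance when FIRECRAWL_BASE_URL is set.
const FIRECRAWL_BASE_URL =
  process.env.FIRECRAWL_BASE_URL ?? 'https://api.firecrawl.dev';

async function firecrawlScrape(url: string) {
  const res = await fetch(`${FIRECRAWL_BASE_URL}/v1/scrape`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url, formats: ['markdown'] }),
  });
  if (!res.ok) throw new Error(`Firecrawl scrape failed: ${res.status}`);
  return res.json();
}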
The server implements MCP Tools organized by category:
Search the web using Tavily Search API. Best for factual queries requiring reliable sources and citations.
Parameters:
- query (string, required): Search query
Example:
{
  "query": "latest developments in quantum computing"
}
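From a TypeScript MCP client, invoking this tool might look like the following sketch, built on the official @modelcontextprotocol/sdk. The tavily_search tool name is an assumption; check the server's listTools() output for the identifiers it actually registers.

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import {
  StdioClientTransport,
  getDefaultEnvironment,
} from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the server over stdio, passing the API key through its environment.
const transport = new StdioClientTransport({
  command: 'node',
  args: ['/path/to/mcp-omnisearch/dist/index.js'],
  env: { ...getDefaultEnvironment(), TAVILY_API_KEY: 'your-tavily-key' },
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Tool name assumed; list the server's tools to confirm.
const result = await client.callTool({
  name: 'tavily_search',
  arguments: { query: 'latest developments in quantum computing' },
});
console.log(result.content);

await client.close();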
Privacy-focused web search with good coverage of technical topics.
Parameters:
- query (string, required): Search query
Example:
{
  "query": "rust programming language features"
}
High-quality search results with minimal advertising influence. Best for finding authoritative sources and research materials.
Parameters:
- query (string, required): Search query
- language (string, optional): Language filter (e.g., "en")
- no_cache (boolean, optional): Bypass cache for fresh results
Example:
{
  "query": "latest research in machine learning",
  "language": "en"
}
Search for code on GitHub using advanced syntax. This tool searches through file contents in public repositories and provides code snippets with metadata.
Parameters:
- query (string, required): Search query with GitHub search syntax
- limit (number, optional): Maximum number of results (1-50, default: 10)
Example:
{
  "query": "filename:remote.ts @sveltejs/kit",
  "limit": 5
}
Advanced query examples:
- "filename:config.json path:src" - Find config.json files in src directories
- "function fetchData language:typescript" - Find fetchData functions in TypeScript
- "repo:microsoft/vscode extension" - Search within a specific repository
- "user:torvalds language:c" - Search a user's repositories for C code
Discover GitHub repositories with enhanced metadata including stars, forks, language, and last update information.
Parameters:
- query (string, required): Repository search query
- limit (number, optional): Maximum number of results (1-50, default: 10)
- sort (string, optional): Sort results by 'stars', 'forks', or 'updated'
Example:
{
  "query": "sveltekit remote functions",
  "sort": "stars",
  "limit": 5
}
Find GitHub users and organizations with profile information.
Parameters:
- query (string, required): User/organization search query
- limit (number, optional): Maximum number of results (1-50, default: 10)
Example:
{
  "query": "Rich-Harris",
  "limit": 3
}
AI-powered web search using neural and keyword search. Automatically chooses between traditional keyword search and Exa's embeddings-based model to find the most relevant results for your query.
Parameters:
- query (string, required): Search query
- limit (number, optional): Maximum number of results (1-100, default: 10)
- include_domains (array, optional): Only include results from these domains
- exclude_domains (array, optional): Exclude results from these domains
Example:
{
  "query": "latest AI research papers",
  "limit": 15,
  "include_domains": ["arxiv.org", "scholar.google.com"]
}
AI-powered response generation with real-time web search integration.
Parameters:
- query (string, required): Question or topic for AI response
Example:
{
  "query": "Explain the differences between REST and GraphQL"
}
Quick AI-generated answers with citations.
Parameters:
- query (string, required): Question for quick AI response
Example:
{
  "query": "What are the main features of TypeScript?"
}
Get direct AI-generated answers to questions using Exa Answer API.
Parameters:
- query (string, required): Question for AI response
- include_domains (array, optional): Only include sources from these domains
- exclude_domains (array, optional): Exclude sources from these domains
Example:
{
  "query": "How does machine learning work?",
  "include_domains": ["arxiv.org", "nature.com"]
}
Convert URLs to clean, LLM-friendly text with image captioning.
Parameters:
- url (string, required): URL to process
Example:
{
  "url": "https://example.com/article"
}
Summarize content from URLs.
Parameters:
- url (string, required): URL to summarize
Example:
{
  "url": "https://example.com/long-article"
}
Extract raw content from web pages with Tavily Extract.
Parameters:
- url (string | string[], required): Single URL or array of URLs to extract content from
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced'
Example:
{
  "url": [
    "https://example.com/article1",
    "https://example.com/article2"
  ],
  "extract_depth": "advanced"
}
Response includes:
- Combined content from all URLs
- Individual raw content for each URL
- Metadata with word count, successful extractions, and any failed URLs
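In TypeScript terms, a response of that shape might be typed like this; the field names are inferred from the description above and are not a published schema:

// Assumed response shape for illustration only.
interface ExtractResult {
  combined_content: string;           // content from all URLs merged together
  raw_contents: Array<{               // one entry per input URL
    url: string;
    content: string;
  }>;
  metadata: {
    word_count: number;
    successful_extractions: number;
    failed_urls: string[];
  };
}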
Extract clean, LLM-ready data from single URLs with enhanced formatting options.
Parameters:
- url (string | string[], required): Single URL or array of URLs to extract content from
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced'
Example:
{
  "url": "https://example.com/article",
  "extract_depth": "basic"
}
Response includes:
- Clean, markdown-formatted content
- Metadata including title, word count, and extraction statistics
Deep crawling of all accessible subpages on a website with configurable depth limits.
Parameters:
- url (string | string[], required): Starting URL for crawling
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced' (controls crawl depth and limits)
Example:
{
  "url": "https://example.com",
  "extract_depth": "advanced"
}
Response includes:
- Combined content from all crawled pages
- Individual content for each page
- Metadata including title, word count, and crawl statistics
Fast URL collection from websites for comprehensive site mapping.
Parameters:
- url (string | string[], required): URL to map
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced' (controls map depth)
Example:
{
  "url": "https://example.com",
  "extract_depth": "basic"
}
Response includes:
- List of all discovered URLs
- Metadata including site title and URL count
Structured data extraction with AI using natural language prompts.
Parameters:
- url (string | string[], required): URL to extract structured data from
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced'
Example:
{
  "url": "https://example.com",
  "extract_depth": "basic"
}
Response includes:
- Structured data extracted from the page
- Metadata including title and extraction statistics
Support for page interactions (clicking, scrolling, etc.) before extraction for dynamic content.
Parameters:
- url (string | string[], required): URL to interact with and extract content from
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced' (controls complexity of interactions)
Example:
{
  "url": "https://news.ycombinator.com",
  "extract_depth": "basic"
}
Response includes:
- Content extracted after performing interactions
- Description of actions performed
- Screenshot of the page (if available)
- Metadata including title and extraction statistics
Extract full content from Exa search result IDs.
Parameters:
- ids (string | string[], required): Exa search result ID(s) to extract content from
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced'
Example:
{
  "ids": ["exa-result-id-123", "exa-result-id-456"],
  "extract_depth": "advanced"
}
Response includes:
- Combined content from all result IDs
- Individual raw content for each ID
- Metadata with word count and extraction statistics
Find web pages semantically similar to a given URL using Exa.
Parameters:
- url (string, required): URL to find similar pages for
- extract_depth (string, optional): Extraction depth - 'basic' (default) or 'advanced'
Example:
{
  "url": "https://arxiv.org/abs/2106.09685",
  "extract_depth": "advanced"
}
Response includes:
- Combined content from all similar pages
- Similarity scores and metadata
- Individual content for each similar page
Get supplementary content from specialized indexes.
Parameters:
- query (string, required): Query for enrichment
Example:
{
  "query": "emerging web technologies"
}
Verify statements against web knowledge.
Parameters:
- statement (string, required): Statement to verify
Example:
{
  "statement": "TypeScript adds static typing to JavaScript"
}
MCP Omnisearch supports containerized deployment using Docker with MCPO (Model Context Protocol Over HTTP) integration, enabling cloud deployment and OpenAPI access.
- Using Docker Compose (Recommended):
# Clone the repository
git clone https://github.com/spences10/mcp-omnisearch.git
cd mcp-omnisearch
# Create .env file with your API keys
echo "TAVILY_API_KEY=your-tavily-key" > .env
echo "KAGI_API_KEY=your-kagi-key" >> .env
echo "PERPLEXITY_API_KEY=your-perplexity-key" >> .env
echo "EXA_API_KEY=your-exa-key" >> .env
# Add other API keys as needed
echo "GITHUB_API_KEY=your-github-key" >> .env
# Start the container
docker-compose up -d
- Using Docker directly:
docker build -t mcp-omnisearch .
docker run -d \
  -p 8000:8000 \
  -e TAVILY_API_KEY=your-tavily-key \
  -e KAGI_API_KEY=your-kagi-key \
  -e PERPLEXITY_API_KEY=your-perplexity-key \
  -e EXA_API_KEY=your-exa-key \
  -e GITHUB_API_KEY=your-github-key \
  --name mcp-omnisearch \
  mcp-omnisearch
Configure the container using environment variables for each provider:
- TAVILY_API_KEY: For Tavily Search
- PERPLEXITY_API_KEY: For Perplexity AI
- KAGI_API_KEY: For Kagi services (FastGPT, Summarizer, Enrichment)
- JINA_AI_API_KEY: For Jina AI services (Reader, Grounding)
- BRAVE_API_KEY: For Brave Search
- GITHUB_API_KEY: For GitHub search services
- EXA_API_KEY: For Exa AI services
- FIRECRAWL_API_KEY: For Firecrawl services
- FIRECRAWL_BASE_URL: For self-hosted Firecrawl instances (optional)
- PORT: Container port (defaults to 8000)
Once deployed, the MCP server is accessible via OpenAPI at:
- Base URL: http://your-container-host:8000
- OpenAPI Endpoint: /omnisearch
- Compatible with: OpenWebUI and other tools expecting OpenAPI
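Assuming MCPO's usual one-route-per-tool mapping under the /omnisearch prefix and a tool named tavily_search (both assumptions; the generated /docs page lists the real routes), a call could look like:

// Call a tool through the MCPO OpenAPI bridge (route and tool name assumed).
const res = await fetch('http://localhost:8000/omnisearch/tavily_search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'rust programming language features' }),
});
console.log(await res.json());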
The containerized version can be deployed to any container platform that supports Docker:
- Cloud Run (Google Cloud)
- Container Instances (Azure)
- ECS/Fargate (AWS)
- Railway, Render, Fly.io
- Any Kubernetes cluster
Example deployment to a cloud platform:
# Build and tag for your registry
docker build -t your-registry/mcp-omnisearch:latest .
docker push your-registry/mcp-omnisearch:latest
# Deploy with your platform's CLI or web interface
# Configure environment variables through your platform's settings
Development:
- Clone the repository
- Install dependencies: pnpm install
- Build the project: pnpm run build
- Run in development mode: pnpm run dev
Publishing:
- Update version in package.json
- Build the project: pnpm run build
- Publish to npm: pnpm publish
Each provider requires its own API key and may have different access requirements:
- Tavily: Requires an API key from their developer portal
- Perplexity: API access through their developer program
- Kagi: Some features limited to Business (Team) plan users
- Jina AI: API key required for all services
- Brave: API key from their developer portal
- GitHub: Personal access token with no scopes selected (public access only)
- Exa AI: API key from their dashboard at dashboard.exa.ai
- Firecrawl: API key required from their developer portal
Each provider has its own rate limits. The server will handle rate limit errors gracefully and return appropriate error messages.
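One common pattern for handling rate-limit errors gracefully, shown here as an illustrative sketch rather than the server's actual code, is to retry 429 responses with exponential backoff:

// Retry 429 (rate-limited) responses with exponential backoff.
async function fetchWithBackoff(url: string, init: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Honor Retry-After if the provider sends it; otherwise back off 1s, 2s, 4s...
    const retryAfter = Number(res.headers.get('retry-after')) || 2 ** attempt;
    await new Promise((r) => setTimeout(r, retryAfter * 1000));
  }
}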
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see the LICENSE file for details.