oxylabs-mcp

Official Oxylabs MCP integration

Stars: 61

The Oxylabs MCP Server acts as a bridge between AI models and the web, providing clean, structured data from any site. It enables scraping of URLs, rendering JavaScript-heavy pages, content extraction for AI use, bypassing anti-scraping measures, and accessing geo-restricted web data from 195+ countries. The implementation utilizes the Model Context Protocol (MCP) to facilitate secure interactions between AI assistants and web content. Key features include scraping content from any site, automatic data cleaning and conversion, bypassing blocks and geo-restrictions, flexible setup with cross-platform support, and built-in error handling and request management.

README:

Oxylabs MCP Server

The missing link between AI models and the real‑world web: one API that delivers clean, structured data from any site.

📖 Overview

The Oxylabs MCP server provides a bridge between AI models and the web. It enables them to scrape any URL, render JavaScript-heavy pages, extract and format content for AI use, bypass anti-scraping measures, and access geo-restricted web data from 195+ countries.

This implementation leverages the Model Context Protocol (MCP) to create a secure, standardized way for AI assistants to interact with web content.

Why Oxylabs MCP? 🕸️ ➜ 📦 ➜ 🤖

Imagine telling your LLM "Summarise the latest Hacker News discussion about GPT‑7" – and it simply answers.
The Oxylabs MCP server makes that happen by doing the boring parts for you:

What Oxylabs MCP does	Why it matters to you
Bypasses anti‑bot walls with the Oxylabs global proxy network	Keeps you unblocked and anonymous
Renders JavaScript in headless Chrome	Single‑page apps, sorted
Cleans HTML → JSON	Drop straight into vector DBs or prompts
Optional structured parsers (Google, Amazon, etc.)	One‑line access to popular targets

✨ Key Features

Scrape content from any site

Extract data from any URL, including complex single-page applications
Fully render dynamic websites using headless browser support
Choose full JavaScript rendering, HTML-only, or none
Emulate Mobile and Desktop viewports for realistic rendering

Automatically get AI-ready data

Automatically clean and convert HTML to Markdown for improved readability
Use automated parsers for popular targets such as Google and Amazon

Bypass blocks & geo-restrictions

Bypass sophisticated bot protection systems with a high success rate
Reliably scrape even the most complex websites
Get automatically rotating IPs from a proxy pool covering 195+ countries

Flexible setup & cross-platform support

Set rendering and parsing options if needed
Feed data directly into AI models or analytics tools
Works on macOS, Windows, and Linux

Built-in error handling and request management

Comprehensive error handling and reporting
Smart rate limiting and request management

🛠️ MCP Tools

Oxylabs MCP provides two sets of tools that can be used together or independently:

Oxylabs Web Scraper API Tools

universal_scraper: Uses Oxylabs Web Scraper API for general website scraping.
google_search_scraper: Uses Oxylabs Web Scraper API to extract results from Google Search.
amazon_search_scraper: Uses Oxylabs Web Scraper API to scrape Amazon search result pages.
amazon_product_scraper: Uses Oxylabs Web Scraper API to extract data from individual Amazon product pages.

Oxylabs AI Studio Tools

The Oxylabs AI Studio MCP server provides various AI tools for your agents:

ai_scraper: Scrape content from any URL in JSON or Markdown format with AI-powered data extraction.
ai_crawler: Based on a prompt, crawls a website and collects data in Markdown or JSON format across multiple pages.
ai_browser_agent: Given a task, the agent controls a browser to achieve the given objective and returns data in Markdown, JSON, HTML, or screenshot formats.
ai_search: Search the web for URLs and their contents with AI-powered content extraction.
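
As a rough illustration of how an MCP client might invoke one of these tools, the sketch below builds a JSON-RPC 2.0 `tools/call` request. The `tools/call` method and the `name`/`arguments` fields follow the MCP specification; the helper `build_tool_call` and the request id are hypothetical, and in practice your MCP client (Claude Desktop, Cursor, etc.) constructs this message for you.

```python
import json

# Tool names exactly as listed in this README.
WEB_SCRAPER_TOOLS = {
    "universal_scraper",
    "google_search_scraper",
    "amazon_search_scraper",
    "amazon_product_scraper",
}
AI_STUDIO_TOOLS = {"ai_scraper", "ai_crawler", "ai_browser_agent", "ai_search"}


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Hypothetical helper: build a JSON-RPC 2.0 request for MCP `tools/call`."""
    if name not in WEB_SCRAPER_TOOLS | AI_STUDIO_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_tool_call("universal_scraper", {"url": "https://ip.oxylabs.io"})
    print(json.dumps(request, indent=2))
```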

💡 Example Queries

When you've set up the MCP server with Claude, you can make requests like:

Web Scraper API Examples

Could you scrape the https://www.google.com/search?q=ai page?
Scrape https://www.amazon.de/-/en/Smartphone-Contract-Function-Manufacturer-Exclusive/dp/B0CNKD651V with parse enabled
Scrape https://www.amazon.de/-/en/gp/bestsellers/beauty/ref=zg_bs_nav_beauty_0 with parse and render enabled
Use web unblocker with render to scrape https://www.bestbuy.com/site/top-deals/all-electronics-on-sale/pcmcat1674241939957.c

AI Studio Examples

Use AI scraper to get top news headlines from https://news-site.com in JSON format.
Use AI crawler with prompt "extract all product information" to crawl https://example-store.com
Use browser agent with task "log in and extract dashboard data" on https://complex-app.com
Use AI search to find 5 "latest AI developments" and return URLs with their content

✅ Prerequisites

Before you begin, make sure you have:

Oxylabs Web Scraper API Account: Obtain your username and password from Oxylabs (1-week free trial available)
Oxylabs AI Studio API Key (Optional): For AI-powered tools, obtain your API key from Oxylabs AI Studio (separate service)

Basic Usage

Via Smithery CLI:

Node.js (v16+)
npx command-line tool

Via uv:

uv package manager – install it using this guide

Local/Dev Setup

Python 3.12+
uv package manager – install it using this guide

🧩 API Parameters

The Oxylabs MCP Universal Scraper accepts these parameters:

Parameter	Description	Values
`url`	The URL to scrape	Any valid URL
`render`	Use headless browser rendering	`html` or `None`
`geo_location`	Sets the proxy's geolocation for retrieving data	`Brazil`, `Canada`, etc.
`user_agent_type`	Device type and browser	`desktop`, `tablet`, etc.
`output_format`	The format of the output	`links`, `md`, `html`
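
A minimal sketch of how a caller might assemble and sanity-check these parameters before passing them to `universal_scraper`. The function `build_universal_scraper_args` is hypothetical and not part of the server; the allowed values for `render` and `output_format` are taken from the table above, the `http(s)` check is an added assumption, and `geo_location` and `user_agent_type` are passed through unchecked because the table lists only examples for them.

```python
from typing import Optional

# Values taken from the parameter table in this README.
ALLOWED_RENDER = {"html", None}
ALLOWED_OUTPUT_FORMATS = {"links", "md", "html"}


def build_universal_scraper_args(
    url: str,
    render: Optional[str] = None,
    geo_location: Optional[str] = None,
    user_agent_type: Optional[str] = None,
    output_format: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate and collect universal_scraper arguments."""
    # Assumption: only http(s) URLs are meaningful targets.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not an http(s) URL: {url!r}")
    if render not in ALLOWED_RENDER:
        raise ValueError(f"render must be 'html' or None, got {render!r}")
    if output_format is not None and output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    args = {"url": url}
    optional = {
        "render": render,
        "geo_location": geo_location,
        "user_agent_type": user_agent_type,
        "output_format": output_format,
    }
    # Leave unset options out so the server's own defaults apply.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


if __name__ == "__main__":
    print(build_universal_scraper_args(
        "https://ip.oxylabs.io", render="html", geo_location="Canada", output_format="md"
    ))
```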

🔧 Configuration

smithery

Go to https://smithery.ai/server/@oxylabs/oxylabs-mcp
Log in with GitHub
Find the Install section
Follow the instructions to generate the config

Auto install with Smithery CLI

# example for Claude Desktop
npx -y @smithery/cli@latest install @oxylabs/oxylabs-mcp --client claude --key <smithery_key>

uvx

Install uv (it provides the uvx command)

# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Use the following config

{
  "mcpServers": {
    "oxylabs": {
      "command": "uvx",
      "args": ["oxylabs-mcp"],
      "env": {
        "OXYLABS_USERNAME": "OXYLABS_USERNAME",
        "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
        "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
      }
    }
  }
}

uv

Install uv

# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Use the following config

{
  "mcpServers": {
    "oxylabs": {
      "command": "uv",
      "args": [
        "--directory",
        "/<Absolute-path-to-folder>/oxylabs-mcp",
        "run",
        "oxylabs-mcp"
      ],
      "env": {
        "OXYLABS_USERNAME": "OXYLABS_USERNAME",
        "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
        "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
      }
    }
  }
}

Manual Setup with Claude Desktop

Navigate to Claude → Settings → Developer → Edit Config and add one of the configurations above to the claude_desktop_config.json file.
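
If you prefer to script this step, the sketch below merges an `oxylabs` entry into an existing config without disturbing other servers. It mirrors the `uvx` configuration shown above; the helper names are hypothetical, and the path to `claude_desktop_config.json` is left to the caller because its location depends on your operating system.

```python
import json
from pathlib import Path
from typing import Optional


def oxylabs_server_entry(username: str, password: str,
                         ai_studio_key: Optional[str] = None) -> dict:
    """Hypothetical helper: build the uvx-based server entry shown in this README."""
    env = {"OXYLABS_USERNAME": username, "OXYLABS_PASSWORD": password}
    if ai_studio_key:
        env["OXYLABS_AI_STUDIO_API_KEY"] = ai_studio_key
    return {"command": "uvx", "args": ["oxylabs-mcp"], "env": env}


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry stored under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, entry: dict) -> None:
    """Read the JSON config at path (if present), merge the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = merge_server(config, name, entry)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    entry = oxylabs_server_entry("OXYLABS_USERNAME", "OXYLABS_PASSWORD")
    print(json.dumps(merge_server({}, "oxylabs", entry), indent=2))
```

Back up the existing file before running something like `update_config_file`, since it rewrites the whole document.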

Manual Setup with Cursor AI

Navigate to Cursor → Settings → Cursor Settings → MCP. Click Add new global MCP server and add one of the configurations above.

⚙️ Environment variables

The Oxylabs MCP server supports the following environment variables:

Name	Description	Default
`OXYLABS_USERNAME`	Your Oxylabs Web Scraper API username
`OXYLABS_PASSWORD`	Your Oxylabs Web Scraper API password
`OXYLABS_AI_STUDIO_API_KEY`	Your Oxylabs AI Studio API key
`LOG_LEVEL`	Log level for the logs returned to the client	`INFO`

At least one set of credentials (Web Scraper API or AI Studio) is required to use the MCP server.

Credential Requirements

The Oxylabs MCP server supports two independent services:

Oxylabs Web Scraper API: Requires OXYLABS_USERNAME and OXYLABS_PASSWORD
Oxylabs AI Studio: Requires OXYLABS_AI_STUDIO_API_KEY

You can use either service independently or both together. The server will automatically detect which credentials are available and enable the corresponding tools.
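
The detection rule described above can be sketched roughly as follows. This is not the server's actual code; `enabled_tools` is a hypothetical function that simply maps the documented environment variables to the tool names listed under MCP Tools.

```python
import os
from collections.abc import Mapping

# Tool names as listed in the MCP Tools section of this README.
WEB_SCRAPER_TOOLS = [
    "universal_scraper",
    "google_search_scraper",
    "amazon_search_scraper",
    "amazon_product_scraper",
]
AI_STUDIO_TOOLS = ["ai_scraper", "ai_crawler", "ai_browser_agent", "ai_search"]


def enabled_tools(env: Mapping) -> list:
    """Hypothetical sketch: decide which tool sets the given credentials unlock."""
    tools = []
    # Web Scraper API needs both the username and the password.
    if env.get("OXYLABS_USERNAME") and env.get("OXYLABS_PASSWORD"):
        tools += WEB_SCRAPER_TOOLS
    # AI Studio needs only its API key.
    if env.get("OXYLABS_AI_STUDIO_API_KEY"):
        tools += AI_STUDIO_TOOLS
    if not tools:
        raise RuntimeError(
            "Set OXYLABS_USERNAME and OXYLABS_PASSWORD, or OXYLABS_AI_STUDIO_API_KEY"
        )
    return tools


if __name__ == "__main__":
    try:
        print(enabled_tools(os.environ))
    except RuntimeError as error:
        print(f"No credentials found: {error}")
```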

📝 Logging

The server reports additional information about tool calls in notifications/message events:

{
  "method": "notifications/message",
  "params": {
    "level": "info",
    "data": "Create job with params: {\"url\": \"https://ip.oxylabs.io\"}"
  }
}

{
  "method": "notifications/message",
  "params": {
    "level": "info",
    "data": "Job info: job_id=7333113830223918081 job_status=done"
  }
}

{
  "method": "notifications/message",
  "params": {
    "level": "error",
    "data": "Error: request to Oxylabs API failed"
  }
}
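
A client could pick these events apart along the following lines. This is a sketch only: `parse_notification` is a hypothetical helper, and the `job_id=... job_status=...` pattern is inferred from the sample messages above rather than from a documented log format.

```python
import json
import re
from typing import Optional

# Pattern inferred from the sample "Job info" message shown in this README.
JOB_INFO = re.compile(r"job_id=(?P<job_id>\d+)\s+job_status=(?P<job_status>\w+)")


def parse_notification(raw: str) -> Optional[dict]:
    """Hypothetical helper: extract level, text, and job details from one event."""
    message = json.loads(raw)
    if message.get("method") != "notifications/message":
        return None  # Not a logging notification.
    params = message.get("params", {})
    result = {"level": params.get("level"), "data": params.get("data")}
    match = JOB_INFO.search(str(params.get("data", "")))
    if match:
        result.update(match.groupdict())
    return result


if __name__ == "__main__":
    sample = json.dumps({
        "method": "notifications/message",
        "params": {
            "level": "info",
            "data": "Job info: job_id=7333113830223918081 job_status=done",
        },
    })
    print(parse_notification(sample))
```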

🛡️ License

Distributed under the MIT License – see LICENSE for details.

About Oxylabs

Established in 2015, Oxylabs is a market-leading web intelligence collection platform, driven by the highest business, ethics, and compliance standards, enabling companies worldwide to unlock data-driven insights.

Made with ☕ by Oxylabs. Feel free to give us a ⭐ if MCP saved you a weekend.
