DevDocs

Completely free, private, UI based Tech Documentation MCP server. Designed for coders and software developers in mind. Easily integrate into Cursor, Windsurf, Cline, Roo Code, Claude Desktop App

Stars: 469

Visit

DevDocs is a platform designed to simplify the process of digesting technical documentation for software engineers and developers. It automates the extraction and conversion of web content into markdown format, making it easier for users to access and understand the information. By crawling through child pages of a given URL, DevDocs provides a streamlined approach to gathering relevant data and integrating it into various tools for software development. The tool aims to save time and effort by eliminating the need for manual research and content extraction, ultimately enhancing productivity and efficiency in the development process.

README:

DevDocs by CyberAGI 🚀

Turn Weeks of Documentation Research into Hours of Productive Development

Perfect For • Features • Why DevDocs • Getting Started • Scripts • Compare to FireCrawl • Discord

🎯 Perfect For

🏢 Enterprise Software Developers

Skip weeks of reading documentation and dealing with technical debt. Implement ANY technology faster by letting DevDocs handle the heavy lifting of documentation understanding.

🕸️ Web Scrapers

Pull entire contents of websites with Smart Discovery of Child URLs up to level 5. Perfect for both internal and external website documentation with intelligent crawling.

👥 Development Teams

Leverage internal documentation with built-in MCP servers and Claude integration for intelligent data querying. Transform your team's knowledge base into an actionable resource.

🚀 Indie Hackers

DevDocs + VSCode(cline) + Your Idea = Ship products fast with ANY technology. No more getting stuck in documentation hell when building your next big thing.

✨ Features

🧠 Intelligent Crawling

Smart Depth Control: Choose crawl depth from 1-5 levels
Automatic Link Discovery: Finds and categorizes all related content
Selective Crawling: Pick exactly what you want to extract
Child URL Detection: Automatically discovers and maps website structure

⚡ Performance & Speed

Parallel Processing: Crawl multiple pages simultaneously
Smart Caching: Never waste time on duplicate content
Lazy Loading Support: Handles modern web apps effortlessly
Rate Limiting: Respectful crawling that won't overload servers

🎯 Content Processing

Clean Extraction: Get content without the fluff
Multiple Formats: Export to MD or JSON for LLM fine-tuning
Structured Output: Logically organized content
MCP Server Integration: Ready for AI processing

🛡️ Enterprise Features

Error Recovery: Auto-retry on failures
Full Logging: Track every operation
API Access: Integrate with your tools
Team Management: Multiple seats and roles

🤔 Why DevDocs?

The Problem

Documentation is everywhere and LLMs are OUTDATED in their knowledge. Reading it, understanding it, and implementing it takes weeks of research and development even for senior engineers. We cut down that time to hours.

Our Solution

DevDocs brings documentation to you. Point it at any tech documentation URL, and watch as it:

Discovers all related pages to that technology
Extracts meaningful content without the fluff
Organizes information logically inside an MCP server ready for your LLM to query
Presents it in a clean, searchable format in MD or JSON for finetuning LLM purpose

🔥 We want anyone in the world to have the ability to build amazing products quickly using the most cutting edge LLM technology.

💰 Pricing Comparison

Feature	DevDocs	Firecrawl
Free Tier	Unlimited pages	None
Starting Price	Free Forever	$16/month
Enterprise Plan	Custom	$333/month
Crawl Speed	1000/min	20/min
Depth Levels	Up to 5	Limited
Team Seats	Unlimited	1-5 seats
Export Formats	MD, JSON, LLM-ready MCP servers	Limited formats
API Access	Coming Soon	Limited
Model Context Protocol Integration	✅	❌
Support	Priority Available via Discord	Standard only
Self-hosted (free use)	✅	❌

🚀 Getting Started

DevDocs is designed to be easy to use with Docker, requiring minimal setup for new users.

Prerequisites

Docker installed on your system
Git for cloning the repository

Quick Start with Docker (Recommended)

For Mac/Linux users:

# Clone the repository
git clone https://github.com/cyberagiinc/DevDocs.git

# Navigate to the project directory
cd DevDocs

# Start all services using Docker
./docker-start.sh

For Windows users:

# Clone the repository
git clone https://github.com/cyberagiinc/DevDocs.git

# Navigate to the project directory
cd DevDocs

# Start all services using Docker
docker-start.bat

Note for Windows Users

If you encounter permission issues, you may need to run the script as administrator or manually set permissions on the logs, storage, and crawl_results directories. The script uses the icacls command to set permissions, which might require elevated privileges on some Windows systems.

Manually Setting Permissions on Windows:

If you need to manually set permissions, you can do so using either the Windows GUI or command line:

Using Windows Explorer:

Right-click on each directory (logs, storage, crawl_results)

Select "Properties"

Go to the "Security" tab

Click "Edit" to change permissions

Click "Add" to add users/groups

Type "Everyone" and click "Check Names"

Click "OK"

Select "Everyone" in the list

Check "Full control" under "Allow"

Click "Apply" and "OK"

Using Command Prompt (as Administrator):
icacls logs /grant Everyone:F /T
icacls storage /grant Everyone:F /T
icacls crawl_results /grant Everyone:F /T

Note about docker-compose.yml on Windows

If you encounter issues with the docker-compose.yml file (such as "Top-level object must be a mapping" error), the docker-start.bat script automatically fixes this by ensuring the file has the correct format and encoding. This fix is applied every time you run the script, so you don't need to manually modify the file.

This single command will:

Create all necessary directories
Set appropriate permissions
Build and start all Docker containers
Monitor the services to ensure they're running properly

Accessing DevDocs

Once the services are running:

Frontend UI: http://localhost:3001
Backend API: http://localhost:24125
Crawl4AI Service: http://localhost:11235

Logs and Monitoring

When using Docker, logs can be accessed :

Container Logs (recommended for debugging):

# View logs from a specific container
docker logs devdocs-frontend
docker logs devdocs-backend
docker logs devdocs-mcp
docker logs devdocs-crawl4ai

# Follow logs in real-time
docker logs -f devdocs-backend

To stop all services, press Ctrl+C in the terminal where docker-start is running.

📜 Scripts and Their Purpose

DevDocs includes various utility scripts to help with development, testing, and maintenance. Here's a quick reference:

Startup Scripts

start.sh / start.bat / start.ps1 - Start all services (frontend, backend, MCP) for local development.
docker-start.sh / docker-start.bat - Start all services using Docker containers.

MCP Server Scripts

check_mcp_health.sh - Verify the MCP server's health and configuration status.
restart_and_test_mcp.sh - Restart Docker containers with updated MCP configuration and test connectivity.

Crawl4AI Scripts

check_crawl4ai.sh - Check the status and health of the Crawl4AI service.
debug_crawl4ai.sh - Run Crawl4AI in debug mode with verbose logging for troubleshooting.
test_crawl4ai.py - Run tests against the Crawl4AI service to verify functionality.
test_from_container.sh - Test the Crawl4AI service from within a Docker container.

Utility Scripts

view_result.sh - Display crawl results in a formatted view.
find_empty_folders.sh - Identify empty directories in the project structure.
analyze_empty_folders.sh - Analyze empty folders and categorize them by risk level.
verify_reorganization.sh - Verify that code reorganization was successful.

These scripts are organized in the following directories:

Root directory: Main scripts for common operations
scripts/general/: General utility scripts
scripts/docker/: Docker-specific scripts
scripts/mcp/: MCP server management scripts
scripts/test/: Testing and verification scripts

🌍 Built for Developers, by Developers

DevDocs is more than a tool—it's your documentation companion that:

Saves Time: Turn weeks of research into hours
Improves Understanding: Get clean, organized documentation
Enables Innovation: Build faster with any technology
Supports Teams: Share knowledge efficiently
LLM READY: Modern times require modern solutions, using devdocs with LLM is extremely easy and intuitive. With minimal configuration you can run Devdocs and Claude App and recognizes DevDocs's MCP server ready to chat with your data.

🛠️ Setting Up the Cline/Roo Cline for Rapid software development.

Open the "Modes" Interface
- In Roo Code, click the + to create a new Mode-Specific Prompts.
Name
- Give the mode a Name (e.g., Research_MCP).
Role Definition Prompt

Prompt

Expertise and Personality: Expertise: Developer documentation retrieval, technical synthesis, and documentation search. Personality: Systematic, detail-oriented, and precise. Provide well-structured answers with clear references to documentation sections.

Behavioral Mandate: Always use the Table Of Contents and Section Access tools when addressing any query regarding the MCP documentation. Maintain clarity, accuracy, and traceability in your responses.

Mode-Specific Custom Instructions Prompt

Prompt

1. Table Of Contents Tool: Returns a full or filtered list of documentation topics. 
2. Section Access Tool: Retrieves the detailed content of specific documentation sections.

General Process: Query Interpretation: Parse the user's query to extract key topics, keywords, and context. Identify the likely relevant sections (e.g., API configurations, error handling) from the query.

Discovery via Table Of Contents: Use the Table Of Contents tool to search the documentation index for relevant sections. Filter or scan titles and metadata for matching keywords.

Drill-Down Using Section Access: For each identified relevant document or section, use the Section Access tool to retrieve its content. If multiple parts are needed, request all related sections to ensure comprehensive coverage.

Synthesis and Response Formation: Combine the retrieved content into a coherent and complete answer. Reference section identifiers or document paths for traceability. Validate that every aspect of the query has been addressed.

Error Handling: If no matching sections are found, adjust the search parameters and retry. Clearly report if the query remains ambiguous or if no relevant documentation is available.

Mandatory Tool Usage: 
Enforcement: Every time a query is received that requires information from the MCP server docs, the agent MUST first query the Table Of Contents tool to list potential relevant topics, then use the Section Access tool to retrieve the necessary detailed content.

Search & Retrieve Workflow: 
Interpret and Isolate: Identify the key terms and data points from the user's query.

Index Lookup: Immediately query the Table Of Contents tool to obtain a list of relevant documentation sections.

Targeted Retrieval: For each promising section, use the Section Access tool to get complete content.

Information Synthesis: Merge the retrieved content, ensuring all necessary details are included and clearly referenced.

Fallback and Clarification: If initial searches yield insufficient data, adjust the query parameters and retrieve additional sections as needed.

Custom Instruction Loading: Additional custom instructions specific to Research_MCP mode may be loaded from the .clinerules-research-mcp file in your workspace. These may include further refinements or constraints based on evolving documentation structures or query types.

Final Output Construction: The final answer should be organized, directly address the query, and include clear pointers (e.g., section names or identifiers) back to the MCP documentation. Ensure minimal redundancy while covering all necessary details.

🤝 Join Our Community

🏆 Success Stories

"DevDocs turned our 3-week implementation timeline into 2 days. It's not just a crawler, it's a development accelerator." - Senior Engineer at Fortune 100 Company

"Launched my SaaS in half the time by using DevDocs to understand and implement new technologies quickly." - Successful Indie Hacker

📝 Technology Partners

Star History

Made with ❤️ by CyberAGI Inc in 🇺🇸

_{Make Software Development Better Again Contribute to DevDocs}

For Tasks:

Click tags to check more tools for each tasks

discover documentation extract content integrate with mcp server automate data retrieval improve development efficiency

For Jobs:

software engineer web developer data scientist machine learning engineer cybersecurity analyst

Alternative AI tools for DevDocs

Similar Open Source Tools

DevDocs

github

: 469

langmanus

LangManus is a community-driven AI automation framework that combines language models with specialized tools for tasks like web search, crawling, and Python code execution. It implements a hierarchical multi-agent system with agents like Coordinator, Planner, Supervisor, Researcher, Coder, Browser, and Reporter. The framework supports LLM integration, search and retrieval tools, Python integration, workflow management, and visualization. LangManus aims to give back to the open-source community and welcomes contributions in various forms.

github

: 4.9k

utcp-specification

The Universal Tool Calling Protocol (UTCP) Specification repository contains the official documentation for a modern and scalable standard that enables AI systems and clients to discover and interact with tools across different communication protocols. It defines tool discovery mechanisms, call formats, provider configuration, authentication methods, and response handling.

github

: 210

run-gemini-cli

run-gemini-cli is a GitHub Action that integrates Gemini into your development workflow via the Gemini CLI. It acts as an autonomous agent for routine coding tasks and an on-demand collaborator. Use it for GitHub pull request reviews, triaging issues, code analysis, and more. It provides automation, on-demand collaboration, extensibility with tools, and customization options.

github

: 1.3k

MetaGPT

MetaGPT is a multi-agent framework that enables GPT to work in a software company, collaborating to tackle more complex tasks. It assigns different roles to GPTs to form a collaborative entity for complex tasks. MetaGPT takes a one-line requirement as input and outputs user stories, competitive analysis, requirements, data structures, APIs, documents, etc. Internally, MetaGPT includes product managers, architects, project managers, and engineers. It provides the entire process of a software company along with carefully orchestrated SOPs. MetaGPT's core philosophy is "Code = SOP(Team)", materializing SOP and applying it to teams composed of LLMs.

github

: 51.4k

swark

Swark is a VS Code extension that automatically generates architecture diagrams from code using large language models (LLMs). It is directly integrated with GitHub Copilot, requires no authentication or API key, and supports all languages. Swark helps users learn new codebases, review AI-generated code, improve documentation, understand legacy code, spot design flaws, and gain test coverage insights. It saves output in a 'swark-output' folder with diagram and log files. Source code is only shared with GitHub Copilot for privacy. The extension settings allow customization for file reading, file extensions, exclusion patterns, and language model selection. Swark is open source under the GNU Affero General Public License v3.0.

github

: 274

EasyInstruct

EasyInstruct is a Python package proposed as an easy-to-use instruction processing framework for Large Language Models (LLMs) like GPT-4, LLaMA, ChatGLM in your research experiments. EasyInstruct modularizes instruction generation, selection, and prompting, while also considering their combination and interaction.

github

: 381

FinAnGPT-Pro

FinAnGPT-Pro is a financial data downloader and AI query system that downloads quarterly and annual financial data for stocks from EOD Historical Data, storing it in MongoDB and Google BigQuery. It includes an AI-powered natural language interface for querying financial data. Users can set up the tool by following the prerequisites and setup instructions provided in the README. The tool allows users to download financial data for all stocks in a watchlist or for a single stock, query financial data using a natural language interface, and receive responses in a structured format. Important considerations include error handling, rate limiting, data validation, BigQuery costs, MongoDB connection, and security measures for API keys and credentials.

github

: 188

mldl.study

MLDL.Study is a free interactive learning platform focused on simplifying Machine Learning (ML) and Deep Learning (DL) education for students and enthusiasts. It features curated roadmaps, videos, articles, and other learning materials. The platform aims to provide a comprehensive learning experience for Indian audiences, with easy-to-follow paths for ML and DL concepts, diverse resources including video tutorials and articles, and a growing community of over 6000 users. Contributors can add new resources following specific guidelines to maintain quality and relevance. Future plans include expanding content for global learners, introducing a Python programming roadmap, and creating roadmaps for fields like Generative AI and Reinforcement Learning.

github

: 181

Auto-Analyst

Auto-Analyst is an AI-driven data analytics agentic system designed to simplify and enhance the data science process. By integrating various specialized AI agents, this tool aims to make complex data analysis tasks more accessible and efficient for data analysts and scientists. Auto-Analyst provides a streamlined approach to data preprocessing, statistical analysis, machine learning, and visualization, all within an interactive Streamlit interface. It offers plug and play Streamlit UI, agents with data science speciality, complete automation, LLM agnostic operation, and is built using lightweight frameworks.

github

: 180

mattermost-plugin-agents

The Mattermost Agents Plugin integrates AI capabilities directly into your Mattermost workspace, allowing users to run local LLMs on their infrastructure or connect to cloud providers. It offers multiple AI assistants with specialized personalities, thread and channel summarization, action item extraction, meeting transcription, semantic search, smart reactions, direct conversations with AI assistants, and flexible LLM support. The plugin comes with comprehensive documentation, installation instructions, system requirements, and development guidelines for users to interact with AI features and configure LLM providers.

github

: 183

trip_planner_agent

VacAIgent is an AI tool that automates and enhances trip planning by leveraging the CrewAI framework. It integrates a user-friendly Streamlit interface for interactive travel planning. Users can input preferences and receive tailored travel plans with the help of autonomous AI agents. The tool allows for collaborative decision-making on cities and crafting complete itineraries based on specified preferences, all accessible via a streamlined Streamlit user interface. VacAIgent can be customized to use different AI models like GPT-3.5 or local models like Ollama for enhanced privacy and customization.

github

: 61

dev3000

dev3000 captures your web app's complete development timeline including server logs, browser events, console messages, network requests, and automatic screenshots in a unified, timestamped feed for AI debugging. It creates a comprehensive log of your development session that AI assistants can easily understand, monitoring your app in a real browser and capturing server logs, console output, browser console messages and errors, network requests and responses, and automatic screenshots on navigation, errors, and key events. Logs are saved with timestamps and rotated to keep the 10 most recent per project, with the current session symlinked for easy access. The tool integrates with AI assistants for instant debugging and provides advanced querying options through the MCP server.

github

: 416

ComputerGYM

Optexity is a framework for training foundation models using human demonstrations of computer tasks. It enables recording, processing, and utilizing demonstrations to train AI agents for web-based tasks. The tool also plans to incorporate training through self-exploration, software documentations, and YouTube videos in the future.

github

: 68

Director

Director is a framework to build video agents that can reason through complex video tasks like search, editing, compilation, generation, etc. It enables users to summarize videos, search for specific moments, create clips instantly, integrate GenAI projects and APIs, add overlays, generate thumbnails, and more. Built on VideoDB's 'video-as-data' infrastructure, Director is perfect for developers, creators, and teams looking to simplify media workflows and unlock new possibilities.

github

: 791

RainbowGPT

RainbowGPT is a versatile tool that offers a range of functionalities, including Stock Analysis for financial decision-making, MySQL Management for database navigation, and integration of AI technologies like GPT-4 and ChatGlm3. It provides a user-friendly interface suitable for all skill levels, ensuring seamless information flow and continuous expansion of emerging technologies. The tool enhances adaptability, creativity, and insight, making it a valuable asset for various projects and tasks.

github

: 86

For similar tasks

Scrapegraph-LabLabAI-Hackathon

ScrapeGraphAI is a web scraping Python library that utilizes LangChain, LLM, and direct graph logic to create scraping pipelines. Users can specify the information they want to extract, and the library will handle the extraction process. The tool is designed to simplify web scraping tasks by providing a streamlined and efficient approach to data extraction.

github

: 75

DevDocs

github

: 469

FinAnGPT-Pro

github

: 188

1filellm

1filellm is a command-line data aggregation tool designed for LLM ingestion. It aggregates and preprocesses data from various sources into a single text file, facilitating the creation of information-dense prompts for large language models. The tool supports automatic source type detection, handling of multiple file formats, web crawling functionality, integration with Sci-Hub for research paper downloads, text preprocessing, and token count reporting. Users can input local files, directories, GitHub repositories, pull requests, issues, ArXiv papers, YouTube transcripts, web pages, Sci-Hub papers via DOI or PMID. The tool provides uncompressed and compressed text outputs, with the uncompressed text automatically copied to the clipboard for easy pasting into LLMs.

github

: 292

AudioNotes

AudioNotes is a system built on FunASR and Qwen2 that can quickly extract content from audio and video, and organize it using large models into structured markdown notes for easy reading. Users can interact with the audio and video content, install Ollama, pull models, and deploy services using Docker or locally with a PostgreSQL database. The system provides a seamless way to convert audio and video into structured notes for efficient consumption.

github

: 102

dom-to-semantic-markdown

DOM to Semantic Markdown is a tool that converts HTML DOM to Semantic Markdown for use in Large Language Models (LLMs). It maximizes semantic information, token efficiency, and preserves metadata to enhance LLMs' processing capabilities. The tool captures rich web content structure, including semantic tags, image metadata, table structures, and link destinations. It offers customizable conversion options and supports both browser and Node.js environments.

github

: 708

scrape-it-now

Scrape It Now is a versatile tool for scraping websites with features like decoupled architecture, CLI functionality, idempotent operations, and content storage options. The tool includes a scraper component for efficient scraping, ad blocking, link detection, markdown extraction, dynamic content loading, and anonymity features. It also offers an indexer component for creating AI search indexes, chunking content, embedding chunks, and enabling semantic search. The tool supports various configurations for Azure services and local storage, providing flexibility and scalability for web scraping and indexing tasks.

github

: 452

open-deep-research

Open Deep Research is an open-source tool designed to generate AI-powered reports from web search results efficiently. It combines Bing Search API for search results retrieval, JinaAI for content extraction, and customizable report generation. Users can customize settings, export reports in multiple formats, and benefit from rate limiting for stability. The tool aims to streamline research and report creation in a user-friendly platform.

github

: 231

For similar jobs

ciso-assistant-community

CISO Assistant is a tool that helps organizations manage their cybersecurity posture and compliance. It provides a centralized platform for managing security controls, threats, and risks. CISO Assistant also includes a library of pre-built frameworks and tools to help organizations quickly and easily implement best practices.

github

: 3.2k

PurpleLlama

Purple Llama is an umbrella project that aims to provide tools and evaluations to support responsible development and usage of generative AI models. It encompasses components for cybersecurity and input/output safeguards, with plans to expand in the future. The project emphasizes a collaborative approach, borrowing the concept of purple teaming from cybersecurity, to address potential risks and challenges posed by generative AI. Components within Purple Llama are licensed permissively to foster community collaboration and standardize the development of trust and safety tools for generative AI.

github

: 3.8k

vpnfast.github.io

VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

github

: 80

taranis-ai

Taranis AI is an advanced Open-Source Intelligence (OSINT) tool that leverages Artificial Intelligence to revolutionize information gathering and situational analysis. It navigates through diverse data sources like websites to collect unstructured news articles, utilizing Natural Language Processing and Artificial Intelligence to enhance content quality. Analysts then refine these AI-augmented articles into structured reports that serve as the foundation for deliverables such as PDF files, which are ultimately published.

github

: 794

NightshadeAntidote

Nightshade Antidote is an image forensics tool used to analyze digital images for signs of manipulation or forgery. It implements several common techniques used in image forensics including metadata analysis, copy-move forgery detection, frequency domain analysis, and JPEG compression artifacts analysis. The tool takes an input image, performs analysis using the above techniques, and outputs a report summarizing the findings.

github

: 163

h4cker

This repository is a comprehensive collection of cybersecurity-related references, scripts, tools, code, and other resources. It is carefully curated and maintained by Omar Santos. The repository serves as a supplemental material provider to several books, video courses, and live training created by Omar Santos. It encompasses over 10,000 references that are instrumental for both offensive and defensive security professionals in honing their skills.

github

: 20.4k

AIMr

AIMr is an AI aimbot tool written in Python that leverages modern technologies to achieve an undetected system with a pleasing appearance. It works on any game that uses human-shaped models. To optimize its performance, users should build OpenCV with CUDA. For Valorant, additional perks in the Discord and an Arduino Leonardo R3 are required.

github

: 229

admyral

Admyral is an open-source Cybersecurity Automation & Investigation Assistant that provides a unified console for investigations and incident handling, workflow automation creation, automatic alert investigation, and next step suggestions for analysts. It aims to tackle alert fatigue and automate security workflows effectively by offering features like workflow actions, AI actions, case management, alert handling, and more. Admyral combines security automation and case management to streamline incident response processes and improve overall security posture. The tool is open-source, transparent, and community-driven, allowing users to self-host, contribute, and collaborate on integrations and features.

github

: 293