devdocs-to-llm

Turn any developer documentation into a GPT

Stars: 53

The devdocs-to-llm repository is a work-in-progress tool that crawls developer documentation and converts it into a format suitable for use with large language models (LLMs). By automating the crawl, extraction, and export steps, it lets developers quickly turn a documentation set into context for a specialized assistant.

README:

Turn any developer documentation into a specialized GPT.

Overview

DevDocs to LLM is a tool that allows you to crawl developer documentation, extract content, and process it into a format suitable for use with large language models (LLMs) like ChatGPT. This enables you to create specialized assistants tailored to specific documentation sets.

Features

  • Web crawling with customizable options
  • Content extraction in Markdown format
  • Rate limiting to respect server constraints
  • Retry mechanism for failed scrapes
  • Export options:
    • Rentry.co for quick sharing
    • Google Docs for larger documents
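The retry feature listed above can be illustrated with a minimal sketch. It is not the notebook's actual code: `fetch` is a hypothetical callable standing in for whatever scraping call the notebook makes, and the fixed delay between attempts is an assumption.

```python
import time


def scrape_with_retry(fetch, url, retry_attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url), retrying up to `retry_attempts` more times on failure.

    `fetch` is a hypothetical stand-in for the real scraping call.
    `sleep` is injectable so the function can be tested without waiting.
    """
    last_error = None
    # One initial attempt plus `retry_attempts` retries.
    for attempt in range(retry_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real tool would catch narrower errors
            last_error = exc
            if attempt < retry_attempts:
                sleep(delay_seconds)  # assumed fixed pause before retrying
    raise RuntimeError(
        f"giving up on {url} after {retry_attempts} retries"
    ) from last_error
```

Passing the scraping function in as an argument keeps the retry logic independent of any particular client library, so it can be exercised with a fake `fetch` that fails a set number of times.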

Usage

  1. Set up the Firecrawl environment
  2. Crawl a website and generate a sitemap
  3. Extract content from crawled pages
  4. Export the processed content
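As a rough illustration of what happens between steps 3 and 4, the sketch below joins per-page Markdown into a single document ready for export. The function name `combine_pages`, the source comment line, and the `---` separator are assumptions made for this example; the notebook's actual output format may differ.

```python
def combine_pages(pages):
    """Join (url, markdown) pairs into one Markdown document.

    Hypothetical helper: empty pages are skipped and each section is
    preceded by a comment recording where it came from.
    """
    sections = []
    for url, markdown in pages:
        body = markdown.strip()
        if not body:
            continue  # skip pages where extraction returned nothing
        sections.append(f"<!-- Source: {url} -->\n\n{body}")
    if not sections:
        return ""
    # A horizontal rule between sections keeps page boundaries visible.
    return "\n\n---\n\n".join(sections) + "\n"
```

Keeping the source URL next to each section makes it easier to trace an assistant's answer back to the page it came from.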

Requirements

  • Firecrawl API key
  • Google Docs API credentials (optional, for Google Docs export)

Installation

This project is designed to run in a Jupyter notebook environment, particularly Google Colab. No local installation is required.

Configuration

Before running the notebook, you'll need to set a few parameters:

  • sub_url: The URL of the documentation you want to crawl
  • limit: Maximum number of pages to crawl
  • scrape_option: Whether to scrape all crawled pages or only a specific number of them
  • num_pages: Number of pages to scrape when not scraping all
  • pages_per_minute: Maximum number of pages to scrape per minute (rate limit)
  • wait_time_between_chunks: Delay between scraping chunks
  • retry_attempts: Number of retries for failed scrapes

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Copyright (c) 2024-present, Alex Fazio

