markpdfdown

This tool uses a multimodal LLM to transcribe PDF files into Markdown.

Stars: 137

MarkPDFDown is a powerful tool that leverages multimodal large language models to transcribe PDF files into Markdown format. It simplifies the process of converting PDF documents into clean, editable Markdown text by accurately extracting text, preserving formatting, and handling complex document structures including tables, formulas, and diagrams.

README:

MarkPDFDown

A powerful tool that leverages multimodal large language models to transcribe PDF files into Markdown format.

Overview

MarkPDFDown is designed to simplify the process of converting PDF documents into clean, editable Markdown text. By utilizing advanced multimodal AI models, it can accurately extract text, preserve formatting, and handle complex document structures including tables, formulas, and diagrams.

Features

  • PDF to Markdown Conversion: Transform any PDF document into well-formatted Markdown
  • Multimodal Understanding: Leverages AI to comprehend document structure and content
  • Format Preservation: Maintains headings, lists, tables, and other formatting elements
  • Customizable Model: Choose the model through the OPENAI_DEFAULT_MODEL environment variable

Installation

conda create -n markpdfdown python=3.9
conda activate markpdfdown

# Clone the repository
git clone https://github.com/jorben/markpdfdown.git
cd markpdfdown

# Install dependencies
pip install -r requirements.txt

Usage

# Set your OpenAI API key (quoted so the shell does not treat < and > as redirections)
export OPENAI_API_KEY="<your-api-key>"
# Optionally, set a custom OpenAI API base URL
export OPENAI_API_BASE="<your-api-base>"
# Optionally, set the model to use
export OPENAI_DEFAULT_MODEL="<your-model>"

# Run the application
python main.py < tests/input.pdf > output.md
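
Before running the tool, it can help to confirm that the environment is configured. The following is a minimal Python sketch, not part of markpdfdown: the helper name missing_settings is hypothetical, and treating only OPENAI_API_KEY as required is an assumption based on the "Optionally" comments above.

```python
import os

# Variable names are taken from the usage notes above. Treating only the
# API key as required is an assumption, not documented behavior.
REQUIRED = ("OPENAI_API_KEY",)
OPTIONAL = ("OPENAI_API_BASE", "OPENAI_DEFAULT_MODEL")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


def unset_optional(env=None):
    """Return the optional variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in OPTIONAL if not env.get(name)]
```

Calling missing_settings() with no argument checks the current process environment; an empty list means the required key is present.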

Advanced Usage

# Convert only a page range: replace page_start and page_end with page numbers
python main.py page_start page_end < tests/input.pdf > output.md
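
The same invocation can be driven from Python when conversions need to be scripted. This is a rough sketch under stated assumptions: it only wraps the command line shown above with the standard subprocess module, the function names build_command and convert are hypothetical, it must be run from the repository root so that main.py is found, and it assumes the two page arguments are given together or not at all.

```python
import subprocess
import sys


def build_command(page_start=None, page_end=None):
    """Build the argument list for main.py, optionally with a page range.

    Assumption: both page arguments are supplied together, mirroring the
    command line in this README; passing only one raises ValueError.
    """
    if (page_start is None) != (page_end is None):
        raise ValueError("give both page_start and page_end, or neither")
    cmd = [sys.executable, "main.py"]
    if page_start is not None:
        cmd += [str(page_start), str(page_end)]
    return cmd


def convert(pdf_path, md_path, page_start=None, page_end=None):
    """Feed the PDF to main.py on stdin and write its stdout to md_path."""
    with open(pdf_path, "rb") as src, open(md_path, "wb") as dst:
        subprocess.run(
            build_command(page_start, page_end),
            stdin=src,
            stdout=dst,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
```

For example, convert("tests/input.pdf", "output.md") corresponds to the basic command, and passing two page numbers corresponds to the page-range form.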

Docker Usage

docker run -i -e OPENAI_API_KEY="<your-api-key>" -e OPENAI_API_BASE="<your-api-base>" -e OPENAI_DEFAULT_MODEL="<your-model>" jorben/markpdfdown < tests/input.pdf > output.md
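
The Docker invocation can be scripted in a similar way. The sketch below is illustrative only and uses hypothetical helper names (docker_command, convert_with_docker). Instead of placing the values on the command line, it passes each variable name alone with -e, which makes docker run copy the value from the calling environment; only variables that are actually set are forwarded.

```python
import os
import subprocess

IMAGE = "jorben/markpdfdown"  # image name from the command above
FORWARDED = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_DEFAULT_MODEL")


def docker_command(env=None):
    """Build a docker run argument list that forwards the set variables."""
    env = os.environ if env is None else env
    cmd = ["docker", "run", "-i"]
    for name in FORWARDED:
        if env.get(name):
            # "-e NAME" without a value propagates NAME from the caller's
            # environment, so the secret never appears in the argument list.
            cmd += ["-e", name]
    cmd.append(IMAGE)
    return cmd


def convert_with_docker(pdf_path, md_path):
    """Pipe the PDF into the container and write its stdout to md_path."""
    with open(pdf_path, "rb") as src, open(md_path, "wb") as dst:
        subprocess.run(docker_command(), stdin=src, stdout=dst, check=True)
```

Forwarding by name keeps the API key out of process listings, at the cost of requiring the variables to be exported in the shell that launches the script.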

Requirements

  • Python 3.9+
  • Dependencies listed in requirements.txt
  • Access to the specified multimodal AI model

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgments

  • Thanks to the developers of the multimodal AI models that power this tool
  • Inspired by the need for better PDF to Markdown conversion tools
