Edit-Banana

Edit Banana: A framework for converting static formats into editable ones.

Stars: 1390

Edit Banana is a universal content re-editor that allows users to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction while preserving original diagram details and logical relationships. The platform offers advanced segmentation, fixed multi-round VLM scanning, high-quality OCR, user system with credits, multi-user concurrency, and a web interface. Users can upload images or PDFs to get editable DrawIO (XML) or PPTX files in seconds. The project structure includes components for segmentation, text extraction, frontend, models, and scripts, with detailed installation and setup instructions provided. The tool is open-source under the Apache License 2.0, allowing commercial use and secondary development.

README:

🍌 Edit Banana

Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.


Try It Now!

👆 Try Edit Banana online at https://editbanana.anxin6.cn/. Upload an image or PDF and get an editable DrawIO (XML) or PPTX file in seconds. Please note: our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using the web platform.


📸 Effect Demonstration

High-Definition Input-Output Comparison (3 Typical Scenarios)

To demonstrate the high-fidelity conversion effect, we provide one-to-one comparisons of "original static formats" and "editable reconstruction results" across 3 scenarios. All elements can be individually dragged, styled, and modified.

Scenario 1: Figures to DrawIO (XML, SVG, PPTX)

Example No. | Original Static Diagram (Input · Non-editable) | DrawIO Reconstruction Result (Output · Fully Editable)
Example 1: Basic Flowchart | Original Diagram 1 | Reconstruction Result 1
Example 2: Multi-level Architecture Diagram | Original Diagram 2 | Reconstruction Result 2
Example 3: Technical Schematic | Original Diagram 3 | Reconstruction Result 3
Example 4: Scientific Formula Diagram | Original Diagram 4 | Reconstruction Result 4

Scenario 2: PDF to PPTX

Scenario 3: Human in the Loop Modification

✨ Conversion Highlights:

  1. Preserves the layout logic, color matching, and element hierarchy of the original diagram
  2. 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness)
  3. Accurate text recognition, supporting direct subsequent editing and format adjustment
  4. All elements are independently selectable, supporting native DrawIO template replacement and layout optimization

Key Features

  • Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.
  • Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs (Qwen-VL/GPT-4V).
  • High-Quality OCR:
    • Azure Document Intelligence for precise text localization.
    • Fallback Mechanism: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
    • Mistral Vision/MLLM for correcting text and converting mathematical formulas to LaTeX ($\int f(x) dx$).
    • Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to LLMs for pixel-perfect recognition.
  • User System:
    • Registration: New users receive 10 free credits.
    • Credit System: Pay-per-use model prevents resource abuse.
  • Multi-User Concurrency: Built-in support for concurrent user sessions using a Global Lock mechanism for thread-safe GPU access and an LRU Cache (Least Recently Used) to persist image embeddings across requests, ensuring high performance and stability.
  • Web Interface: A React-based frontend + FastAPI backend for easy uploading and editing.
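
The Global Lock and LRU Cache described under Multi-User Concurrency can be illustrated with a minimal sketch. This is not the project's actual implementation: `EmbeddingCache`, `GPU_LOCK`, `compute_embedding`, and `get_embedding` are hypothetical names, and the "embedding" is a placeholder string standing in for the output of the image encoder.

```python
import threading
from collections import OrderedDict


class EmbeddingCache:
    """Tiny LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


GPU_LOCK = threading.Lock()          # one global lock serializes GPU work
CACHE = EmbeddingCache(capacity=2)   # small capacity, for illustration only


def compute_embedding(image_id):
    """Hypothetical stand-in for the expensive GPU image encoder."""
    return f"embedding-of-{image_id}"


def get_embedding(image_id):
    # Holding the lock for both lookup and compute keeps the cache consistent
    # and ensures only one thread uses the GPU at a time.
    with GPU_LOCK:
        cached = CACHE.get(image_id)
        if cached is None:
            cached = compute_embedding(image_id)
            CACHE.put(image_id, cached)
        return cached
```

A single lock is the simplest way to make GPU access thread-safe. It trades throughput for safety, which is why caching embeddings across requests matters: repeated requests for the same image skip the encoder entirely.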

Architecture Pipeline

  1. Input: Image (PNG/JPG) or PDF.
  2. Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
  3. Text Extraction (Parallel):
    • Azure OCR detects text bounding boxes.
    • High-res crops of text regions are sent to Mistral/LLM.
    • LaTeX conversion for formulas.
  4. XML/PPTX Generation: Merging spatial data from our fine-tuned SAM3 and Text OCR.
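
The final merging step can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/merge_xml.py: the function names, the `(x, y, w, h)` box format, the center-containment rule for assigning text to shapes, and the fixed rectangle style are all hypothetical. Only the `mxGraphModel`/`mxCell`/`mxGeometry` layout follows the DrawIO file format.

```python
import xml.etree.ElementTree as ET


def center_inside(text_box, shape_box):
    """True if the center of text_box (x, y, w, h) lies inside shape_box."""
    tx, ty, tw, th = text_box
    sx, sy, sw, sh = shape_box
    cx, cy = tx + tw / 2, ty + th / 2
    return sx <= cx <= sx + sw and sy <= cy <= sy + sh


def build_drawio_xml(shapes, texts):
    """shapes: list of (x, y, w, h); texts: list of ((x, y, w, h), string)."""
    model = ET.Element("mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")              # required root cell
    ET.SubElement(root, "mxCell", id="1", parent="0")  # default layer
    for index, box in enumerate(shapes, start=2):
        # Assumed rule: a text belongs to a shape if its center falls inside it.
        label = " ".join(s for tb, s in texts if center_inside(tb, box))
        cell = ET.SubElement(root, "mxCell", id=str(index), value=label,
                             style="rounded=0;whiteSpace=wrap;html=1;",
                             vertex="1", parent="1")
        x, y, w, h = box
        geo = ET.SubElement(cell, "mxGeometry", x=str(x), y=str(y),
                            width=str(w), height=str(h))
        geo.set("as", "geometry")  # "as" is a Python keyword, so set it here
    return ET.tostring(model, encoding="unicode")


xml_text = build_drawio_xml(
    shapes=[(10, 10, 100, 40)],
    texts=[((20, 20, 30, 10), "Start")],
)
```

In this sketch, text that falls outside every shape is silently dropped. A real pipeline would presumably emit it as a standalone text cell instead.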

Project Structure

sam3_workflow/
├── config/                 # Configuration files
├── flowchart_text/         # OCR & Text Extraction Module
│   ├── src/                # OCR Source Code (Azure, Mistral, Alignment)
│   └── main.py             # OCR Entry point
├── frontend/               # React Web Application
├── input/                  # [Manual] Input images directory
├── models/                 # [Manual] Model weights (SAM3)
├── output/                 # [Manual] Results directory
├── sam3/                   # SAM3 Model Library
├── scripts/                # Utility Scripts
│   └── merge_xml.py        # XML Merging & Orchestration
├── main.py                 # CLI Entry point (Modular Pipeline)
├── server_pa.py            # FastAPI Backend Server (Service-based)
└── requirements.txt        # Python dependencies

Installation & Setup

Follow these steps to set up the project locally.

1. Prerequisites

  • Python 3.10+
  • Node.js & npm (for the frontend)
  • CUDA-capable GPU (Highly recommended)

2. Clone Repository

git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana

3. Initialize Directory Structure

After cloning, you must manually create the following resource directories (ignored by Git):

# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output

4. Download Model Weights

Download the required models and place them in the correct paths:

Model | Download | Target Path
SAM 3 | https://modelscope.cn/models/facebook/sam3 | models/sam3.pt (or as configured)

Note: For SAM 3 (or the specific segmentation checkpoint used), place the .pt file in models/ and update config.yaml.

5. Install Dependencies

Backend:

pip install -r requirements.txt

Frontend:

cd frontend
npm install
cd ..

6. Configuration

  1. Config File: Copy the example config.
    cp config/config.yaml.example config/config.yaml
  2. Environment Variables: Create a .env file in the root directory.
    AZURE_ENDPOINT=your_azure_endpoint
    AZURE_API_KEY=your_azure_key
    # Add other keys as needed
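
As a rough illustration of how the .env values above might be read, here is a minimal sketch using only the standard library. The project may well load them differently (for example through a dotenv package). `parse_env` and `load_settings` are hypothetical helpers; only the two variable names come from the example above.

```python
import os


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_path=".env", names=("AZURE_ENDPOINT", "AZURE_API_KEY")):
    """Return {name: value or None}; real environment variables win."""
    file_values = {}
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            file_values = parse_env(handle.read())
    return {name: os.environ.get(name, file_values.get(name)) for name in names}
```

Returning `None` for a missing key, rather than raising, leaves the caller free to decide what to do when Azure credentials are absent.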

Usage

1. Web Interface (Recommended)

Start the Backend:

python server_pa.py
# Server runs at http://localhost:8000

Start the Frontend:

cd frontend
npm install
npm run dev
# Frontend runs at http://localhost:5173

Open your browser, upload an image, and view the result in the embedded DrawIO editor.

2. Command Line Interface (CLI)

To process a single image:

python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory.
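
Since the CLI handles one image per call, a small wrapper can loop over a folder. The sketch below is an assumption-laden convenience script, not part of the repository: `build_commands` and `run_all` are hypothetical names, and it relies only on the `-i` flag shown above.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(input_dir="input"):
    """Return one `python main.py -i <file>` command per image in input_dir."""
    images = sorted(p for p in Path(input_dir).iterdir()
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    return [[sys.executable, "main.py", "-i", str(p)] for p in images]


def run_all(input_dir="input"):
    """Run the CLI once per image and report which files succeeded."""
    for command in build_commands(input_dir):
        # check=False so that one failing image does not stop the rest.
        result = subprocess.run(command, check=False)
        status = "ok" if result.returncode == 0 else "failed"
        print(command[-1], status)
```

Calling `run_all()` from the repository root processes every PNG/JPG in input/ sequentially. Each call starts a new process, so the model is presumably reloaded every time, which makes this slow for large batches.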

Configuration config.yaml

Customize the pipeline behavior in config/config.yaml:

  • sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
  • paths: Set input/output directories.
  • dominant_color: Fine-tune color extraction sensitivity.
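
To give a feel for what the score and NMS thresholds control, here is a generic, pure-Python sketch of score filtering followed by greedy non-maximum suppression on bounding boxes. It is not the project's code: the function names, the `(x1, y1, x2, y2)` box format, and the default values are assumptions, and the real pipeline may operate on masks rather than boxes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.5, nms_threshold=0.5):
    """detections: list of (box, score). Returns the detections that survive."""
    # Step 1: drop low-confidence detections, then sort best-first.
    candidates = sorted((d for d in detections if d[1] >= score_threshold),
                        key=lambda d: d[1], reverse=True)
    # Step 2: keep a box only if it does not overlap a better box too much.
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= nms_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

Raising the score threshold removes weak detections. Lowering the NMS threshold merges near-duplicate detections more aggressively, at the risk of dropping elements that genuinely overlap.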

📌 Development Roadmap

Feature Module | Status | Description
Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR
Intelligent Arrow Connection | ⚠️ In Development | Automatically associate arrows with target shapes
DrawIO Template Adaptation | Planned | Support custom template import
Batch Export Optimization | Planned | Batch export to DrawIO files (.drawio)
Local LLM Adaptation | Planned | Support local VLM deployment, independent of APIs

🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/xxx)
  3. Commit your changes (git commit -m 'feat: add xxx')
  4. Push to the branch (git push origin feature/xxx)
  5. Open a Pull Request

Bug Reports: Issues
Feature Suggestions: Discussions

💬 Join WeChat Group

Welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:

WeChat Group QR Code
Scan to join the Edit Banana community

💡 If the QR code has expired, please submit an Issue to request an updated one.

🤩 Contributors

Thanks to all developers who have contributed to the project and promoted its iteration!

Name/ID Email
Chai Chengliang [email protected]
Zhang Chi [email protected]
Deng Qiyan
Rao Sijing
Yi Xiangjian
Li Jianhui
Shen Chaoyuan
Zhang Junkai
Han Junyi
You Zirui
Xu Haochen
An Minghao
Yu Mingjie
Yu Xinjiang
Chen Zhuofan
Li Xiangkun

📄 License

This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).


🌟 Star History

🌟 If this project helps you, please star it to show your support!

Star History Chart: https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left
