marly


Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.

Stars: 135


Marly is a tool that lets users search for and extract context-specific data from various types of documents, such as PDFs, Word files, PowerPoint presentations, and websites. It returns the extracted data in structured formats like JSON or Markdown, making it easy to integrate into existing workflows. Marly supports multi-schema and multi-document extraction, offers built-in caching for rapid repeat extractions, and avoids vendor lock-in by letting you choose your model provider.

README:

Marly




Marly allows you to search for and extract context-specific data from your PDFs, Word documents, PowerPoint presentations, websites, etc. in a structured format such as JSON or Markdown.
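To make the two output formats concrete, here is a minimal sketch of how a single extracted record could be rendered either way using only the Python standard library. The record values are placeholders, and `to_json` / `to_markdown` are hypothetical helpers, not part of Marly's API.

```python
import json

# Hypothetical extracted record: keys follow the field-name style of a Marly
# schema, values are placeholders rather than real extraction output.
record = {
    "Firm": "Example Capital",
    "Number of Funds": "3",
    "Net IRR": "12.5%",
}


def to_json(rec: dict) -> str:
    """Serialize one record as pretty-printed JSON."""
    return json.dumps(rec, indent=2)


def to_markdown(rec: dict) -> str:
    """Render one record as a two-column Markdown table (field | value)."""
    lines = ["| Field | Value |", "| --- | --- |"]
    for key, value in rec.items():
        # Escape pipe characters so a value cannot break the table layout.
        cell = str(value).replace("|", r"\|")
        lines.append(f"| {key} | {cell} |")
    return "\n".join(lines)


print(to_json(record))
print(to_markdown(record))
```

JSON suits programmatic consumers, while the Markdown table is easier to drop into reports or chat interfaces; the content is the same either way.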



🚀 Features

📄 Extract Relevant Information Seamlessly: Give your applications the ability to identify and extract relevant data from one or more large documents and websites with a single API call. Get the content back in JSON or Markdown, making it easy to integrate into your workflows.

🔍 Multi-Schema/Multi-Document Support: Extract data based on one or more predefined schemas from a variety of document types, without needing a vector database or having to specify page numbers.

🔄 Built-in Caching: With built-in caching, previously extracted schemas can be instantly retrieved, enabling rapid repeat extractions without having to reprocess the original documents.

🚫 No Vendor Lock-In: Enjoy complete flexibility with your choice of model provider. Whether using open-source or closed-source models, you're never tied to a specific vendor, ensuring full control.
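The caching feature is described here only at a high level. One common way to get this behaviour, shown as an assumption rather than Marly's actual implementation, is to key cached results on a hash of the document bytes plus a hash of the schema, so running the same schema on the same document again skips reprocessing. `ExtractionCache` and the `extract_fn` callback are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


def _digest(data: bytes) -> str:
    """SHA-256 hex digest used as a stable content fingerprint."""
    return hashlib.sha256(data).hexdigest()


class ExtractionCache:
    """Hypothetical in-memory cache keyed on (document hash, schema hash)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], dict] = {}
        self.misses = 0  # number of times extraction actually ran

    def key(self, document: bytes, schema: dict) -> Tuple[str, str]:
        # sort_keys makes the schema hash independent of key insertion order.
        schema_bytes = json.dumps(schema, sort_keys=True).encode("utf-8")
        return _digest(document), _digest(schema_bytes)

    def get_or_extract(
        self,
        document: bytes,
        schema: dict,
        extract_fn: Callable[[bytes, dict], dict],
    ) -> dict:
        k = self.key(document, schema)
        if k not in self._store:
            # Cache miss: run the (expensive) extraction once and remember it.
            self.misses += 1
            self._store[k] = extract_fn(document, schema)
        return self._store[k]


def stub_extract(doc: bytes, sch: dict) -> dict:
    """Stand-in extractor that returns an empty value for every schema field."""
    return {field: None for field in sch}


cache = ExtractionCache()
schema = {"Firm": "The name of the firm"}
cache.get_or_extract(b"report bytes", schema, stub_extract)
cache.get_or_extract(b"report bytes", schema, stub_extract)  # served from the cache
print(cache.misses)  # 1
```

Hashing content rather than file names means a changed document produces a new key automatically, at the cost of reading the bytes to compute the hash.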


🧰 What is a Schema?

A schema is a set of key-value pairs describing what needs to be extracted from a particular document: each key names a field, and each value describes in plain language what that field should contain.

📋 Example Schema
{
    "Firm": "The name of the firm",
    "Number of Funds": "The number of funds managed by the firm",
    "Commitment": "The commitment amount in millions of dollars",
    "% of Total Comm": "The percentage of total commitment",
    "Exposure (FMV + Unfunded)": "The exposure including fair market value and unfunded commitments in millions of dollars",
    "% of Total Exposure": "The percentage of total exposure",
    "TVPI": "Total Value to Paid-In multiple",
    "Net IRR": "Net Internal Rate of Return as a percentage"
}
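Because a schema is just key-value pairs, it maps directly onto a Python dict. The sketch below assumes an extraction result comes back as a flat JSON object whose keys mirror the schema (an assumption, not documented Marly behaviour) and checks it for missing or unexpected fields. `check_result` is a hypothetical helper, and the result string is a placeholder, not real output.

```python
import json
from typing import Dict, List

# A subset of the example schema above: field name -> description of what to extract.
schema: Dict[str, str] = {
    "Firm": "The name of the firm",
    "TVPI": "Total Value to Paid-In multiple",
    "Net IRR": "Net Internal Rate of Return as a percentage",
}


def check_result(schema: Dict[str, str], result_json: str) -> Dict[str, List[str]]:
    """Compare a JSON extraction result against the schema's field names."""
    result = json.loads(result_json)
    expected, actual = set(schema), set(result)
    return {
        # Fields the schema asked for but the result lacks.
        "missing": sorted(expected - actual),
        # Fields present in the result but not defined in the schema.
        "unexpected": sorted(actual - expected),
    }


# Placeholder result, not real Marly output.
report = check_result(schema, '{"Firm": "Example Capital", "TVPI": "1.4x", "Extra": "?"}')
print(report)
```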

🎯 Use Cases

💼 Financial Report Analysis: Extract key financial metrics from quarterly PDF reports.
📊 Customer Feedback Processing: Categorize feedback from various document types.
🔬 Research Assistant: Process research papers, extracting methodologies and findings.
🧠 Legal Contract Parsing: Extract key legal terms and conditions from contracts.

🛠️ Getting Started

Install the Python Package


To install the Python package, run the following command:

pip install marly
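To confirm the install from Python, the standard library's `importlib.metadata` (Python 3.8+) can report an installed distribution's version. This sketch assumes the PyPI distribution name is `marly`, matching the command above; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("marly") or "marly is not installed")
```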

Build the Platform


To build the platform from source, run the following command:

./start-marly.sh

Run an example script or notebook

Once the Marly platform is running, you can test it out by trying one of our examples:

  1. Navigate to the examples folder:

    cd examples
  2. Navigate to the scripts or notebooks folder:

    cd scripts

    or

    cd notebooks/autogen_example
  3. Run one of our example scripts:

    python azure_example.py

📚 Documentation

For more detailed information, please refer to our documentation.


🤝 Contributing

We welcome contributions! Please see our Contributing Guide for more details.

📄 License

This project is licensed under the Elastic License 2.0 (ELv2).
