sparrow

Data processing with ML, LLM and Vision LLM

Stars: 4456

Visit

Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance. One of the critical functionalities of Sparrow - pluggable architecture. You can easily integrate and run data extraction pipelines using tools and frameworks like LlamaIndex, Haystack, or Unstructured. Sparrow enables local LLM data extraction pipelines through Ollama or Apple MLX. With Sparrow solution you get API, which helps to process and transform your data into structured output, ready to be integrated with custom workflows. Sparrow Agents - with Sparrow you can build independent LLM agents, and use API to invoke them from your system. **List of available agents:** * **llamaindex** - RAG pipeline with LlamaIndex for PDF processing * **vllamaindex** - RAG pipeline with LLamaIndex multimodal for image processing * **vprocessor** - RAG pipeline with OCR and LlamaIndex for image processing * **haystack** - RAG pipeline with Haystack for PDF processing * **fcall** - Function call pipeline * **unstructured-light** - RAG pipeline with Unstructured and LangChain, supports PDF and image processing * **unstructured** - RAG pipeline with Weaviate vector DB query, Unstructured and LangChain, supports PDF and image processing * **instructor** - RAG pipeline with Unstructured and Instructor libraries, supports PDF and image processing. Works great for JSON response generation

README:

Sparrow

Data processing with ML, LLM and Vision LLM

Overview

Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, bank statements, invoices, receipts, and other unstructured data sources (any type of document layout and complexity). Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance. One of the critical functionalities of Sparrow - pluggable architecture. You can easily integrate and run data extraction pipelines using Sparrow Parse (with vision-language models support) or Instructor. Sparrow enables local LLM data extraction pipelines through various backends, such as vLLM, Ollama, PyTorch or Apple MLX. Sparrow Parse with VL model can run either on premise, or it can execute inference on cloud GPU. With Sparrow solution you get API, which helps to process and transform your data into structured output, ready to be integrated with custom workflows.

Sparrow Pipelines - with Sparrow you can build independent LLM pipelines, and use API to invoke them from your system.

Sparrow Components

Sparrow ML LLM - Sparrow main engine that runs various pipelines
Sparrow Agents - Sparrow Agent framework
Sparrow Parse - Sparrow library enabling Sparrow Parse pipeline functionality using VL LLM. Works great for structured JSON response generation
Sparrow OCR - OCR service, providing optical character recognition
Sparrow UI - Dashboard UI for Sparrow

Sparrow UI

Try Sparrow online, this runs on Mac Mini M4 Pro 64GB server with MLX

Examples - data extraction with Sparrow

Bank statement

{
  "bank": "First Platypus Bank",
  "address": "1234 Kings St., New York, NY 12123",
  "account_holder": "Mary G. Orta",
  "account_number": "1234567890123",
  "statement_date": "3/1/2022",
  "period_covered": "2/1/2022 - 3/1/2022",
  "account_summary": {
    "balance_on_march_1": "$25,032.23",
    "total_money_in": "$10,234.23",
    "total_money_out": "$10,532.51"
  },
  "transactions": [
    {
      "date": "02/01",
      "description": "PGD EasyPay Debit",
      "withdrawal": "203.24",
      "deposit": "",
      "balance": "22,098.23"
    },
    {
      "date": "02/02",
      "description": "AB&B Online Payment*****",
      "withdrawal": "71.23",
      "deposit": "",
      "balance": "22,027.00"
    },
    {
      "date": "02/04",
      "description": "Check No. 2345",
      "withdrawal": "",
      "deposit": "450.00",
      "balance": "22,477.00"
    },
    {
      "date": "02/05",
      "description": "Payroll Direct Dep 23422342 Giants",
      "withdrawal": "",
      "deposit": "2,534.65",
      "balance": "25,011.65"
    },
    {
      "date": "02/06",
      "description": "Signature POS Debit - TJP",
      "withdrawal": "84.50",
      "deposit": "",
      "balance": "24,927.15"
    },
    {
      "date": "02/07",
      "description": "Check No. 234",
      "withdrawal": "1,400.00",
      "deposit": "",
      "balance": "23,527.15"
    },
    {
      "date": "02/08",
      "description": "Check No. 342",
      "withdrawal": "",
      "deposit": "25.00",
      "balance": "23,552.15"
    },
    {
      "date": "02/09",
      "description": "FPB AutoPay***** Credit Card",
      "withdrawal": "456.02",
      "deposit": "",
      "balance": "23,096.13"
    },
    {
      "date": "02/08",
      "description": "Check No. 123",
      "withdrawal": "",
      "deposit": "25.00",
      "balance": "23,552.15"
    },
    {
      "date": "02/09",
      "description": "FPB AutoPay***** Credit Card",
      "withdrawal": "156.02",
      "deposit": "",
      "balance": "23,096.13"
    },
    {
      "date": "02/08",
      "description": "Cash Deposit",
      "withdrawal": "",
      "deposit": "25.00",
      "balance": "23,552.15"
    }
  ],
  "valid": "true"
}

Bonds table

{
  "data": [
    {
      "instrument_name": "UNITS BLACKROCK FIX INC DUB FDS PLC ISHS EUR INV GRD CP BD IDX/INST/E",
      "valuation": 19049
    },
    {
      "instrument_name": "UNITS ISHARES III PLC CORE EUR GOVT BOND UCITS ETF/EUR",
      "valuation": 83488
    },
    {
      "instrument_name": "UNITS ISHARES III PLC EUR CORP BOND 1-5YR UCITS ETF/EUR",
      "valuation": 213030
    },
    {
      "instrument_name": "UNIT ISHARES VI PLC/JP MORGAN USD E BOND EUR HED UCITS ETF DIST/HDGD/",
      "valuation": 32774
    },
    {
      "instrument_name": "UNITS XTRACKERS II SICAV/EUR HY CORP BOND UCITS ETF/-1D-/DISTR.",
      "valuation": 23643
    }
  ],
  "valid": "true"
}

Quickstart

Install pyenv and then install Python into your environment
Create virtual environment for the Sparrow pipeline you want to run
Install requirements for the Sparrow pipeline you want to use. Keep in mind, depending on OS, it could be required to do additional install steps for some of the libraries (for example PaddleOCR)
Run Sparrow either from CLI or from API. You need to start API endpoint
Pass query in the form of JSON schema to extract data from the document

See detailed instructions below.

Installation

Python environment

For more details, check out the extended section.

Run Sparrow

You can run Sparrow on CLI or through API. To run on CLI, use sparrow.sh script. Run it from corresponding virtual environment, depending which pipeline you want to execute. sparrow-parse pipeline is using VL LLM model. sparrow-parse pipeline runs VL LLM either locally with MLX, Ollama, or using cloud GPU. instructor pipeline is using Ollama backend. Make sure to pull LLM model for Ollama using name specified in config.yml to run instructor pipeline.

Private deployment

There is a property PROTECTED_ACCESS: False in config.yml. When set to False, sparrow_key is not verified on API call. Otherwise correct sparrow-key needs to be provided for API call.

Inference

Arguments:

query - JSON query string argument with field names and types to fetch

file-path - input document. This can be image or multipage PDF

pipeline - Sparrow pipeline used to process query request

crop_size=N (where N is an integer) to crop N pixels from all borders of the input images. This can be helpful for removing unwanted borders or frame artifacts from scanned documents

debug-dir - folder where processed images are stored

debug - if True, additional messages will be printed

Options:

--options mlx - this will set MLX as backend for local inference

--options mlx-community/Qwen2.5-VL-72B-Instruct-4bit - name for Vision LLM model, supported by MLX

--options tables_only - set this option, if you want to process tables only

--options validation_off - set this option, if you want to disable response validation

--page-type invoice --page-type adjudication_table --page-type adjudication_details - used when query=*, specify a list of page type labels to be used to classify document pages

First two options are always set first.

It is possible to extract non-existing fields (fields missing from the document, NULL values will be assigned) with Sparrow. Use or null format for the query, example:

[{"instrument_name":"str", "valuation":0, "transaction_fees":"0.0 or null"}]

✅ Sparrow Parse pipeline, running locally with Apple MLX backend.

Sparrow validates response against request schema. Result of validation is automatically included into response, see property valid.

./sparrow.sh "[{"instrument_name":"str", "valuation":0}]" --pipeline "sparrow-parse" --debug --options mlx --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit --file-path "/data/bonds_table.png"

Answer:

{
  "data": [
    {
      "instrument_name": "UNITS BLACKROCK FIX INC DUB FDS PLC ISHS EUR INV GRD CP BD IDX/INST/E",
      "valuation": 19049
    },
    {
      "instrument_name": "UNITS ISHARES III PLC CORE EUR GOVT BOND UCITS ETF/EUR",
      "valuation": 83488
    },
    {
      "instrument_name": "UNITS ISHARES III PLC EUR CORP BOND 1-5YR UCITS ETF/EUR",
      "valuation": 213030
    },
    {
      "instrument_name": "UNIT ISHARES VI PLC/JP MORGAN USD E BOND EUR HED UCITS ETF DIST/HDGD/",
      "valuation": 32774
    },
    {
      "instrument_name": "UNITS XTRACKERS II SICAV/EUR HY CORP BOND UCITS ETF/-1D-/DISTR.",
      "valuation": 23643
    }
  ],
  "valid": "true"
}

✅ Sparrow Parse pipeline, with GPU backend on Hugging Face. GPU backend katanaml/sparrow-qwen2-vl-7b is private, to be able to run below command, you need to create your own backend on Hugging Face space using code from Sparrow Parse.

./sparrow.sh "[{"instrument_name":"str", "valuation":0}]" --pipeline "sparrow-parse" --debug --options huggingface --options katanaml/sparrow-qwen2-vl-7b --file-path "/data/bonds_table.png"

✅ Sparrow Parse pipeline supports multi-page PDF documents. Running locally with Apple MLX backend. Response will be structured per page with page number indicators.

./sparrow.sh "{"table": [{"description": "str", "latest_amount": 0, "previous_amount": 0}]}" --pipeline "sparrow-parse" --debug --options mlx --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit --file-path "/data/oracle_10k_2014_q1_small.pdf" --debug-dir "/data/"

Example running private GPU backend katanaml/sparrow-qwen2-vl-7b:

./sparrow.sh "{"table": [{"description": "str", "latest_amount": 0, "previous_amount": 0}]}" --pipeline "sparrow-parse" --debug --options huggingface --options katanaml/sparrow-qwen2-vl-7b --file-path "/data/oracle_10k_2014_q1_small.pdf" --debug-dir "/data/"

Sample answer:

[
    {
        "table": [
            {
                "description": "Revenues",
                "latest_amount": 12453,
                "previous_amount": 11445
            },
            {
                "description": "Operating expenses",
                "latest_amount": 9157,
                "previous_amount": 8822
            }
        ],
        "valid": "true",
        "page": 1
    },
    {
        "table": [
            {
                "description": "Revenues",
                "latest_amount": 12453,
                "previous_amount": 11445
            },
            {
                "description": "Operating expenses",
                "latest_amount": 9157,
                "previous_amount": 8822
            }
        ],
        "valid": "true",
        "page": 2
    }
]

✅ Sparrow Parse pipeline running with cropping functionality

./sparrow.sh "*" --pipeline "sparrow-parse" --debug --options mlx --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit --crop-size 60 --file-path "/data/invoice_1.pdf"

Sample answer:

{
  "invoice_number": "61356291",
  "date_of_issue": "09/06/2012",
  "seller": {
    "name": "Chapman, Kim and Green",
    "address": "64731 James Branch, Smithmouthouth, NC 26872",
    "tax_id": "949-84-9105",
    "iban": "GB50ACIE59715038217063"
  },
  "client": {
    "name": "Rodriguez-Stevens",
    "address": "2280 Angela Plain, Plain, MS 93248",
    "tax_id": "939-98-8477"
  },
  "items": [
    {
      "description": "Wine Glasses Goblets Pair Clear",
      "quantity": 5,
      "unit": "each",
      "net_price": 12.0,
      "net_worth": 60.0,
      "vat_percentage": 10,
      "gross_worth": 66.0
    },
    {
      "description": "With Hooks Stemware Storage Multiple Multiple Uses Iron Wine Rack Hanging",
      "quantity": 4,
      "unit": "each",
      "net_price": 28.08,
      "net_worth": 112.32,
      "vat_percentage": 10,
      "gross_worth": 123.55
    },
    {
      "description": "Replacement Corkscrew Parts Spiral Worm Worm Wine Opener Bottle Houdini",
      "quantity": 1,
      "unit": "each",
      "net_price": 7.5,
      "net_worth": 7.5,
      "vat_percentage": 10,
      "gross_worth": 8.25
    },
    {
      "description": "HOME ESSENTIALS GRADIENT STEMLESS WINE GLASSES SET OF 4 20 FL OZ (591 ml) NEW",
      "quantity": 1,
      "unit": "each",
      "net_price": 12.99,
      "net_worth": 12.99,
      "vat_percentage": 10,
      "gross_worth": 14.29
    }
  ],
  "summary": {
    "total_net_worth": 192.81,
    "total_vat": 19.28,
    "total_gross_worth": 212.09
  }
}

✅ LLM function call example:

./sparrow.sh assistant --pipeline "stocks" --query "Oracle"

Answer:

{
  "company": "Oracle Corporation",
  "ticker": "ORCL"
}

The stock price of the Oracle Corporation is 186.3699951171875. USD

Sparrow FastAPI Endpoint

Sparrow enables you to run a local LLM RAG as an API using FastAPI, providing a convenient and efficient way to interact with our services. You can pass the name of the plugin to be used for the inference. By default, sparrow-parse pipeline is used.

To set this up:

Start the Endpoint

Launch the endpoint by executing the following command in your terminal:

python api.py

If you want to run pipelines from different Python virtual environments simultaneously, you can specify port, to avoid conflicts:

python api.py --port 8001

Access the Endpoint Documentation

You can view detailed documentation for the API by navigating to:

http://127.0.0.1:8000/api/v1/sparrow-llm/docs

For visual reference, a screenshot of the FastAPI endpoint

API call inference

Arguments:

query - JSON query string argument with field names and types to fetch

file-path - input document. This can be image or multipage PDF

pipeline - Sparrow pipeline used to process query request

crop_size=N (where N is an integer) to crop N pixels from all borders of the input images. This can be helpful for removing unwanted borders or frame artifacts from scanned documents

debug-dir - folder where processed images are stored

debug - if True, additional messages will be printed

Options:

--options mlx - this will set MLX as backend for local inference

--options mlx-community/Qwen2.5-VL-72B-Instruct-4bit - name for Vision LLM model, supported by MLX

--options tables_only - set this option, if you want to process tables only

--options validation_off - set this option, if you want to disable response validation

--page-type invoice --page-type adjudication_table --page-type adjudication_details - used when query=*, specify a list of page type labels to be used to classify document pages

First two options are always set first.

It is possible to extract non-existing fields (fields missing from the document, NULL values will be assigned) with Sparrow. Use or null format for the query, example:

[{"instrument_name":"str", "valuation":0, "transaction_fees":"0.0 or null"}]

✅ sparrow-parse pipeline. This pipeline runs locally with Apple MLX backend.

curl -X 'POST' \
  'http://127.0.0.1:8000/api/v1/sparrow-llm/inference' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'query=[{"instrument_name":"str","valuation":0}]' \
  -F 'page_type=' \
  -F 'pipeline=sparrow-parse' \
  -F 'options=mlx,mlx-community/Qwen2.5-VL-72B-Instruct-4bit' \
  -F 'crop_size=' \
  -F 'debug_dir=' \
  -F 'debug=false' \
  -F 'sparrow_key=' \
  -F 'file=@bonds_table.png;type=image/png'

Alternatively, pipeline can run Visual LLM with GPU backend on Hugging Face. GPU backend katanaml/sparrow-qwen2-vl-7b is private, to be able to run below command, you need to create your own backend on Hugging Face space using code from Sparrow Parse.

curl -X 'POST' \
  'http://127.0.0.1:8000/api/v1/sparrow-llm/inference' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'query=[{"instrument_name":"str","valuation":0}]' \
  -F 'page_type=' \
  -F 'pipeline=sparrow-parse' \
  -F 'options=huggingface,katanaml/sparrow-qwen2-vl-7b' \
  -F 'crop_size=' \
  -F 'debug_dir=' \
  -F 'debug=false' \
  -F 'sparrow_key=' \
  -F 'file=@bonds_table.png;type=image/png'

Sparrow Parse pipeline supports multi-page PDF documents. You can pass PDF document through the API endpoint, response will be structured per page with page number indicators:

[
    {
        "table": [
            {
                "description": "Revenues",
                "latest_amount": 12453,
                "previous_amount": 11445
            },
            {
                "description": "Operating expenses",
                "latest_amount": 9157,
                "previous_amount": 8822
            }
        ],
        "valid": "true",
        "page": 1
    },
    {
        "table": [
            {
                "description": "Revenues",
                "latest_amount": 12453,
                "previous_amount": 11445
            },
            {
                "description": "Operating expenses",
                "latest_amount": 9157,
                "previous_amount": 8822
            }
        ],
        "valid": "true",
        "page": 2
    }
]

🤖 Sparrow Agent

Sparrow Agent provides a unified workflow system for orchestrating complex document processing tasks. It allows you to combine multiple document data extraction operations into a single coherent flow, with visual monitoring and tracking powered by Prefect. The agent system intelligently routes documents through classification, extraction, and validation steps, handling the entire pipeline from document ingestion to structured data output. Extensible by design, it currently supports document processing workflows and can be expanded to accommodate additional use cases.

Example:

curl -X 'POST' \
  'http://127.0.0.1:8001/api/v1/sparrow-agents/execute/file' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'agent_name=medical_prescriptions' \
  -F 'extraction_params={"sparrow_key":"123456"}' \
  -F 'file=@your_doc.pdf;type=application/pdf'

Star History

Commercial usage

Sparrow is available under the GPL 3.0 license, promoting freedom to use, modify, and distribute the software while ensuring any modifications remain open source under the same license. This aligns with our commitment to supporting the open-source community and fostering collaboration.

Additionally, we recognize the diverse needs of organizations, including small to medium-sized enterprises (SMEs). Therefore, Sparrow is also offered for free commercial use to organizations with gross revenue below $5 million USD in the past 12 months, enabling them to leverage Sparrow without the financial burden often associated with high-quality software solutions.

For businesses that exceed this revenue threshold or require usage terms not accommodated by the GPL 3.0 license—such as integrating Sparrow into proprietary software without the obligation to disclose source code modifications—we offer dual licensing options. Dual licensing allows Sparrow to be used under a separate proprietary license, offering greater flexibility for commercial applications and proprietary integrations. This model supports both the project's sustainability and the business's needs for confidentiality and customization.

If your organization is seeking to utilize Sparrow under a proprietary license, or if you are interested in custom workflows, consulting services, or dedicated support and maintenance options, please contact us at [email protected]. We're here to provide tailored solutions that meet your unique requirements, ensuring you can maximize the benefits of Sparrow for your projects and workflows.

Author

Katana ML, Andrej Baranovskij

License

For Tasks:

Click tags to check more tools for each tasks

extract data process documents generate invoices generate receipts generate forms

For Jobs:

data extraction document processing invoice processing receipt processing form processing

Alternative AI tools for sparrow

Similar Open Source Tools

sparrow

github

: 4.5k

AICentral

AI Central is a powerful tool designed to take control of your AI services with minimal overhead. It is built on Asp.Net Core and dotnet 8, offering fast web-server performance. The tool enables advanced Azure APIm scenarios, PII stripping logging to Cosmos DB, token metrics through Open Telemetry, and intelligent routing features. AI Central supports various endpoint selection strategies, proxying asynchronous requests, custom OAuth2 authorization, circuit breakers, rate limiting, and extensibility through plugins. It provides an extensibility model for easy plugin development and offers enriched telemetry and logging capabilities for monitoring and insights.

github

: 76

firecrawl

Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown. It crawls all accessible subpages and provides clean markdown for each, without requiring a sitemap. The API is easy to use and can be self-hosted. It also integrates with Langchain and Llama Index. The Python SDK makes it easy to crawl and scrape websites in Python code.

github

: 34.1k

VectorETL

VectorETL is a lightweight ETL framework designed to assist Data & AI engineers in processing data for AI applications quickly. It streamlines the conversion of diverse data sources into vector embeddings and storage in various vector databases. The framework supports multiple data sources, embedding models, and vector database targets, simplifying the creation and management of vector search systems for semantic search, recommendation systems, and other vector-based operations.

github

: 72

structured-logprobs

This Python library enhances OpenAI chat completion responses by providing detailed information about token log probabilities. It works with OpenAI Structured Outputs to ensure model-generated responses adhere to a JSON Schema. Developers can analyze and incorporate token-level log probabilities to understand the reliability of structured data extracted from OpenAI models.

github

: 155

008

008 is an open-source event-driven AI powered WebRTC Softphone compatible with macOS, Windows, and Linux. It is also accessible on the web. The name '008' or 'agent 008' reflects our ambition: beyond crafting the premier Open Source Softphone, we aim to introduce a programmable, event-driven AI agent. This agent utilizes embedded artificial intelligence models operating directly on the softphone, ensuring efficiency and reduced operational costs.

github

: 75

openmacro

Openmacro is a multimodal personal agent that allows users to run code locally. It acts as a personal agent capable of completing and automating tasks autonomously via self-prompting. The tool provides a CLI natural-language interface for completing and automating tasks, analyzing and plotting data, browsing the web, and manipulating files. Currently, it supports API keys for models powered by SambaNova, with plans to add support for other hosts like OpenAI and Anthropic in future versions.

github

: 62

chat-ui

A chat interface using open source models, eg OpenAssistant or Llama. It is a SvelteKit app and it powers the HuggingChat app on hf.co/chat.

github

: 8.5k

promptic

Promptic is a tool designed for LLM app development, providing a productive and pythonic way to build LLM applications. It leverages LiteLLM, allowing flexibility to switch LLM providers easily. Promptic focuses on building features by providing type-safe structured outputs, easy-to-build agents, streaming support, automatic prompt caching, and built-in conversation memory.

github

: 223

pipecat-flows

Pipecat Flows is a framework designed for building structured conversations in AI applications. It allows users to create both predefined conversation paths and dynamically generated flows, handling state management and LLM interactions. The framework includes a Python module for building conversation flows and a visual editor for designing and exporting flow configurations. Pipecat Flows is suitable for scenarios such as customer service scripts, intake forms, personalized experiences, and complex decision trees.

github

: 222

firecrawl-mcp-server

Firecrawl MCP Server is a Model Context Protocol (MCP) server implementation that integrates with Firecrawl for web scraping capabilities. It supports features like scrape, crawl, search, extract, and batch scrape. It provides web scraping with JS rendering, URL discovery, web search with content extraction, automatic retries with exponential backoff, credit usage monitoring, comprehensive logging system, support for cloud and self-hosted FireCrawl instances, mobile/desktop viewport support, and smart content filtering with tag inclusion/exclusion. The server includes configurable parameters for retry behavior and credit usage monitoring, rate limiting and batch processing capabilities, and tools for scraping, batch scraping, checking batch status, searching, crawling, and extracting structured information from web pages.

github

: 116

trex

Trex is a tool that transforms unstructured data into structured data by specifying a regex or context-free grammar. It intelligently restructures data to conform to the defined schema. It offers a Python client for installation and requires an API key obtained by signing up at automorphic.ai. The tool supports generating structured JSON objects based on user-defined schemas and prompts. Trex aims to provide significant speed improvements, structured custom CFG and regex generation, and generation from JSON schema. Future plans include auto-prompt generation for unstructured ETL and more intelligent models.

github

: 239

ZerePy

ZerePy is an open-source Python framework for deploying agents on X using OpenAI or Anthropic LLMs. It offers CLI interface, Twitter integration, and modular connection system. Users can fine-tune models for creative outputs and create agents with specific tasks. The tool requires Python 3.10+, Poetry 1.5+, and API keys for LLM, OpenAI, Anthropic, and X API.

github

: 510

functionary

Functionary is a language model that interprets and executes functions/plugins. It determines when to execute functions, whether in parallel or serially, and understands their outputs. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls. It offers documentation and examples on functionary.meetkai.com. The newest model, meetkai/functionary-medium-v3.1, is ranked 2nd in the Berkeley Function-Calling Leaderboard. Functionary supports models with different context lengths and capabilities for function calling and code interpretation. It also provides grammar sampling for accurate function and parameter names. Users can deploy Functionary models serverlessly using Modal.com.

github

: 1.5k

mistreevous

Mistreevous is a library written in TypeScript for Node and browsers, used to declaratively define, build, and execute behaviour trees for creating complex AI. It allows defining trees with JSON or a minimal DSL, providing in-browser editor and visualizer. The tool offers methods for tree state, stepping, resetting, and getting node details, along with various composite, decorator, leaf nodes, callbacks, guards, and global functions/subtrees. Version history includes updates for node types, callbacks, global functions, and TypeScript conversion.

github

: 82

ruby-openai

Use the OpenAI API with Ruby! 🤖🩵 Stream text with GPT-4, transcribe and translate audio with Whisper, or create images with DALL·E... Hire me | 🎮 Ruby AI Builders Discord | 🐦 Twitter | 🧠 Anthropic Gem | 🚂 Midjourney Gem ## Table of Contents * Ruby OpenAI * Table of Contents * Installation * Bundler * Gem install * Usage * Quickstart * With Config * Custom timeout or base URI * Extra Headers per Client * Logging * Errors * Faraday middleware * Azure * Ollama * Counting Tokens * Models * Examples * Chat * Streaming Chat * Vision * JSON Mode * Functions * Edits * Embeddings * Batches * Files * Finetunes * Assistants * Threads and Messages * Runs * Runs involving function tools * Image Generation * DALL·E 2 * DALL·E 3 * Image Edit * Image Variations * Moderations * Whisper * Translate * Transcribe * Speech * Errors * Development * Release * Contributing * License * Code of Conduct

github

: 3.0k

For similar tasks

skyvern

Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions. Traditional approaches to browser automations required writing custom scripts for websites, often relying on DOM parsing and XPath-based interactions which would break whenever the website layouts changed. Instead of only relying on code-defined XPath interactions, Skyvern adds computer vision and LLMs to the mix to parse items in the viewport in real-time, create a plan for interaction and interact with them. This approach gives us a few advantages: 1. Skyvern can operate on websites it’s never seen before, as it’s able to map visual elements to actions necessary to complete a workflow, without any customized code 2. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate 3. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include: 1. If you wanted to get an auto insurance quote from Geico, the answer to a common question “Were you eligible to drive at 18?” could be inferred from the driver receiving their license at age 16 2. If you were doing competitor analysis, it’s understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!) Want to see examples of Skyvern in action? Jump to #real-world-examples-of- skyvern

github

: 12.9k

airbyte-connectors

This repository contains Airbyte connectors used in Faros and Faros Community Edition platforms as well as Airbyte Connector Development Kit (CDK) for JavaScript/TypeScript.

github

: 115

open-parse

Open Parse is a Python library for visually discerning document layouts and chunking them effectively. It is designed to fill the gap in open-source libraries for handling complex documents. Unlike text splitting, which converts a file to raw text and slices it up, Open Parse visually analyzes documents for superior LLM input. It also supports basic markdown for parsing headings, bold, and italics, and has high-precision table support, extracting tables into clean Markdown formats with accuracy that surpasses traditional tools. Open Parse is extensible, allowing users to easily implement their own post-processing steps. It is also intuitive, with great editor support and completion everywhere, making it easy to use and learn.

github

: 2.4k

unstract

Unstract is a no-code platform that enables users to launch APIs and ETL pipelines to structure unstructured documents. With Unstract, users can go beyond co-pilots by enabling machine-to-machine automation. Unstract's Prompt Studio provides a simple, no-code approach to creating prompts for LLMs, vector databases, embedding models, and text extractors. Users can then configure Prompt Studio projects as API deployments or ETL pipelines to automate critical business processes that involve complex documents. Unstract supports a wide range of LLM providers, vector databases, embeddings, text extractors, ETL sources, and ETL destinations, providing users with the flexibility to choose the best tools for their needs.

github

: 5.0k

Dot

Dot is a standalone, open-source application designed for seamless interaction with documents and files using local LLMs and Retrieval Augmented Generation (RAG). It is inspired by solutions like Nvidia's Chat with RTX, providing a user-friendly interface for those without a programming background. Pre-packaged with Mistral 7B, Dot ensures accessibility and simplicity right out of the box. Dot allows you to load multiple documents into an LLM and interact with them in a fully local environment. Supported document types include PDF, DOCX, PPTX, XLSX, and Markdown. Users can also engage with Big Dot for inquiries not directly related to their documents, similar to interacting with ChatGPT. Built with Electron JS, Dot encapsulates a comprehensive Python environment that includes all necessary libraries. The application leverages libraries such as FAISS for creating local vector stores, Langchain, llama.cpp & Huggingface for setting up conversation chains, and additional tools for document management and interaction.

github

: 726

instructor

Instructor is a Python library that makes it a breeze to work with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses. Get ready to supercharge your LLM workflows!

github

: 7.7k

sparrow

github

: 4.5k

Open-DocLLM

Open-DocLLM is an open-source project that addresses data extraction and processing challenges using OCR and LLM technologies. It consists of two main layers: OCR for reading document content and LLM for extracting specific content in a structured manner. The project offers a larger context window size compared to JP Morgan's DocLLM and integrates tools like Tesseract OCR and Mistral for efficient data analysis. Users can run the models on-premises using LLM studio or Ollama, and the project includes a FastAPI app for testing purposes.

github

: 124

For similar jobs

crawlee

Crawlee is a web scraping and browser automation library that helps you build reliable scrapers quickly. Your crawlers will appear human-like and fly under the radar of modern bot protections even with the default configuration. Crawlee gives you the tools to crawl the web for links, scrape data, and store it to disk or cloud while staying configurable to suit your project's needs.

github

: 17.3k

open-parse

github

: 2.4k

sparrow

github

: 4.5k