Best AI tools for document codebases

20 AI Tool Sites

Elessar

Elessar is an AI-powered engineering visibility and documentation platform that helps teams ship code faster. It seamlessly integrates with your existing ecosystem, including codebases, communications, and documentation tools. With Elessar, you can generate standardized changelogs per pull request, automatically document in Notion, create a temporary Slack channel per PR with two-way sync, statuses, and AI summaries, search and understand changes right in your editor, and link issues, tasks, and metrics from Linear to Elessar channels and integrations. Elessar also sends daily digests to engineers and managers, synthesizing important updates in one place.

Site: 1.3k

CodeSense AI

CodeSense AI is a code comprehension and collaboration tool that uses AI to explain code and generate comments. Developers can generate descriptive comments for code snippets, get detailed explanations of a given piece of code, and improve the documentation of their codebases. Automatic comment generation saves time and effort and makes complex software systems easier to maintain and update. CodeSense AI is particularly useful for onboarding new team members, teaching programming concepts, and improving code readability.

Site: 0

AI Document Creator

AI Document Creator is a tool that uses artificial intelligence to help users generate various types of documents. It analyzes the input data and produces well-structured documents tailored to the user's needs, saving time and helping keep the output accurate and consistent. The tool is designed to be easy to use, which makes it suitable for individuals and businesses that want to streamline document creation.

Site: 121.5k

Coral AI

Coral AI is an AI-powered platform for searching, summarizing, and translating documents in over 90 languages and getting citations from them. Trusted by researchers and professionals, it lets users upload documents, ask questions, and receive answers with page citations, which makes it useful for books, legal documents, research papers, and more. Other features include search without keywords, study-guide generation, and simplified document summaries.

Site: 249.6k

Affinda

Affinda is a document AI platform that can read, understand, and extract data from any document type. It combines 10+ years of IP in document reconstruction with the latest advancements in computer vision, natural language processing, and deep learning. Affinda's platform can be used to automate a variety of document processing workflows, including invoice processing, receipt processing, credit note processing, purchase order processing, account statement processing, resume parsing, job description parsing, resume redaction, passport processing, birth certificate processing, and driver's license processing. Affinda's platform is used by some of the world's leading organizations, including Google, Microsoft, Amazon, and IBM.

Site: 73.4k

Ocrolus

Ocrolus is intelligent document automation software that combines AI-driven document processing with human-in-the-loop review. Its Classify, Capture, Detect, and Analyze capabilities streamline document processing tasks. It serves industries such as small business lending, mortgage, consumer, and multifamily, with solutions for income verification, fraud detection, cash flow analysis, and business process automation. By automating document analysis, Ocrolus helps users manage risk, avoid fraud, and make faster, more accurate financial decisions.

Site: 58.1k

Docugami

Docugami is a document engineering platform that uses artificial intelligence to extract and analyze data from business documents and to automate document workflows. It is designed to deliver immediate impact for business users without massive investment in machine learning, staff training, or IT development. Docugami's proprietary Business Document Foundation Model is a large language model for generative AI that can be applied to any type of business document.

Site: 47.0k

Procys

Procys is an AI-powered document processing tool that automates data extraction from documents such as invoices, receipts, ID cards, and passports. It combines a self-learning engine with integrations for over 260 apps to simplify extracting and organizing data, and it prioritizes data security. Users upload documents as PDFs, images, or scans, process them with OCR, and export the results in their preferred format. Procys is trusted by many users for its efficiency and accuracy.

Site: 34.2k

Honeybear.ai

Honeybear.ai is an AI tool designed to simplify document reading tasks. It utilizes advanced algorithms to extract and analyze text from various documents, making it easier for users to access and comprehend information. With Honeybear.ai, users can streamline their document processing workflows and enhance productivity.

Site: 17.0k

Cradl AI

Cradl AI is an AI-powered tool designed to automate document workflows with no-code AI. It enables users to extract data from any document automatically, integrate with no-code tools, and build custom AI models through an easy-to-use interface. The tool empowers automation teams across industries by extracting data from complex document layouts, regardless of language or structure. Cradl AI offers features such as line item extraction, fine-tuning AI models, human-in-the-loop validation, and seamless integration with automation tools. It is trusted by organizations for business-critical document automation, providing enterprise-level features like encrypted transmission, GDPR compliance, secure data handling, and auto-scaling.

Site: 6.6k

Keylight AI

Keylight AI is an AI-powered tool that helps users find information within their documents. It offers fast search, high accuracy, a user-friendly interface, and customizable prompts, and it handles documents securely and confidentially. Aimed at professionals across many industries, Keylight AI makes navigating documents quicker and helps users save time.

Site: 0

Sharly AI

Sharly AI uses AI to turn complex documents and PDFs into easily digestible summaries and to let users chat with them. Users can hold natural-language conversations with their documents, ask questions, and retrieve specific information. Its use cases include research, legal analysis, project management, and content summarization, with features tailored to professionals in each field. Sharly AI aims to streamline workflows, improve productivity, and surface deeper insights from large amounts of information.

Site: 509.4k

docAnalyzer.ai

docAnalyzer.ai is a document analysis tool that lets users hold conversations with their documents. Built on recent AI research and state-of-the-art embeddings, it supports in-depth analysis of PDFs and dynamic, interactive questioning. docAnalyzer.ai is easy to use, privacy-conscious, and secure, and it offers a range of features for anyone who works with documents.

Site: 81.5k

FormX.ai

FormX.ai is an AI-powered data extraction and conversion tool that automates the process of extracting data from physical documents and converting it into digital formats. It supports a wide range of document types, including invoices, receipts, purchase orders, bank statements, contracts, HR forms, shipping orders, loyalty member applications, annual reports, business certificates, personnel licenses, and more. FormX.ai's pre-configured data extraction models and effortless API integration make it easy for businesses to integrate data extraction into their existing systems and workflows. With FormX.ai, businesses can save time and money on manual data entry and improve the accuracy and efficiency of their data processing.

Site: 65.2k

PDF AI

PDF AI is a powerful AI-powered PDF reader that allows you to chat with any PDF document. With PDF AI, you can quickly and easily get a concise summary of long PDF documents, find answers to your questions, and even have complex terms explained to you. PDF AI is perfect for anyone who needs to read and understand large documents, from students to professionals. It's also a great tool for translating documents, extracting content, and collaborating with others.

Site: 52.2k

AlgoDocs

AlgoDocs is an AI platform for document data extraction that frees teams from tedious, error-prone manual data entry by extracting data from documents quickly, securely, and accurately.

Site: 38.3k

Base64.ai

Base64.ai is an automated document processing API that offers a leading no-code AI solution for understanding documents, photos, and videos, with document processing features and solutions for a wide range of industries. With more than 400 integrations, Base64.ai provides fast, secure, and accurate data extraction and is certified for ISO, HIPAA, SOC 2 Type 1 & 2, and GDPR compliance. Users can add new document types, integrations, and business rules to tailor the AI to their specific needs. Base64.ai also offers PII redaction and human-in-the-loop verification, and it is accessible via API, RPA systems, scanners, the web, and mobile apps.

Site: 35.0k

Cradl AI

Cradl AI is a no-code, AI-powered document workflow automation tool that helps organizations automate document tasks such as data extraction, processing, and validation. It uses AI to extract data automatically from complex documents, regardless of layout or language. Cradl AI also integrates with other no-code tools, making it easy to build and deploy custom AI models.

Site: 10.3k

Translated.BEST

Translated.BEST is a website that uses artificial intelligence to simplify document translation. It supports more than 20 file formats, including PDF, DOCX, Excel, PPTX, and EPUB, and more than 50 languages, including English, Chinese, French, Spanish, and Japanese. It preserves the original document formatting and supports comparison browsing of the original and translated documents. Translated.BEST also offers free translation of medical documents for children aged 0-14 with chronic illnesses.

Site: 0

PdfPal AI

PdfPal AI is an AI-powered PDF chat tool that allows users to interact with their PDF documents using artificial intelligence. It enables users to have dynamic conversations with their PDFs, gaining insights, summaries, and more. The tool is designed to handle a wide range of document lengths and complexities, from short articles to lengthy research papers. PdfPal AI is secure and easy to use, making it a valuable tool for anyone who works with PDFs.

Site: 18.6k

20 Open-Source AI Tools

code2prompt

code2prompt is a command-line tool that converts a codebase into a single LLM prompt, complete with a source tree, prompt templating, and token counting. It works with codebases of any size and can customize prompt generation with Handlebars templates, respect .gitignore, include or exclude files and folders using glob patterns, display the token count, include Git diff output, copy the prompt to the clipboard, save it to an output file, and add line numbers to source code blocks. It streamlines creating LLM prompts for code analysis, code generation, and other tasks.
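To make the idea concrete, here is a minimal Python sketch of packing a directory into one prompt. It is not code2prompt's implementation: the helper name `build_prompt`, the flat file list standing in for the source tree, the use of `fnmatch` globs in place of real .gitignore handling, and the whitespace-based token estimate are all assumptions for illustration.

```python
from __future__ import annotations

import fnmatch
import os
from pathlib import Path


def build_prompt(root: str, exclude: list[str] | None = None,
                 line_numbers: bool = True) -> tuple[str, int]:
    """Hypothetical helper: pack a directory into one prompt string.

    Returns the prompt and a rough token estimate.
    """
    exclude = exclude or []
    root_path = Path(root)
    files: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            rel = (Path(dirpath) / name).relative_to(root_path).as_posix()
            # Simple glob exclusion; a stand-in for .gitignore-aware filtering.
            if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude):
                continue
            files.append(rel)

    # A flat list of relative paths stands in for a rendered source tree.
    parts = ["Source tree:"] + [f"  {rel}" for rel in files]
    for rel in files:
        text = (root_path / rel).read_text(encoding="utf-8", errors="replace")
        lines = text.splitlines()
        if line_numbers:
            # Prefix each source line with its 1-based line number.
            lines = [f"{i:4d} {line}" for i, line in enumerate(lines, start=1)]
        parts.append(f"\n--- {rel} ---\n" + "\n".join(lines))
    prompt = "\n".join(parts)
    # Crude estimate: whitespace-separated chunks, not real tokenizer tokens.
    return prompt, len(prompt.split())
```

For example, `build_prompt("my_project", exclude=["*.log", "node_modules/*"])` returns the prompt text and an approximate size; a real tool would count tokens with the target model's tokenizer instead.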

GitHub: 387

unstructured

The `unstructured` library provides open-source components for ingesting and pre-processing images and text documents such as PDFs, HTML, Word documents, and many more. Its use cases center on streamlining and optimizing data processing workflows for LLMs. The modular functions and connectors in `unstructured` form a cohesive system that simplifies data ingestion and pre-processing, adapts to different platforms, and efficiently transforms unstructured data into structured outputs.

GitHub: 7.3k

cody

Cody is a free, open-source AI coding assistant that can write and fix code, provide AI-generated autocomplete, and answer your coding questions. Cody fetches relevant code context from across your entire codebase to write better code that uses more of your codebase's APIs, impls, and idioms, with less hallucination.

GitHub: 2.2k

thepipe

The Pipe is a multimodal-first tool for feeding files and web pages into vision-language models such as GPT-4V. It is best for LLM and RAG applications that require a deep understanding of tricky data sources. The Pipe is available as a hosted API at thepi.pe, or it can be set up locally.

GitHub: 544

lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has used internally alongside its recently released LLM data processing library datatrove and LLM training library nanotron. Hugging Face has released it to the community in the spirit of building in the open. The project is still at an early stage, so full stability should not be expected; problems or questions can be reported by opening an issue.

GitHub: 423

awesome-ai-agents

GitHub: 7.1k

pluto

Pluto is a development tool that helps developers **build cloud and AI applications more conveniently**, addressing problems such as the difficulty of deploying AI applications and open-source models. Developers write applications in familiar programming languages like **Python and TypeScript**, **directly defining and using the cloud resources the application needs within their codebase**, such as AWS SageMaker, DynamoDB, and more. Pluto infers the application's infrastructure requirements through **static program analysis** and then creates those resources on the specified cloud platform, **simplifying resource creation and application deployment**.
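As a rough illustration of inferring infrastructure needs through static analysis, the Python sketch below parses application source with the standard `ast` module and collects module-level constructor calls to resource classes. The class names `Queue`, `KVStore`, and `Router` and the resource table are hypothetical stand-ins; this is not Pluto's SDK or its actual deduction logic.

```python
from __future__ import annotations

import ast

# Hypothetical mapping from SDK class names to the cloud resource each implies.
RESOURCE_TYPES = {
    "Queue": "message-queue",
    "KVStore": "key-value-store",
    "Router": "http-gateway",
}


def infer_resources(source: str) -> list[tuple[str, str]]:
    """Return (variable, resource kind) pairs for module-level resource constructors."""
    found = []
    for node in ast.parse(source).body:
        # Only top-level assignments such as `orders = Queue("orders")` are inspected.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            # Handle both `Queue(...)` and `sdk.Queue(...)` call styles.
            if isinstance(func, ast.Name):
                name = func.id
            elif isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = None
            if name in RESOURCE_TYPES:
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        found.append((target.id, RESOURCE_TYPES[name]))
    return found


if __name__ == "__main__":
    app = 'orders = Queue("orders")\ncache = KVStore("cache")\n'
    print(infer_resources(app))
```

Running the module prints `[('orders', 'message-queue'), ('cache', 'key-value-store')]`. The code is only parsed, never executed, which is the point of a static approach; a real tool would then map each entry to a concrete resource on the target cloud.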

GitHub: 71

azure-openai-llm-vector-langchain

GitHub: 263

OpenGPTAndBeyond

GitHub: 102

awesome-ai-agents

GitHub: 59

awesome-azure-openai-llm

GitHub: 270

cognita

Cognita is an open-source framework for organizing your RAG codebase, with a frontend for experimenting with different RAG customizations. It offers a simple way to structure the codebase so that it is easy to test locally while remaining deployable in a production-ready environment. The key issues that arise when productionizing a RAG system from a Jupyter notebook are:

1. **Chunking and Embedding Job**: The chunking and embedding code usually needs to be abstracted out and deployed as a job. Sometimes the job needs to run on a schedule or be triggered by an event to keep the data up to date.
2. **Query Service**: The code that generates the answer from the query needs to be wrapped in an API server such as FastAPI and deployed as a service. This service should handle multiple queries at the same time and autoscale with higher traffic.
3. **LLM / Embedding Model Deployment**: When using open-source models, we often load the model in the Jupyter notebook. In production, the model needs to be hosted as a separate service and called through an API.
4. **Vector DB deployment**: Most testing uses vector DBs in memory or on disk. In production, however, the DBs need to be deployed in a more scalable and reliable way.

Cognita makes it easy to customize and experiment with every part of a RAG system while still being able to deploy it properly. It also ships with a UI for trying out different RAG configurations and seeing the results in real time. You can use it locally, with or without any Truefoundry components; however, Truefoundry components make it easier to test different models and deploy the system in a scalable way. Cognita lets you host multiple RAG systems from one app.

### Advantages of using Cognita:

1. A central, reusable repository of parsers, loaders, embedders, and retrievers.
2. Non-technical users can work in the UI: upload documents and perform Q&A using modules built by the development team.
3. Fully API-driven, which allows integration with other systems.

> If you use Cognita with Truefoundry AI Gateway, you get logging, metrics, and a feedback mechanism for user queries.

### Features:

1. Support for multiple document retrievers that use `Similarity Search`, `Query Decomposition`, `Document Reranking`, etc.
2. Support for state-of-the-art open-source embeddings and reranking from `mixedbread-ai`.
3. Support for using LLMs via `Ollama`.
4. Support for incremental indexing that ingests entire documents in batches (reducing the compute burden), keeps track of already indexed documents, and prevents re-indexing them.
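The incremental-indexing feature listed above can be sketched in a few lines of Python: hash each document's content, skip documents whose hash has already been recorded, and embed the remaining ones in fixed-size batches. The `IncrementalIndexer` class, the SHA-256 content hash, and the `embed_batch` callable are assumptions for illustration, not Cognita's API or its actual bookkeeping.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Iterable


class IncrementalIndexer:
    """Hypothetical indexer that embeds only documents it has not seen before."""

    def __init__(self, embed_batch: Callable[[list[str]], list[list[float]]],
                 batch_size: int = 2):
        self.embed_batch = embed_batch          # caller-supplied embedding function
        self.batch_size = batch_size
        self.seen_hashes: set[str] = set()      # content hashes already indexed
        self.vectors: dict[str, list[float]] = {}  # content hash -> embedding

    def index(self, documents: Iterable[str]) -> int:
        """Embed new documents in batches; return how many were newly indexed."""
        pending: list[tuple[str, str]] = []
        for doc in documents:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest in self.seen_hashes:
                continue  # already indexed: skip re-embedding
            self.seen_hashes.add(digest)
            pending.append((digest, doc))
        # Batching bounds how much text is sent to the embedder per call.
        for start in range(0, len(pending), self.batch_size):
            batch = pending[start:start + self.batch_size]
            embeddings = self.embed_batch([doc for _, doc in batch])
            for (digest, _), vector in zip(batch, embeddings):
                self.vectors[digest] = vector
        return len(pending)
```

Hashing content rather than file names means an edited document is treated as new, while an unchanged one is skipped even if it is re-uploaded; a production system would persist `seen_hashes` instead of keeping it in memory.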

GitHub: 2.9k

llm-foundry

LLM Foundry is a codebase for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. It is designed to be easy to use, efficient _and_ flexible, enabling rapid experimentation with the latest techniques. The repository contains:

* `llmfoundry/` - source code for models, datasets, callbacks, utilities, etc.
* `scripts/` - scripts to run LLM workloads
  * `data_prep/` - convert text data from original sources to StreamingDataset format
  * `train/` - train or finetune HuggingFace and MPT models with 125M to 70B parameters
  * `train/benchmarking` - profile training throughput and MFU
  * `inference/` - convert models to HuggingFace or ONNX format, and generate responses
  * `inference/benchmarking` - profile inference latency and throughput
  * `eval/` - evaluate LLMs on academic (or custom) in-context-learning tasks
  * `mcli/` - launch any of these workloads using MCLI and the MosaicML platform
* `TUTORIAL.md` - a deeper dive into the repo, example workflows, and FAQs

GitHub: 3.8k

Devon

Devon is an open-source pair programmer tool designed to facilitate collaborative coding sessions. It provides features such as multi-file editing, codebase exploration, test writing, bug fixing, and architecture exploration. The tool supports Anthropic, OpenAI, and Groq APIs, with plans to add more models in the future. Devon is community-driven, with ongoing development goals including multi-model support, plugin system for tool builders, self-hostable Electron app, and setting SOTA on SWE-bench Lite. Users can contribute to the project by developing core functionality, conducting research on agent performance, providing feedback, and testing the tool.

GitHub: 2.0k

blockoli

Blockoli is a high-performance tool for code indexing, embedding generation, and semantic search for use with LLMs. It is built in Rust and uses the ASTerisk crate for semantic code parsing. Blockoli lets you efficiently index, store, and search code blocks and their embeddings using vector similarity. Key features include indexing code blocks from a codebase, generating vector embeddings for code blocks with a pre-trained model, storing code blocks and their embeddings in a SQLite database, efficient similarity search over code blocks using vector embeddings, a REST API for easy integration with other tools and platforms, and speed and memory efficiency thanks to its Rust implementation.
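The core loop described here (store code blocks with their embeddings in SQLite, then rank them by vector similarity) can be sketched in Python with the standard `sqlite3` module. blockoli itself is written in Rust and uses a pre-trained embedding model; in this sketch a toy bag-of-words vector over a small, hypothetical vocabulary stands in for that model, and the table layout and JSON-encoded vectors are assumptions.

```python
from __future__ import annotations

import json
import math
import re
import sqlite3

# Toy embedding: counts of a few hypothetical vocabulary terms (stand-in for a real model).
VOCAB = ["read", "write", "file", "http", "request", "parse", "json"]


def embed(text: str) -> list[float]:
    tokens = re.findall(r"[a-z]+", text.lower())  # `read_file` -> `read`, `file`
    return [float(tokens.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(blocks: dict[str, str]) -> sqlite3.Connection:
    """Store each code block and its embedding (as JSON) in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blocks (name TEXT PRIMARY KEY, code TEXT, embedding TEXT)")
    conn.executemany(
        "INSERT INTO blocks VALUES (?, ?, ?)",
        [(name, code, json.dumps(embed(code))) for name, code in blocks.items()],
    )
    return conn


def search(conn: sqlite3.Connection, query: str, k: int = 3) -> list[tuple[str, float]]:
    """Brute-force similarity search: score every stored block against the query."""
    q = embed(query)
    rows = conn.execute("SELECT name, embedding FROM blocks").fetchall()
    scored = [(name, cosine(q, json.loads(emb))) for name, emb in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

Brute-force scoring keeps the sketch short; it touches every row per query, so larger indexes usually add an approximate nearest-neighbour structure on top of the stored vectors.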

GitHub: 53

catalyst

Catalyst is a C# natural language processing library designed for speed and inspired by spaCy's design. It provides pre-trained models, support for training word and document embeddings, and flexible entity recognition models. The library is fast, modern, and pure C#, targets .NET Standard 2.0, and is cross-platform, running on Windows, Linux, macOS, and ARM. Catalyst offers non-destructive tokenization, named entity recognition, part-of-speech tagging, language detection, and efficient binary serialization. It includes pre-built models for language packages and lemmatization, and models can be stored and loaded using streams. To get started, install its NuGet package and set the storage to use the online repository. The library supports lazy loading of models from disk or online, and users can take advantage of C# lazy evaluation and native multi-threading to process documents in parallel. Training a new FastText word2vec embedding model is straightforward, and Catalyst also provides algorithms for fast embedding search and dimensionality reduction.
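"Non-destructive tokenization" means the tokens keep enough information (offsets and surrounding whitespace) to rebuild the input exactly. Catalyst is a C# library; the Python sketch below only illustrates the concept with a regular expression and a spaCy-style trailing-whitespace field, and the `Token`, `tokenize`, and `detokenize` names are hypothetical, not Catalyst's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str        # the token itself (a word or a punctuation mark)
    start: int       # character offset of the token in the original string
    whitespace: str  # whitespace that followed the token, kept so nothing is lost


def tokenize(text: str) -> tuple[str, list[Token]]:
    """Return (leading whitespace, tokens); every character lands somewhere."""
    leading = re.match(r"\s*", text).group()
    tokens = [
        Token(m.group(1), m.start(1), m.group(2))
        for m in re.finditer(r"(\w+|[^\w\s])(\s*)", text)
    ]
    return leading, tokens


def detokenize(leading: str, tokens: list[Token]) -> str:
    """Rebuild the exact original string from the tokens."""
    return leading + "".join(t.text + t.whitespace for t in tokens)
```

Because each non-space character is either part of a word (`\w+`) or a single punctuation token, and all whitespace is stored either as the leading prefix or as trailing whitespace, `detokenize(*tokenize(s)) == s` holds for any string `s`.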

GitHub: 694

HybridAGI

HybridAGI is the first Programmable LLM-based Autonomous Agent that lets you program its behavior using a **graph-based prompt programming** approach. This state-of-the-art feature allows the AGI to efficiently use any tool while controlling the long-term behavior of the agent. Become the _first Prompt Programmers in history_; be a part of the AI revolution one node at a time! **Disclaimer: We are currently in the process of upgrading the codebase to integrate DSPy**

GitHub: 194

LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing

LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.

GitHub: 508

Awesome-Colorful-LLM

Awesome-Colorful-LLM is a meticulously assembled anthology of vibrant multimodal research focusing on advancements propelled by large language models (LLMs) in domains such as Vision, Audio, Agent, Robotics, and Fundamental Sciences like Mathematics. The repository contains curated collections of works, datasets, benchmarks, projects, and tools related to LLMs and multimodal learning. It serves as a comprehensive resource for researchers and practitioners interested in exploring the intersection of language models and various modalities for tasks like image understanding, video pretraining, 3D modeling, document understanding, audio analysis, agent learning, robotic applications, and mathematical research.

GitHub: 98

OpenAdapt

OpenAdapt is an open-source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web Graphical User Interfaces (GUIs). It aims to automate repetitive GUI workflows by leveraging the power of LMMs. OpenAdapt records user input and screenshots, converts them into tokenized format, and generates synthetic input via transformer model completions. It also analyzes recordings to generate task trees and replay synthetic input to complete tasks. OpenAdapt is model agnostic and generates prompts automatically by learning from human demonstration, ensuring that agents are grounded in existing processes and mitigating hallucinations. It works with all types of desktop GUIs, including virtualized and web, and is open source under the MIT license.

GitHub: 717