Best AI tools for source document retrieval

20 - AI tool Sites

Agentive

Agentive is an AI-powered audit automation platform that simplifies and automates audits. It uses machine learning and large language models to extract structured data from audit evidence, match attributes to values, and provide direct access to source documents. The platform reduces manual audit procedures, lets auditors reuse templates, and frees them to focus on higher-value tasks.

site

: 9.7k

Quivr

Quivr is an open-source, chat-powered second brain that allows users to build a unified search engine across all their documents, tools, and databases. It is powered by AI and continuously trains on the user's company's unique context to improve search relevance and knowledge discovery. Quivr can be integrated with a variety of tools and applications, and users can choose from a variety of GenAI models to get the best results for their specific tasks.

site

: 69.8k

Shieldbase

Shieldbase is an AI-powered enterprise search tool designed to provide secure and efficient search capabilities for businesses. It utilizes advanced artificial intelligence algorithms to index and retrieve information from various data sources within an organization, ensuring quick and accurate search results. With a focus on security, Shieldbase offers encryption and access control features to protect sensitive data. The platform is user-friendly and customizable, making it easy for businesses to implement and integrate into their existing systems. Shieldbase enhances productivity by enabling employees to quickly find the information they need, ultimately improving decision-making processes and overall operational efficiency.

site

: 3.1k

Casc

Casc is an AI-powered knowledge management tool that helps teams access and share information quickly and easily. It integrates with popular collaboration tools like Slack, Google Drive, and Confluence, allowing users to search and access documents, images, and other content from a central location. Casc also uses natural language processing to understand user queries and deliver precise answers, making it easy for teams to find the information they need without having to spend hours searching through multiple sources.

site

: 0

Simulai

Simulai is an open-source conversational form builder that allows users to create interactive surveys and forms that feel like natural conversations. It is inspired by the simplicity of Notion and is completely free to use. With Simulai, users can easily add logic, choose from a list of templates, and host their forms on their own servers or use Simulai's free cloud hosting services.

site

: 10.5k

Simulai

site

: 0

Meta Llama

Meta Llama is a family of large language models from Meta. Assistants built on it can help with a variety of writing tasks, including generating text, translating languages, and writing different kinds of creative content.

site

: 1.1m

heißdocs

heißdocs is a tool that adds a superfast search layer on top of scanned or digital PDFs. It also has AI-powered Question-Answering capabilities. With heißdocs, you can easily find information in your PDFs without having to sift through thousands of pages. You can also own your data and host it yourself. heißdocs is open source, so you can view the code, make modifications, or request modifications as you like.

site

: 540

Protocol Pal

Protocol Pal is a free and open-source tool that helps developers build better APIs. It provides a set of libraries and tools that make it easy to design, document, and test APIs. Protocol Pal also includes a community of developers who can help you with your API development projects.

site

: 0

DocsAI

DocsAI is an AI-powered document companion that helps you organize, search, and chat with your documents. It integrates with various sources, including websites, text files, PDFs, Docx, Notion, and Confluence. You can customize the companion's appearance to match your brand and suggest better answers to improve its accuracy. DocsAI also offers a chat widget that can be embedded on any website, allowing you to chat with your documents and get summaries, insights, and leads. It is mobile and tablet-friendly, and you can export chats and analyze data to identify trends and improve customer satisfaction. DocsAI is open source and offers custom prompts and multi-language support.

site

: 3.1k

GeniA

GeniA is an open-source engineering Gen AI team member that can be embedded in your everyday and production environments. It is built with enterprise-grade engineering tools under the most challenging security standards and can be accessed right in your team's Slack channel. GeniA can assist with a variety of tasks, including code generation, debugging, and testing.

site

: 2.2k

Dust

Dust is a customizable and secure AI assistant platform that helps businesses amplify their team's potential. It allows users to deploy the best Large Language Models to their company, connect Dust to their team's data, and empower their teams with assistants tailored to their specific needs. Dust is exceptionally modular and adaptable, tailoring to unique requirements and continuously evolving to meet changing needs. It supports multiple sources of data and models, including proprietary and open-source models from OpenAI, Anthropic, and Mistral. Dust also helps businesses identify their most creative and driven team members and share their experience with AI throughout the company. It promotes collaboration with shared conversations, @mentions in discussions, and Slackbot integration. Dust prioritizes security and data privacy, ensuring that data remains private and that enterprise-grade security measures are in place to manage data access policies.

site

: 82.7k

EchoMark

EchoMark is a cloud-based data leak prevention solution that uses invisible forensic watermarks to protect sensitive information from unauthorized access and exfiltration. It allows organizations to securely share and collaborate on documents and emails without compromising privacy and security. EchoMark's advanced investigation tools can trace the source of a leaked document or email, even if it has been shared via printout or photo.

site

: 3.5k

Monitaur

Monitaur is AI governance software that provides a comprehensive platform for organizations to manage the entire lifecycle of their AI systems. It brings data, governance, risk, and compliance teams together on one platform to mitigate AI risk and turn governance intentions into action. Monitaur's SaaS products offer user-friendly workflows that document each stage of the AI lifecycle in one place, providing a single source of truth for AI.

site

: 13.8k

Slite

Slite is an AI-powered knowledge base application designed to streamline knowledge management for companies. It offers features such as collaborative document creation, AI-driven insights, and instant answers to queries. With a focus on utility and simplicity, Slite aims to provide a single source of truth for company information, freeing up teams from manual work and ensuring accurate and up-to-date knowledge management at scale.

site

: 325.7k

H2O.ai

H2O.ai is an AI platform that combines predictive and generative AI solutions. It provides an end-to-end GenAI platform for various deployments, including air-gapped, on-premises, or cloud VPC. With a focus on democratizing AI, H2O.ai offers a range of AI tools and applications for different industries and use cases, such as financial services, government, health, insurance, manufacturing, marketing, retail, and telecommunications. The platform includes features like h2oGPTe for document and data AI, H2O Driverless AI for automated machine learning, H2O-3 for open-source distributed machine learning, and more. H2O.ai aims to empower organizations to infuse intelligence into their data and processes, enabling them to make informed decisions and drive innovation.

site

: 235.9k

Magick

Magick is a cutting-edge Artificial Intelligence Development Environment (AIDE) that empowers users to rapidly prototype and deploy advanced AI agents and applications without coding. It provides a full-stack solution for building, deploying, maintaining, and scaling AI creations. Magick's open-source, platform-agnostic nature allows for full control and flexibility, making it suitable for users of all skill levels. With its visual node-graph editors, users can code visually and create intuitively. Magick also offers powerful document processing capabilities, enabling effortless embedding and access to complex data. Its real-time and event-driven agents respond to events right in the AIDE, ensuring prompt and efficient handling of tasks. Magick's scalable deployment feature allows agents to handle any number of users, making it suitable for large-scale applications. Additionally, its multi-platform integrations with tools like Discord, Unreal Blueprints, and Google AI provide seamless connectivity and enhanced functionality.

site

: 5.5k

AFFiNE

AFFiNE is an all-in-one KnowledgeOS platform that integrates documents, whiteboards, and databases with AI capabilities. It offers a workspace for writing, drawing, and planning, allowing users to enhance creativity and productivity. The platform is privacy-focused, user-centric, and open-source, catering to individuals, startups, and established organizations. AFFiNE aims to streamline workflows, foster collaboration, and provide a vibrant community space for users to connect and inspire each other.

site

: 293.4k

ChatDOC

ChatDOC is an AI-powered tool that allows users to chat with PDF documents and get instant answers with cited sources. It can summarize long documents, explain complex concepts, and find key information in seconds. ChatDOC is built for professionals and is used by over 500,000 global users.

site

: 463.2k

Petal

Petal is an AI-powered document analysis platform that allows users to link to their own knowledge bases to generate fully sourced and reliable answers. It enables users to train AI on their own documents to support their work. Petal provides a centralized location for all knowledge, ensuring that documents are always synchronized and secure. It offers features such as automatic metadata extraction, file deduplication, and dedicated technical and scientific document support. Users can highlight key points, share comments, and use AI to identify key points and explain complex ideas. Petal is trusted by over 20,000 researchers, faculty, and industry experts and has been listed by MIT as a trusted university resource.

site

: 44.4k

20 - Open Source AI Tools

llmware

LLMWare is a framework for quickly developing LLM-based applications, including Retrieval Augmented Generation (RAG) and multi-step orchestration of agent workflows. The project provides a comprehensive set of tools that anyone can use, from a beginner to the most sophisticated AI developer, to rapidly build industrial-grade, knowledge-based enterprise LLM applications. Its specific focus is on making it easy to integrate small, specialized open-source models and to connect enterprise knowledge safely and securely.

github

: 4.2k

LLM4IR-Survey

LLM4IR-Survey is a collection of papers related to large language models for information retrieval, organized according to the survey paper 'Large Language Models for Information Retrieval: A Survey'. It covers various aspects such as query rewriting, retrievers, rerankers, readers, search agents, and more, providing insights into the integration of large language models with information retrieval systems.

github

: 330

prompt-in-context-learning

An open-source engineering guide for prompt-in-context-learning from EgoAlpha Lab. It offers fresh, daily-updated resources for in-context learning and prompt engineering. As Artificial General Intelligence (AGI) is approaching, let's take action and become a super learner so as to position ourselves at the forefront of this exciting era and strive for personal and professional greatness. The resources include: Papers (the latest papers about In-Context Learning, Prompt Engineering, Agent, and Foundation Models); Playground (large language models (LLMs) that enable prompt experimentation); Prompt Engineering (prompt techniques for leveraging large language models); ChatGPT Prompt (prompt examples that can be applied in our work and daily lives); and LLMs Usage Guide (the method for quickly getting started with large language models by using LangChain). In the future, there will likely be two types of people on Earth (perhaps even on Mars, but that's a question for Musk): those who enhance their abilities through the use of AIGC, and those whose jobs are replaced by AI automation. EgoAlpha: Hello, human, are you ready?

github

: 1.4k

embedJs

EmbedJs is a NodeJS framework that simplifies RAG application development by efficiently processing unstructured data. It segments data, creates relevant embeddings, and stores them in a vector database for quick retrieval.

github

: 149
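The segment, embed, store, and retrieve flow described for embedJs can be illustrated with a minimal sketch. This is not the embedJs API (embedJs is a NodeJS framework); it is a standalone Python illustration in which `chunk_text`, `embed`, and `InMemoryVectorStore` are hypothetical names, and the "embedding" is a simple bag-of-words count vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=8):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): a sparse term-count vector, not a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: a list of (chunk, vector) rows."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        # Segment the document, embed each chunk, and store both together.
        for chunk in chunk_text(text):
            self.rows.append((chunk, embed(chunk)))

    def search(self, query, k=1):
        # Rank stored chunks by similarity to the query and return the top k.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = InMemoryVectorStore()
store.add("Invoices are archived for seven years. Refund requests need a signed approval form.")
top = store.search("how long are invoices archived", k=1)
print(top)
```

In a real RAG application the retrieved chunks would be passed to an LLM as context; the fixed-size word chunking here is only the simplest possible segmentation strategy.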

vectordb-recipes

This repository contains examples, applications, starter code, and tutorials to help you kickstart your GenAI projects. These are built using LanceDB, a free, open-source, serverless vector database that requires no setup. It integrates into the Python data ecosystem, so you can start using it in your existing data pipelines in pandas, Arrow, pydantic, etc. LanceDB also has a native TypeScript SDK with which you can run vector search in serverless functions. The repository is divided into three sections: Examples (get right into the code with minimal introduction, aimed at getting you from an idea to a PoC within minutes), Applications (ready-to-use Python and web apps using applied LLMs, vector databases, and GenAI tools), and Tutorials (a curated list of tutorials, blogs, Colabs, and courses to get you started with GenAI in greater depth).

github

: 487

Awesome-LLM-Long-Context-Modeling

This repository includes papers and blogs about Efficient Transformers, Length Extrapolation, Long Term Memory, Retrieval Augmented Generation (RAG), and Evaluation for Long Context Modeling.

github

: 450

hallucination-leaderboard

This leaderboard evaluates the hallucination rate of various Large Language Models (LLMs) when summarizing documents. It uses a model trained by Vectara to detect hallucinations in LLM outputs. The leaderboard includes models from OpenAI, Anthropic, Google, Microsoft, Amazon, and others. The evaluation is based on 831 documents that were summarized by all the models. The leaderboard shows the hallucination rate, factual consistency rate, answer rate, and average summary length for each model.

github

: 1.0k
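The four columns named in the hallucination-leaderboard description (hallucination rate, factual consistency rate, answer rate, average summary length) can be sketched as simple aggregates. The exact definitions used by the leaderboard are not given here, so this sketch makes assumptions: a hypothetical detector returns a consistency score in [0, 1] per summary, a score below an assumed threshold of 0.5 counts as a hallucination, rates are computed over answered documents, and length is measured in whitespace-separated words. `SummaryResult` and `leaderboard_row` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SummaryResult:
    summary: Optional[str]     # None when the model declined to summarize
    consistency_score: float   # hypothetical detector score in [0, 1]


def leaderboard_row(results: List[SummaryResult], threshold: float = 0.5) -> dict:
    """Aggregate per-document results into one leaderboard row (percentages)."""
    if not results:
        raise ValueError("results must not be empty")
    answered = [r for r in results if r.summary]
    if not answered:
        raise ValueError("no answered summaries to score")
    # A summary counts as hallucinated when its score falls below the threshold.
    hallucinated = [r for r in answered if r.consistency_score < threshold]
    hallucination_rate = 100.0 * len(hallucinated) / len(answered)
    return {
        "hallucination_rate": hallucination_rate,
        "factual_consistency_rate": 100.0 - hallucination_rate,
        "answer_rate": 100.0 * len(answered) / len(results),
        "average_summary_length": sum(len(r.summary.split()) for r in answered) / len(answered),
    }


row = leaderboard_row([
    SummaryResult("revenue rose sharply", 0.9),
    SummaryResult("profits doubled", 0.2),
    SummaryResult(None, 0.0),
    SummaryResult("the board approved it", 0.8),
])
print(row)
```

Under these assumptions the factual consistency rate is simply the complement of the hallucination rate, which is why the two always sum to 100.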

awesome-langchain

LangChain is an amazing framework for getting LLM projects done in no time, and the ecosystem is growing fast. This is an attempt to keep track of the initiatives around LangChain. Subscribe to the newsletter to stay informed about Awesome LangChain. We send a couple of emails per month about the articles, videos, projects, and tools that grabbed our attention. Contributions are welcome: add links through pull requests or create an issue to start a discussion. Please read the contribution guidelines before contributing.

github

: 7.1k

awesome-ai-agents

github

: 7.1k

LLM-and-Law

This repository is dedicated to summarizing papers on large language models in the field of law. It covers applications of large language models in legal tasks, legal agents, legal problems raised by large language models, data resources for large language models in law, law-specific LLMs, and evaluation of large language models in the legal domain.

github

: 109

awesome-hallucination-detection

This repository provides a curated list of papers, datasets, and resources related to the detection and mitigation of hallucinations in large language models (LLMs). Hallucinations refer to the generation of factually incorrect or nonsensical text by LLMs, which can be a significant challenge for their use in real-world applications. The resources in this repository aim to help researchers and practitioners better understand and address this issue.

github

: 370

OpenGPTAndBeyond

github

: 102

llm-client

LLMClient is a JavaScript/TypeScript library that simplifies working with large language models (LLMs) by providing an easy-to-use interface for building and composing efficient prompts using prompt signatures. These signatures enable the automatic generation of typed prompts, allowing developers to leverage advanced capabilities like reasoning, function calling, RAG, ReAcT, and Chain of Thought. The library supports various LLMs and vector databases, making it a versatile tool for a wide range of applications.

github

: 540

llm-course

The LLM course is divided into three parts: 1. 🧩 LLM Fundamentals covers essential knowledge about mathematics, Python, and neural networks. 2. 🧑‍🔬 The LLM Scientist focuses on building the best possible LLMs using the latest techniques. 3. 👷 The LLM Engineer focuses on creating LLM-based applications and deploying them. For an interactive version of this course, I created two LLM assistants that will answer questions and test your knowledge in a personalized way: 🤗 HuggingChat Assistant (free version using Mixtral-8x7B) and 🤖 ChatGPT Assistant (requires a premium account). The course also includes a list of notebooks and articles related to large language models. Tools:

| Notebook | Description |
|----------|-------------|
| 🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod. |
| 🥱 LazyMergekit | Easily merge models using MergeKit in one click. |
| 🦎 LazyAxolotl | Fine-tune models in the cloud using Axolotl in one click. |
| ⚡ AutoQuant | Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click. |
| 🌳 Model Family Tree | Visualize the family tree of merged models. |
| 🚀 ZeroSpace | Automatically create a Gradio chat interface using a free ZeroGPU. |

github

: 32.7k

DecryptPrompt

This repository does not provide a tool, but rather a collection of resources and strategies for academics in the field of artificial intelligence who are feeling depressed or overwhelmed by the rapid advancements in the field. The resources include articles, blog posts, and other materials that offer advice on how to cope with the challenges of working in a fast-paced and competitive environment.

github

: 2.1k

dify

Dify is an open-source LLM app development platform that combines AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more. It allows users to quickly go from prototype to production. Key features include: 1. Workflow: Build and test powerful AI workflows on a visual canvas. 2. Comprehensive model support: Seamless integration with hundreds of proprietary / open-source LLMs from dozens of inference providers and self-hosted solutions. 3. Prompt IDE: Intuitive interface for crafting prompts, comparing model performance, and adding additional features. 4. RAG Pipeline: Extensive RAG capabilities that cover everything from document ingestion to retrieval. 5. Agent capabilities: Define agents based on LLM Function Calling or ReAct, and add pre-built or custom tools. 6. LLMOps: Monitor and analyze application logs and performance over time. 7. Backend-as-a-Service: All of Dify's offerings come with corresponding APIs for easy integration into your own business logic.

github

: 35.9k

pebblo

Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization’s compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them on the UI or a PDF report.

github

: 108

Dot

Dot is a standalone, open-source application designed for seamless interaction with documents and files using local LLMs and Retrieval Augmented Generation (RAG). It is inspired by solutions like Nvidia's Chat with RTX, providing a user-friendly interface for those without a programming background. Pre-packaged with Mistral 7B, Dot ensures accessibility and simplicity right out of the box. Dot allows you to load multiple documents into an LLM and interact with them in a fully local environment. Supported document types include PDF, DOCX, PPTX, XLSX, and Markdown. Users can also engage with Big Dot for inquiries not directly related to their documents, similar to interacting with ChatGPT. Built with Electron JS, Dot encapsulates a comprehensive Python environment that includes all necessary libraries. The application leverages libraries such as FAISS for creating local vector stores, Langchain, llama.cpp & Huggingface for setting up conversation chains, and additional tools for document management and interaction.

github

: 726

ragflow

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that combines deep document understanding with Large Language Models (LLMs) to provide accurate question-answering capabilities. It offers a streamlined RAG workflow for businesses of all sizes, enabling them to extract knowledge from unstructured data in various formats, including Word documents, slides, Excel files, images, and more. RAGFlow's key features include deep document understanding, template-based chunking, grounded citations with reduced hallucinations, compatibility with heterogeneous data sources, and an automated and effortless RAG workflow. It supports multiple recall paired with fused re-ranking, configurable LLMs and embedding models, and intuitive APIs for seamless integration with business applications.

github

: 10.7k

h2ogpt

h2oGPT is an Apache V2 open-source project that allows users to query and summarize documents or chat with local private GPT LLMs. It features a private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, YouTube, Audio, Code, Text, Markdown, etc.), a persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.), and efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach). h2oGPT also offers parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model, HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses, a variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM, with AutoGPTQ, 4-bit/8-bit, LORA, etc.), GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models. Additionally, h2oGPT provides Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.), a UI or CLI with streaming of all models, the ability to upload and view documents through the UI (control multiple collaborative or personal collections), Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision, Image Generation Stable Diffusion (sdxl-turbo, sdxl) and PlaygroundAI (playv2), Voice STT using Whisper with streaming audio conversion, Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion, Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion, AI Assistant Voice Control Mode for hands-free control of h2oGPT chat, Bake-off UI mode against many models at the same time, Easy Download of model artifacts and control over models like LLaMa.cpp through the UI, Authentication in the UI by user/password via Native or Google OAuth, State Preservation in the UI by user/password, Linux, Docker, macOS, and Windows support, Easy Windows Installer for Windows 10 64-bit (CPU/CUDA), Easy macOS Installer for macOS (CPU/M1/M2), Inference Servers support (oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, OpenAI, Azure OpenAI, Anthropic), OpenAI-compliant Server Proxy API (h2oGPT acts as a drop-in replacement for an OpenAI server), Python client API (to talk to Gradio server), JSON Mode with any model via code block extraction (also supports MistralAI JSON mode, Claude-3 via function calling with strict Schema, OpenAI via JSON mode, and vLLM via guided_json with strict Schema), Web-Search integration with Chat and Document Q/A, Agents for Search, Document Q/A, Python Code, CSV frames (Experimental, best with OpenAI currently), Evaluate performance using reward models, and Quality maintained with over 1000 unit and integration tests taking over 4 GPU-hours.

github

: 10.9k

20 - OpenAI GPTs

Research GPT

Your go-to source for well-researched information!

gpt

: 1K+

RoadLawsAI

Your go-to source for road laws and legal documents.

gpt

: 30+

FDA Advisor

Approachable expert on FDA medical device regulation. Offering direct download links for related regulation and guidance documents from FDA sites.

gpt

: 400+

Open Source LLM Advisor

Download and Run Open Source LLMs Locally.

gpt

: 50+

Das deutsche Grundgesetz

Constitutional knowledge source

gpt

: 70+

Legal Beaver

Your go-to source for Canadian legal frameworks, now with federal property insights!

gpt

: 70+

Nigerian Legal Expert

Specialised assistant dedicated to providing in-depth knowledge and insights on Nigerian laws and legal matters.

gpt

: 20+

Canada Law

Information on Canadian laws, courts, legal forms, regulations, consultations

gpt

: 1K+

Fill PDF Forms

Fill legal forms & complex PDF documents easily! Upload a file, provide data sources and I'll handle the rest.

gpt

: 300+

APA 7 School

Expert in APA 7th Edition and in writing scientific articles

gpt

: 70+

SOURCE

IT products and pricing, for IT professionals.

gpt

: 40+

Source Evaluation and Fact Checking v1.3

FactCheck Navigator GPT is designed for in-depth fact checking and analysis of written content and evaluation of its source. The approach is to iterate through predefined and well-prompted steps. If desired, the user can refine the process by providing input between these steps.

gpt

: 100+