swirl-search

AI Search & RAG Without Moving Your Data. Get instant answers from your company's knowledge across 100+ apps while keeping data secure. Deploy in minutes, not months.

Stars: 2718

Swirl is open-source software that lets users search multiple content sources simultaneously and receive AI-ranked results. It connects to databases, public data services, and enterprise sources, and uses AI and LLMs to generate insights and answers from the user's data. Getting started requires only downloading a YAML file, starting Swirl in Docker, and running a search. Users can add credentials to the preloaded SearchProviders to access more sources. Swirl also offers integration with ChatGPT as a configured AI model. It adapts and distributes user queries to anything with a search API, then re-ranks the unified results using Large Language Models without extracting or indexing anything. Swirl includes five Google Programmable Search Engines (PSEs) to get users up and running quickly.

Key features of Swirl include Microsoft 365 integration, SearchProvider configurations, query adaptation, synchronous or asynchronous search federation, an optional subscribe feature, pipelining of Processor stages, results stored in SQLite3 or PostgreSQL, built-in Query Transformation support, matching on word stems and handling of stopwords, duplicate detection, re-ranking of unified results using Cosine Vector Similarity, result mixers, paging through all requested results, sample data sets, optional spell correction, an optional search/result expiration service, easily extensible Connector and Mixer objects, and a welcoming community for collaboration and support.
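The federate-then-re-rank flow described above can be illustrated with a small, self-contained sketch. This is not Swirl's code: the `PROVIDERS` dictionary, `search_provider`, and `federated_search` are hypothetical names invented for illustration, and a plain bag-of-words cosine similarity stands in for the language-model vectors Swirl uses.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "providers". In a real deployment each one would be a
# remote search API; nothing is copied or indexed ahead of time.
PROVIDERS = {
    "wiki": ["Federated search queries many sources at once",
             "Cooking pasta at home"],
    "tickets": ["Search federation timeout when a source is slow",
                "Printer is out of paper"],
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search_provider(name, query):
    """Toy stand-in for a provider's search API: keyword match on titles."""
    terms = set(tokenize(query))
    return [{"provider": name, "title": title}
            for title in PROVIDERS[name] if terms & set(tokenize(title))]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def federated_search(query):
    """Send the query to every provider in parallel, merge, and re-rank."""
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda name: search_provider(name, query), PROVIDERS))
    results = [hit for batch in batches for hit in batch]
    query_vector = Counter(tokenize(query))
    for hit in results:
        hit["score"] = cosine(query_vector, Counter(tokenize(hit["title"])))
    return sorted(results, key=lambda hit: hit["score"], reverse=True)
```

Calling `federated_search("search federation")` returns one hit from each toy provider, ordered by similarity to the query rather than by which provider answered first.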

README:

SWIRL

Give your team ChatGPT-level search without moving data to the cloud

RAG with OneDrive & Microsoft 365 in 60 seconds

Ask question → Get answer with sources → Click through to source

Watch it on YouTube

Teams using SWIRL save an average of 7.5 hours of productive time per week.

⚡ Quick Start · 💬 Join Slack · 📚 Docs · 🔌 Connectors · 🤝 Contribute

🤔 Why SWIRL?

Skip the Complexity, Keep the Power

❌ Without SWIRL

Set up vector databases
Move data around
Complex ETL pipelines
Weeks of infrastructure work
Security headaches

✅ With SWIRL

One docker command
Data stays in place
No vector DB needed
2-minute setup
Enterprise-grade security

🚀 Built Different

No Vector DB Drama

# No need for:
$ setup-vector-db
$ migrate-data
$ configure-indexes

# Just this:
$ curl https://raw.githubusercontent.com/swirlai/swirl-search/main/docker-compose.yaml -o docker-compose.yaml

💡 What Can You Build With SWIRL?

Real examples of what teams build with SWIRL:

🔍 Knowledge Base Search

Connect SharePoint, Confluence, & Drive
Get instant answers with source links
Keep sensitive data secure

🤖 Customer Support Assistant

Search across support docs & tickets
Draft responses using your content
Maintain consistent answers

👩‍💻 Developer Assistant

Search GitHub, Jira, & documentation
Find code examples & solutions
Speed up development workflow

🏢 Unified Search

Unified search across all tools
Results respect existing permissions
No data duplication needed

👀 See it in action

Schedule Your Free Demo of SWIRL Enterprise

Try SWIRL Enterprise for free for 30 Days. Click on the banner to contact us.

⚡ Why Teams Choose SWIRL

🔒 Your infrastructure, your control
🚀 Deploy in minutes, not months
🔌 100+ enterprise connectors
🤖 AI that respects your security

SWIRL's Ranking in Action

SWIRL doesn't just search - it understands your company's context. Instead of broad web results, you get precise answers from your private data, right where it lives.

SWIRL Features

Full list of connectors is available here

For Support on Connectors Contact the Swirl Team at: [email protected]

🔥 Try Swirl Now In Docker

Prerequisites

To run Swirl in Docker, you must have the latest Docker app for macOS, Linux, or Windows installed and running locally. You can also watch the video tutorial to get started.
Windows users must also install and configure either the WSL 2 or the Hyper-V backend, as outlined in the System Requirements for installing Docker Desktop on Windows.

Start Swirl in Docker

Warning: Make sure the Docker app is running before proceeding!

Download the YAML file: https://raw.githubusercontent.com/swirlai/swirl-search/main/docker-compose.yaml

curl https://raw.githubusercontent.com/swirlai/swirl-search/main/docker-compose.yaml -o docker-compose.yaml

Optional: To enable Swirl's Real-Time Retrieval Augmented Generation (RAG) in Docker, run the following commands from the Console using a valid OpenAI API key:

export MSAL_CB_PORT=8000
export MSAL_HOST=localhost
export OPENAI_API_KEY='<your-OpenAI-API-key>'

🔑 Check out OpenAI's YouTube video if you don't have an OpenAI API Key.

On macOS or Linux, run the following command from the console:

docker-compose pull && docker-compose up

On Windows, run the following command from PowerShell:

docker compose up

After a few minutes the following or similar should appear:

Open this URL with a browser: http://localhost:8000 (or http://localhost:8000/galaxy)
If the search page appears, click Log Out at the top right. The Swirl login page will appear.
Enter the username admin and password password, then click Login.
Enter a search in the search box and press the Search button. Ranked results appear in just a few seconds:

To view the raw JSON, open http://localhost:8000/swirl/search/

The most recent Search object will be displayed at the top. Click on the result_url link to view the full JSON Response.
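The same JSON can also be retrieved from a script. The sketch below assumes that the endpoint accepts HTTP Basic authentication with the default `admin` / `password` credentials from the steps above; that is an assumption about the deployment, not something this README states. The helper names `build_request` and `fetch_searches` are hypothetical.

```python
import base64
import json
import urllib.request

SWIRL_URL = "http://localhost:8000/swirl/search/"  # URL from the steps above

def build_request(url=SWIRL_URL, username="admin", password="password"):
    """Build a GET request carrying an HTTP Basic Authorization header.

    Basic auth is an assumed scheme; adjust if your instance is configured
    differently.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    })

def fetch_searches(request):
    """Send the request and decode the JSON body (needs a running instance)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With Swirl running locally, `fetch_searches(build_request())` should return the decoded JSON that the browser shows at the same URL.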

Notes 📝

Warning: The Docker version of Swirl does not retain any data or configuration when shut down!

🔑 Swirl comes configured to search arXiv, Europe PMC and Google News right out of the box.

🔑 Using Swirl with Microsoft 365 requires installation and approval by an authorized company Administrator. For more information, please review the M365 Guide or contact us.

Next Steps 👇

Check out the details of our latest release!
Head over to the Quick Start Guide and install Swirl locally!

Video Tutorial

Guide to Run SWIRL in Docker in 60 seconds.

🌟 Key Features

✦	Feature
📌	Microsoft 365 integration and OAUTH2 support
🔍	SearchProvider configurations for all included Connectors. They can be organized with the active, default and tags properties.
✏️	Adaptation of the query for each provider such as rewriting `NOT term` to `-term`, removing NOTted terms from providers that don't support NOT, and passing down the AND, + and OR operators.
⏳	Synchronous or asynchronous search federation via APIs
🛎️	Optional subscribe feature to continuously monitor any search for new results
🛠️	Pipelining of Processor stages for real-time adaptation and transformation of queries, responses and results
🗄️	Results stored in SQLite3 or PostgreSQL for post-processing, consumption and/or analytics
➡️	Built-in Query Transformation support, including re-writing and replacement
📖	Matching on word stems and handling of stopwords via NLTK
🚫	Duplicate detection on field or by configurable Cosine Similarity threshold
🔄	Re-ranking of unified results using Cosine Vector Similarity based on spaCy's large language model and NLTK
🎚️	Result mixers order results by relevancy, date or round-robin (stack) format, with optional filtering of just new items in subscribe mode
📄	Page through all results requested, re-run, re-score and update searches using URLs provided with each result set
📁	Sample data sets for use with SQLite3 and PostgreSQL
✒️	Optional spell correction using TextBlob
⌛	Optional search/result expiration service to limit storage use
🔌	Easily extensible Connector and Mixer objects
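The query adaptation row in the table above (rewriting `NOT term` to `-term`, or dropping NOTted terms for providers without NOT support) can be sketched as follows. This is a minimal illustration, not Swirl's implementation: the function `adapt_query` and its parameters `supports_not` and `not_char` are hypothetical names chosen for this example.

```python
import re

# Matches the NOT operator followed by the single term it negates.
NOT_TERM = re.compile(r"\bNOT\s+(\S+)")

def adapt_query(query, supports_not=True, not_char=False):
    """Adapt a query's NOT operators to what one provider supports.

    not_char=True      -> rewrite `NOT term` as `-term`
    supports_not=False -> drop NOTted terms entirely
    Both parameter names are hypothetical, for illustration only.
    """
    if not_char:
        adapted = NOT_TERM.sub(r"-\1", query)
    elif not supports_not:
        adapted = NOT_TERM.sub("", query)
    else:
        adapted = query  # provider understands NOT as written
    return " ".join(adapted.split())  # collapse leftover whitespace

print(adapt_query("elon musk NOT twitter", not_char=True))
print(adapt_query("elon musk NOT twitter", supports_not=False))
```

The first call prints `elon musk -twitter` and the second prints `elon musk`. Operators such as AND, + and OR are left untouched, which matches the pass-through behavior the table describes.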

👩‍💻 Contributing to Swirl

Do you have a brilliant idea or improvement for SWIRL? We're all ears, and thrilled you're here to help!

🔗 Get Started in 3 Easy Steps:

Connect with Fellow Enthusiasts - Jump into the Swirl Slack Community and share your ideas. You'll find a welcoming group of Swirl enthusiasts and team members eager to assist and collaborate.
Branch It Out - Always branch off from the develop branch with a descriptive name that encapsulates your idea or fix.
Start Your Contribution - Ready to get your hands dirty? Make sure all contributions come through a GitHub pull request. We roughly follow the Gitflow branching model, so all changes destined for the next release should be made to the develop branch.

📚 First time contributing on GitHub? No worries, the GitHub documentation has you covered with a great guide on contributing to projects.

💡 Every contribution, big or small, makes a difference. Join us in shaping the future of Swirl!

☁ Use the Swirl Cloud

For information about Swirl as a managed service, please contact us!

📖 Documentation

🔗 SWIRL Documentation

👷‍♂️ Need Help? We're Here for You

At Swirl, every user matters to us. Whether you're a beginner finding your way or an expert with feedback, we're here to support, listen, and help. Don't hesitate to reach out to us.

Join the SWIRL Community Slack: Join our SWIRL Community on Slack to discuss anything related to SWIRL.
Direct Support: For any questions, suggestions, or even a simple hello, drop us an email at [email protected]. We cherish every message and promise to get back to you promptly!
Request a Connector (Enterprise Support): Want to see a new connector built quickly? Contact the Swirl Team at: [email protected]

For Tasks:

find information, gather insights, generate answers

For Jobs:

researcher, analyst, data scientist, information architect, knowledge manager

Alternative AI tools for swirl-search

Similar Open Source Tools

swirl-search

GitHub stars: 2.7k

dify

Dify is an open-source LLM app development platform that combines AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more. It allows users to quickly go from prototype to production. Key features include: 1. Workflow: Build and test powerful AI workflows on a visual canvas. 2. Comprehensive model support: Seamless integration with hundreds of proprietary / open-source LLMs from dozens of inference providers and self-hosted solutions. 3. Prompt IDE: Intuitive interface for crafting prompts, comparing model performance, and adding additional features. 4. RAG Pipeline: Extensive RAG capabilities that cover everything from document ingestion to retrieval. 5. Agent capabilities: Define agents based on LLM Function Calling or ReAct, and add pre-built or custom tools. 6. LLMOps: Monitor and analyze application logs and performance over time. 7. Backend-as-a-Service: All of Dify's offerings come with corresponding APIs for easy integration into your own business logic.

GitHub stars: 89.5k

rag-time

RAG Time is a 5-week AI learning series focusing on Retrieval-Augmented Generation (RAG) concepts. The repository contains code samples, step-by-step guides, and resources to help users master RAG. It aims to teach foundational and advanced RAG concepts, demonstrate real-world applications, and provide hands-on samples for practical implementation.

GitHub stars: 91

kubesphere

KubeSphere is a distributed operating system for cloud-native application management, using Kubernetes as its kernel. It provides a plug-and-play architecture, allowing third-party applications to be seamlessly integrated into its ecosystem. KubeSphere is also a multi-tenant container platform with full-stack automated IT operation and streamlined DevOps workflows. It provides developer-friendly wizard web UI, helping enterprises to build out a more robust and feature-rich platform, which includes most common functionalities needed for enterprise Kubernetes strategy.

GitHub stars: 15.1k

OpenContracts

OpenContracts is an Apache-2 licensed enterprise document analytics tool that supports multiple formats, including PDF and txt-based formats. It features multiple document ingestion pipelines with a pluggable architecture for easy format and ingestion engine support. Users can create custom document analytics tools with beautiful result displays, support mass document data extraction with a LlamaIndex wrapper, and manage document collections, layout parsing, automatic vector embeddings, and human annotation. The tool also offers pluggable parsing pipelines, human annotation interface, LlamaIndex integration, data extraction capabilities, and custom data extract pipelines for bulk document querying.

GitHub stars: 803

synmetrix

Synmetrix is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube.js to consolidate metrics from various sources and distribute them downstream via a SQL API. Use cases include data democratization, business intelligence and reporting, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.

GitHub stars: 531

ai-prompts

Instructa AI Prompts is an open-source repository dedicated to collecting and sharing AI prompts, best practices, and curated rules for developers. The goal is to help users quickly set up and refine their workflow with ready-to-use prompts. Users can dynamically include prompts in AI-assisted coding tools like Cursor, GitHub Copilot, Zed, Windsurf, and Cline to adhere to project-specific coding standards, best practices, and automation workflows.

GitHub stars: 217

mlcraft

Synmetrix (prev. MLCraft) is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube (Cube.js) for flexible data models that consolidate metrics from various sources, enabling downstream distribution via a SQL API for integration into BI tools, reporting, dashboards, and data science. Use cases include data democratization, business intelligence, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.

GitHub stars: 480

llm-twin-course

The LLM Twin Course is a free, end-to-end framework for building production-ready LLM systems. It teaches you how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices. The course is split into 11 hands-on written lessons and the open-source code you can access on GitHub. You can read everything and try out the code at your own pace.

GitHub stars: 3.1k

lm.rs

lm.rs is a tool that allows users to run inference on Language Models locally on the CPU using Rust. It supports LLama3.2 1B and 3B models, with a WebUI also available. The tool provides benchmarks and download links for models and tokenizers, with recommendations for quantization options. Users can convert models from Google/Meta on huggingface using provided scripts. The tool can be compiled with cargo and run with various arguments for model weights, tokenizer, temperature, and more. Additionally, a backend for the WebUI can be compiled and run to connect via the web interface.

GitHub stars: 775

piccolo

GitHub stars: 58

beeai

BeeAI is an open platform that helps users discover, run, and compose AI agents from any framework and language. It offers a framework-agnostic approach, allowing seamless integration of AI agents regardless of the language or platform. Users can build complex workflows using simple building blocks, explore a catalog of powerful agents with integrated search, and benefit from the BeeAI ecosystem with first-class support for Python and TypeScript agent developers.

GitHub stars: 396

second-brain-ai-assistant-course

This open-source course teaches how to build an advanced RAG and LLM system using LLMOps and ML systems best practices. It helps you create an AI assistant that leverages your personal knowledge base to answer questions, summarize documents, and provide insights. The course covers topics such as LLM system architecture, pipeline orchestration, large-scale web crawling, model fine-tuning, and advanced RAG features. It is suitable for ML/AI engineers and data/software engineers & data scientists looking to level up to production AI systems. The course is free, with minimal costs for tools like OpenAI's API and Hugging Face's Dedicated Endpoints. Participants will build two separate Python applications for offline ML pipelines and online inference pipeline.

GitHub stars: 539

StratosphereLinuxIPS

Slips is a powerful endpoint behavioral intrusion prevention and detection system that uses machine learning to detect malicious behaviors in network traffic. It can work with network traffic in real-time, PCAP files, and network flows from tools like Suricata, Zeek/Bro, and Argus. Slips threat detection is based on machine learning models, threat intelligence feeds, and expert heuristics. It gathers evidence of malicious behavior and triggers alerts when enough evidence is accumulated. The tool is Python-based and supported on Linux and MacOS, with blocking features only on Linux. Slips relies on Zeek network analysis framework and Redis for interprocess communication. It offers a graphical user interface for easy monitoring and analysis.

GitHub stars: 691

anything-llm

AnythingLLM is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions.

GitHub stars: 42.1k

superduper

superduper.io is a Python framework that integrates AI models, APIs, and vector search engines directly with existing databases. It allows hosting of models, streaming inference, and scalable model training/fine-tuning. Key features include integration of AI with data infrastructure, inference via change-data-capture, scalable model training, model chaining, simple Python interface, Python-first approach, working with difficult data types, feature storing, and vector search capabilities. The tool enables users to turn their existing databases into centralized repositories for managing AI model inputs and outputs, as well as conducting vector searches without the need for specialized databases.

GitHub stars: 5.0k

For similar tasks

vectara-answer

Vectara Answer is a sample app for Vectara-powered Summarized Semantic Search (or question-answering) with advanced configuration options. For examples of what you can build with Vectara Answer, check out Ask News, LegalAid, or any of the other demo applications.

GitHub stars: 249

LLocalSearch

LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a chain of LLMs to find the answer. The user can see the progress of the agents and the final answer. No OpenAI or Google API keys are needed.

GitHub stars: 5.3k

llm-answer-engine

This repository contains the code and instructions needed to build a sophisticated answer engine that leverages the capabilities of Groq, Mistral AI's Mixtral, Langchain.JS, Brave Search, Serper API, and OpenAI. Designed to efficiently return sources, answers, images, videos, and follow-up questions based on user queries, this project is an ideal starting point for developers interested in natural language processing and search technologies.

GitHub stars: 4.5k

swirl-search

GitHub stars: 2.7k

DocsGPT

DocsGPT is an open-source documentation assistant powered by GPT models. It simplifies the process of searching for information in project documentation by allowing developers to ask questions and receive accurate answers. With DocsGPT, users can say goodbye to manual searches and quickly find the information they need. The tool aims to revolutionize project documentation experiences and offers features like live previews, Discord community, guides, and contribution opportunities. It consists of a Flask app, Chrome extension, similarity search index creation script, and a frontend built with Vite and React. Users can quickly get started with DocsGPT by following the provided setup instructions and can contribute to its development by following the guidelines in the CONTRIBUTING.md file. The project follows a Code of Conduct to ensure a harassment-free community environment for all participants. DocsGPT is licensed under MIT and is built with LangChain.

GitHub stars: 15.5k

udm14

udm14 is a basic website designed to facilitate easy searches on Google with the &udm=14 parameter, ensuring AI-free results without knowledge panels. The tool simplifies access to these specific search results buried within Google's interface, providing a straightforward solution for users seeking this functionality.

GitHub stars: 84

openrecall

OpenRecall is a fully open-source, privacy-first tool that captures your digital history through snapshots, making it searchable for quick access to specific information. It offers transparency, cross-platform support, privacy focus, and hardware compatibility. Features include time travel, local-first AI, semantic search, and full control over storage. The roadmap includes visual search capabilities and audio transcription. Users can easily install and run OpenRecall to enhance memory and productivity without compromising privacy.

GitHub stars: 1.5k

Fyin

Fyin is an open-source tool that serves as an alternative to Perplexity AI, allowing users to run it locally for faster answers. It features the ability to run locally using ollama or OpenAI API, a local VectorDB for fast search, quick searching, scraping & answering due to parallelism, configurable number of search results to parse, and local scraping of websites. The tool aims to provide a more efficient and customizable solution for obtaining answers through search and scraping functionalities.

GitHub stars: 133

For similar jobs

vectara-answer

GitHub stars: 249

smartcat

Smartcat is a CLI interface that brings language models into the Unix ecosystem, allowing power users to leverage the capabilities of LLMs in their daily workflows. It features a minimalist design, seamless integration with terminal and editor workflows, and customizable prompts for specific tasks. Smartcat currently supports OpenAI, Mistral AI, and Anthropic APIs, providing access to a range of language models. With its ability to manipulate file and text streams, integrate with editors, and offer configurable settings, Smartcat empowers users to automate tasks, enhance code quality, and explore creative possibilities.

GitHub stars: 77

ragflow

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that combines deep document understanding with Large Language Models (LLMs) to provide accurate question-answering capabilities. It offers a streamlined RAG workflow for businesses of all sizes, enabling them to extract knowledge from unstructured data in various formats, including Word documents, slides, Excel files, images, and more. RAGFlow's key features include deep document understanding, template-based chunking, grounded citations with reduced hallucinations, compatibility with heterogeneous data sources, and an automated and effortless RAG workflow. It supports multiple recall paired with fused re-ranking, configurable LLMs and embedding models, and intuitive APIs for seamless integration with business applications.

GitHub stars: 47.9k

Dot

Dot is a standalone, open-source application designed for seamless interaction with documents and files using local LLMs and Retrieval Augmented Generation (RAG). It is inspired by solutions like Nvidia's Chat with RTX, providing a user-friendly interface for those without a programming background. Pre-packaged with Mistral 7B, Dot ensures accessibility and simplicity right out of the box. Dot allows you to load multiple documents into an LLM and interact with them in a fully local environment. Supported document types include PDF, DOCX, PPTX, XLSX, and Markdown. Users can also engage with Big Dot for inquiries not directly related to their documents, similar to interacting with ChatGPT. Built with Electron JS, Dot encapsulates a comprehensive Python environment that includes all necessary libraries. The application leverages libraries such as FAISS for creating local vector stores, Langchain, llama.cpp & Huggingface for setting up conversation chains, and additional tools for document management and interaction.

GitHub stars: 726

emerging-trajectories

Emerging Trajectories is an open source library for tracking and saving forecasts of political, economic, and social events. It provides a way to organize and store forecasts, as well as track their accuracy over time. This can be useful for researchers, analysts, and anyone else who wants to keep track of their predictions.

GitHub stars: 70

reor

Reor is an AI-powered desktop note-taking app that automatically links related notes, answers questions on your notes, and provides semantic search. Everything is stored locally and you can edit your notes with an Obsidian-like markdown editor. The hypothesis of the project is that AI tools for thought should run models locally by default. Reor stands on the shoulders of the giants Ollama, Transformers.js & LanceDB to enable both LLMs and embedding models to run locally. Connecting to OpenAI or OpenAI-compatible APIs like Oobabooga is also supported.

GitHub stars: 7.8k

swirl-search

GitHub stars: 2.7k

obsidian-Smart2Brain

Your Smart Second Brain is a free and open-source Obsidian plugin that serves as your personal assistant, powered by large language models like ChatGPT or Llama2. It can directly access and process your notes, eliminating the need for manual prompt editing, and it can operate completely offline, ensuring your data remains private and secure.

GitHub stars: 278