kollektiv

The open source chat powered by LLMs with RAG. Kollektiv makes it easy to sync your custom data sources and get accurate, contextual replies.

Stars: 71

Kollektiv is a Retrieval-Augmented Generation (RAG) system designed to enable users to chat with their favorite documentation easily. It aims to provide LLMs with access to the most up-to-date knowledge, reducing inaccuracies and improving productivity. The system utilizes intelligent web crawling, advanced document processing, vector search, multi-query expansion, smart re-ranking, AI-powered responses, and dynamic system prompts. The technical stack includes Python/FastAPI for backend, Supabase, ChromaDB, and Redis for storage, OpenAI and Anthropic Claude 3.5 Sonnet for AI/ML, and Chainlit for UI. Kollektiv is licensed under a modified version of the Apache License 2.0, allowing free use for non-commercial purposes.

README:

🚀 Kollektiv - LLMs + Up-to-date knowledge

Python 3.12 Ruff CI codecov License Maintenance

🌟 Overview

Kollektiv is a Retrieval-Augmented Generation (RAG) system designed for one purpose: to let you easily chat with your favorite docs (primarily those of libraries, frameworks, and tools).

This project aims to let LLMs tap into the most up-to-date knowledge in 2 clicks, so that you don't have to worry about incorrect replies, hallucinations, or inaccuracies when working with the best LLMs.

❓Why?

This project was born out of a personal itch: whenever a new feature of my favorite library comes out, I know I can't rely on the LLM to help me build with it, because it simply doesn't know about it!

The root cause: LLMs lack access to the most recent documentation and to private knowledge, because they are trained on data that was collected some time ago (sometimes more than a year ago).

The impact: hallucinated, inaccurate, or outdated answers, which directly reduce the productivity and usefulness of LLMs.

But there is a better way...

What if LLMs could tap into a source of up-to-date information on libraries, tools, frameworks you are building with?

Imagine if your LLM could intelligently decide when it needs to check the documentation source, and always provide an accurate reply.

🎯 Goal

Meet Kollektiv -> an open-source RAG app that helps you easily:

  • parse the docs of your favorite libraries
  • efficiently store and embed them in a local vector storage
  • set up an LLM chat which you can rely on

Note that this is v0.1.6, and the reliability of the system can be characterized as follows:

  • 50% of the time, it works every time!

So do let me know if you are experiencing issues and I'll try to fix them.

⚙️ Key Features

  • 🕷️ Intelligent Web Crawling: Utilizes FireCrawl API to efficiently crawl and extract content from specified documentation websites.
  • 🧠 Advanced Document Processing: Implements custom chunking strategies to optimize document storage and retrieval.
  • 🔍 Vector Search: Employs Chroma DB for high-performance similarity search of document chunks.
  • 🔄 Multi-Query Expansion: Enhances search accuracy by generating multiple relevant queries for each user input.
  • 📊 Smart Re-ranking: Utilizes Cohere's re-ranking API to improve relevancy of search results.
  • 🤖 AI-Powered Responses: Integrates with Claude 3.5 Sonnet to generate human-like, context-aware responses.
  • 🧠 Dynamic system prompt: Automatically summarizes the embedded documentation to improve RAG decision-making.
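
To make the chunking idea above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not Kollektiv's actual chunking strategy (the README does not describe it): `chunk_words`, `max_words`, and `overlap` are hypothetical names, and plain words stand in for tokens so the example needs no dependencies.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks that share `overlap` words.

    Illustrative only: a real pipeline would likely count tokens
    (e.g. with tiktoken) and respect headings or code blocks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.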

🛠️ Technical Stack

  • Backend: Python/FastAPI
  • Storage:
    • Supabase (auth/data)
    • ChromaDB (vectors)
    • Redis (queues/real-time)
  • AI/ML:
    • OpenAI text-embedding-3-small (embeddings)
    • Anthropic Claude 3.5 Sonnet (chat)
    • Cohere (re-ranking)
  • UI: Chainlit
  • Additional: tiktoken, pydantic, pytest, ruff

🚀 Quick Start

  1. Clone the repository:

    git clone https://github.com/alexander-zuev/kollektiv.git
    cd kollektiv
  2. Set up environment variables: Create a .env file in the project root with the following:

    FIRECRAWL_API_KEY="your_firecrawl_api_key"
    OPENAI_API_KEY="your_openai_api_key"
    ANTHROPIC_API_KEY="your_anthropic_api_key"
    COHERE_API_KEY="your_cohere_api_key"
  3. Install dependencies:

    poetry install
  4. Run the application:

    poetry run kollektiv

💡 Usage

  1. Start the Application:

    # Run both API and Chainlit UI
    poetry run kollektiv
    
    # Or run only Chainlit UI
    chainlit run main.py
  2. Add Documentation:

    @docs add https://your-docs-url.com

    The system will guide you through:

    • Setting crawling depth
    • Adding exclude patterns (optional)
    • Processing and embedding content
  3. Manage Documents:

    @docs list           # List all documents
    @docs remove [ID]    # Remove a document
    @help                # Show all commands
  4. Chat with Documentation: Simply ask questions in natural language. The system will:

    • Search relevant documentation
    • Re-rank results for accuracy
    • Generate contextual responses
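
The search and re-rank steps above, combined with multi-query expansion, can be sketched roughly as follows. This is an illustration of the general flow, not Kollektiv's code: `retrieve` and the `toy_*` functions are hypothetical, and simple keyword overlap stands in for the real vector search (ChromaDB) and re-ranker (Cohere) so that the example runs without API keys.

```python
from typing import Callable


def retrieve(
    question: str,
    expand: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    rerank: Callable[[str, list[str]], list[str]],
    top_k: int = 3,
) -> list[str]:
    """Expand the question, search per query, deduplicate, then re-rank."""
    queries = [question, *expand(question)]
    seen: set[str] = set()
    candidates: list[str] = []
    for query in queries:
        for chunk in search(query):
            if chunk not in seen:  # keep only the first occurrence
                seen.add(chunk)
                candidates.append(chunk)
    return rerank(question, candidates)[:top_k]


# Toy stand-ins (assumptions for illustration only).
DOCS = [
    "FastAPI supports dependency injection",
    "Chroma stores embeddings locally",
    "Redis can back a task queue",
]


def _words(text: str) -> set[str]:
    return set(text.lower().replace("?", "").split())


def toy_expand(question: str) -> list[str]:
    # A real system would ask an LLM for paraphrased queries.
    return [question.lower(), question.replace("?", "")]


def toy_search(query: str) -> list[str]:
    return [doc for doc in DOCS if _words(query) & _words(doc)]


def toy_rerank(question: str, chunks: list[str]) -> list[str]:
    return sorted(chunks, key=lambda c: -len(_words(question) & _words(c)))


if __name__ == "__main__":
    print(retrieve("Where does Chroma store embeddings?",
                   toy_expand, toy_search, toy_rerank))
```

Passing the three stages in as functions keeps the flow itself testable: the toy versions can be swapped for real clients without touching `retrieve`.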

❤️‍🩹 Current Limitations

  • Image content not supported (text-only embeddings)
  • No automatic re-indexing of documentation
  • URL validation limited to common formats
  • Exclude patterns must start with /
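
To avoid tripping over the last two limitations, you can check your inputs before passing them to `@docs add`. The sketch below is a guess at what "common formats" might mean (plain `http`/`https` URLs with a host), not Kollektiv's actual validation logic, and both function names are hypothetical.

```python
from urllib.parse import urlparse


def is_valid_docs_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host name (an assumption)."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_valid_exclude_pattern(pattern: str) -> bool:
    """Exclude patterns must start with '/', per the limitation above."""
    return pattern.startswith("/")


if __name__ == "__main__":
    print(is_valid_docs_url("https://your-docs-url.com"))
    print(is_valid_exclude_pattern("/blog/*"))
```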

🛣️ Roadmap

For a brief roadmap, please check out the project wiki page.

📈 Performance Metrics

Evaluation is currently done using the ragas library. Two key parts are assessed:

  1. End-to-end generation
    • Faithfulness
    • Answer relevancy
    • Answer correctness
  2. Retriever (TBD)
    • Context recall
    • Context precision
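
For intuition about the two retriever metrics, here is a simplified, set-based sketch. It is not how ragas computes them (ragas typically relies on an LLM as judge and its own definitions). It assumes you already know which chunks are relevant to a question, and the function names merely mirror the metric names.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are relevant."""
    if not retrieved:
        return 0.0
    return sum(chunk in relevant for chunk in retrieved) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


if __name__ == "__main__":
    retrieved = ["a", "b", "c", "d"]
    relevant = {"a", "c", "e"}
    print(context_precision(retrieved, relevant))  # 2 of 4 retrieved are relevant
    print(context_recall(retrieved, relevant))     # 2 of 3 relevant were retrieved
```

Low precision means the retriever passes noise to the LLM. Low recall means the LLM never sees part of what it needs.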

📜 License

Kollektiv is licensed under a modified version of the Apache License 2.0. While it allows for free use, modification, and distribution for non-commercial purposes, any commercial use requires explicit permission from the copyright owner.

  • For non-commercial use: You are free to use, modify, and distribute this software under the terms of the Apache License 2.0.
  • For commercial use: Please contact [email protected] to obtain a commercial license.

See the LICENSE file for the full license text and additional conditions.

Project Renaming Notice

The project has been renamed from OmniClaude to Kollektiv to:

  • avoid confusion with Anthropic and any unintended copyright infringement
  • emphasize the goal of becoming a tool that enhances collaboration by simplifying access to knowledge
  • have an overall cool name (isn't it?)

If you have any questions regarding the renaming, feel free to reach out.

🙏 Acknowledgements

📞 Support

For any questions or issues, please open an issue.


Built with ❤️ by AZ
