quick-start-connectors

This open-source repository offers reference code for integrating workplace datastores with Cohere's LLMs, enabling developers and businesses to perform seamless retrieval-augmented generation (RAG) on their own data.

Stars: 132

Cohere's Build-Your-Own-Connector framework lets you connect Cohere's Command LLM, via the Chat API endpoint, to any datastore or software that holds text and exposes a search endpoint. This grounds responses to user queries in proprietary information. Use cases include question answering, knowledge work, internal communications summaries, and research. The repository provides code for popular datastores and a template connector. It requires Python 3.11+ and Poetry, and connectors can be built and deployed using Docker. Environment variables set authorization values, and pre-commit hooks handle linting. The connectors are tailored to Cohere's Chat API for building chatbots, and they return documents as JSON objects that Cohere's API uses to generate answers with citations.

README:

Cohere

Quick Start Connectors

License: MIT


Overview

Cohere's Build-Your-Own-Connector framework allows you to integrate Cohere's Command LLM, via the Chat API endpoint, with any datastore or software that holds text information and exposes a corresponding search endpoint in its API. This allows the Command model to generate responses to user queries that are grounded in proprietary information.

Some examples of the use-cases you can enable with this framework:

  • Generic question/answering around broad internal company docs
  • Knowledge work with a specific subset of internal knowledge
  • Internal comms summary and search
  • Research using external providers of information, allowing researchers and writers to explore information from third parties

This open-source repository contains code that will allow you to get started integrating with some of the most popular datastores. There is also an empty template connector which you can expand to use any datasource. Note that different datastores may have different requirements or limitations that need to be addressed in order to get good-quality responses. While some of the quickstart code has been enhanced to address some of these limitations, other connectors only provide the basics of the integration, and you will need to develop them further to fit your specific use-case and the underlying datastore's limitations.
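As a rough illustration of what filling in the template connector involves, the sketch below implements a naive keyword search over an in-memory list. The `DOCUMENTS` list and the `search` function are hypothetical names and are not taken from this repository; a real connector would call its datastore's search API instead of scanning a list.

```python
# Hypothetical in-memory "datastore"; a real connector would query an external API.
DOCUMENTS = [
    {"title": "Company Travel Policy", "text": "Flights, hotels and meals can be expensed."},
    {"title": "Onboarding Guide", "text": "New employees receive a laptop on day one."},
]


def search(query: str, limit: int = 5) -> list[dict[str, str]]:
    """Return documents whose title or text contains any word of the query."""
    terms = [term.lower() for term in query.split()]
    matches = []
    for doc in DOCUMENTS:
        haystack = f"{doc['title']} {doc['text']}".lower()
        # Keep the document if at least one query word appears as a substring.
        if any(term in haystack for term in terms):
            matches.append(doc)
    return matches[:limit]
```

Substring matching like this is only a placeholder. Most datastores offer their own ranking, and the quality of the model's answers depends heavily on how relevant the returned documents are.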

Please read more about our connectors framework here: https://docs.cohere.com/docs/connectors

Getting Started

This project requires Python 3.11+ and Poetry at a minimum. Each connector uses Poetry to create a virtual environment specific to that connector and to install all the dependencies required to run a local server.

For production releases, you can optionally build and deploy using Docker. When building a Docker image, you can use the Dockerfile in the root project directory and specify the app build argument. For example:

docker build . -t gdrive:1 --build-arg app=gdrive

Development

For development, refer to a connector's README. Generally, you need to create an .env file in that connector's subdirectory, based on its .env-template. The environment variables there most commonly set authorization values such as API keys and credentials, and can also modify how the search for that connector behaves.

After configuring the .env file, you will be able to use Poetry's CLI to start a local server.
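A connector typically reads these values once at startup. The sketch below shows one way to do that with only the standard library. It assumes the variables from the .env file have already been loaded into the process environment, and the names `EXAMPLE_API_KEY` and `EXAMPLE_SEARCH_LIMIT` are hypothetical; each connector's .env-template lists the real ones.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorSettings:
    api_key: str
    search_limit: int


def load_settings(env=None) -> ConnectorSettings:
    """Build settings from environment variables, failing early if a secret is missing."""
    env = os.environ if env is None else env
    api_key = env.get("EXAMPLE_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("EXAMPLE_API_KEY is not set; check your .env file")
    # Optional tuning value with a default; also a hypothetical name.
    search_limit = int(env.get("EXAMPLE_SEARCH_LIMIT", "10"))
    return ConnectorSettings(api_key=api_key, search_limit=search_limit)
```

Failing at startup when a credential is missing tends to be easier to debug than an authorization error on the first search request.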

Pre-commits

It is recommended to use the pre-commit hooks defined in this repository, which automatically lint your files. Within the root folder, run:

pip install pre-commit
pre-commit install

Your files will now be linted automatically before each commit. Currently, the pre-commit hook runs black (pinned to 24.1.1).

Integrating With Cohere

All of the connectors in this repository have been tailored to integrate with Cohere's Chat API to make creating a grounded chatbot quick and easy.
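To make the integration concrete, the sketch below builds the JSON body for a Chat request that references a registered connector. It assumes the request shape described in Cohere's connectors documentation (a `message` string plus a `connectors` list of objects with an `id`); check that documentation for the current format. The connector ID `my-connector-id` and the helper `build_chat_request` are hypothetical, and no request is actually sent.

```python
import json


def build_chat_request(message: str, connector_ids: list[str]) -> str:
    """Serialize a Chat request body that asks Cohere to query the given connectors."""
    body = {
        "message": message,
        # Assumed shape: one {"id": ...} object per registered connector.
        "connectors": [{"id": connector_id} for connector_id in connector_ids],
    }
    return json.dumps(body)


# Example usage with a hypothetical connector ID.
payload = build_chat_request("What is our travel policy?", ["my-connector-id"])
```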

Cohere's API requires that connectors return documents as an array of JSON objects. Each document should be an object with string keys and string values containing all the relevant information about the document (e.g. title, url, etc.). For best results, store the largest piece of text content under the text key.

For example, a connector that returns documents about company expensing policy might return the following:

[
  {
    "title": "Company Travel Policy",
    "text": "Flights, Hotels and Meals can be expensed using this new tool...",
    "url": "https://drive.google.com/file/d/id1",
    "created_at": "2023-11-25T20:09:31Z"
  },
  {
    "title": "2024 Expenses Policy",
    "text": "The list of recommended hotels are...",
    "url": "https://drive.google.com/file/d/id2",
    "created_at": "2023-12-04T16:52:12Z"
  }
]

Cohere's Chat API will query the connector and use these documents to generate answers with direct citations.
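Because every value must be a string and the main content belongs under the text key, a connector usually needs a small normalization step before returning results. A minimal sketch follows, assuming a raw record arrives as a dictionary with arbitrary value types. The function `to_cohere_document` and its `body_key` parameter are hypothetical helpers for illustration and do not come from this repository.

```python
from datetime import datetime


def to_cohere_document(record: dict, body_key: str = "body") -> dict[str, str]:
    """Convert a raw datastore record into a flat string-to-string document."""
    document: dict[str, str] = {}
    for key, value in record.items():
        if value is None:
            continue  # drop empty fields rather than emitting the string "None"
        if isinstance(value, datetime):
            value = value.isoformat()
        # Move the main content under "text"; keep every other key as-is.
        target = "text" if key == body_key else str(key)
        document[target] = str(value)
    return document
```

If a record already has a text field as well as the field named by `body_key`, whichever appears later in the record overwrites the other, so a real connector should decide explicitly which one wins.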

Contributing

Contributions are what drive an open source community, and any contributions made are greatly appreciated. To get started, check out our documentation.
