langdrive

https://addy-ai.github.io/langdrive/ Train 100+ LLMs on private data. LangDrive handles everything from data ingestion to training and deployment with just a single interface ⚡

Stars: 59


LangDrive is an open-source AI library that simplifies training, deploying, and querying open-source large language models (LLMs) using private data. It supports data ingestion, fine-tuning, and deployment via a command-line interface, YAML file, or API, with a quick, easy setup. Users can build AI applications such as question-answering systems, chatbots, AI agents, and content generators. The library provides data connectors for ingestion, fine-tuning of LLMs, deployment to the Hugging Face Hub, inference querying, data utilities for CRUD operations, and APIs for model access. LangDrive is designed to streamline working with LLMs and make AI development more accessible.

README:

LangDrive

Train, deploy and query open source LLMs using your private data, all from one library.



LangDrive is an open-source AI library that simplifies training, deploying, and querying open-source large language models (LLMs) using private data. It supports data ingestion, fine-tuning, and deployment via a command-line interface, YAML file, or API, with a quick, easy setup.

Read the docs for more.


Train Your First LLM

We've replicated one of our training images as a Google Colab notebook. Here's what it does:

  • Fine-tunes falcon-7b-instruct
  • Creates a Flask web server
  • Opens an Ngrok API endpoint so you can call the API

Try it out here


Use cases

LangDrive lets you build AI apps such as:

  • Question/Answering over internal documents
  • Chatbots
  • AI agents
  • Content generation

Features:

  • Data ingestion: LangDrive comes with the following built-in data connectors to simplify data ingestion:

    • Firebase Firestore
    • Email Ingestion via SMTP
    • Google Drive
    • CSV
    • Website URL
    • (more coming soon, or you can build yours - LangDrive is open source)
  • Fine tuning

    • Fine-tune open-source LLMs easily by formatting your data into input:output completion pairs
  • Deployment

    • Add your Hugging Face access token to deploy your model directly to the Hugging Face Hub after fine-tuning
  • Inference

    • Query our supported open source models
  • Data Utils

    • LangDrive comes with built-in data utils for CRUD operations on the different data connectors
  • API


Docs

For full documentation and examples, go to the docs.


Getting started

The simplest way to get started with LangDrive is through your CLI. For a more detailed overview on getting started using the YAML config and API, please visit the docs.

Using the CLI

Node developers can train and deploy a model in two steps.

  1. npm install langdrive
  2. langdrive train --csv ./path/to/csvFileName.csv --hftoken apikey123 --deploy

In this case, LangDrive will retrieve the data, train a model, host its weights on Hugging Face, and return an inference endpoint you can use to query the LLM.

The langdrive train command trains the LLM. Its arguments are described below.

args:

  • yaml: Path to an optional YAML config file. Default value: './LangDrive.yaml'. This will load any configured class and query it for records and their values for both inputs and outputs.
  • csv: Path to the training data CSV. The training data should be a two-column CSV of input and output pairs.
  • hfToken: An API key provided by Hugging Face with write permissions. Get one here.
  • baseModel: The original model to train. This can be one of the supported models shown at the bottom of this page.
  • deployToHf: true | false. Whether to deploy the trained model to Hugging Face.
  • hfModelPath: The full path to the Hugging Face model repo where the model should be deployed. Format: <Hugging Face username>/<model name>
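As an illustration of the csv argument, here is a minimal Python sketch that writes input/output pairs to a two-column CSV and checks that every row has exactly two non-empty cells. The helper names (write_pairs_csv, check_two_columns) and the header row ("input", "output") are assumptions made for this example; the source only says the file should be a two-column CSV of input and output pairs, so check the docs for whether a header is expected.

```python
import csv
from pathlib import Path


def write_pairs_csv(pairs, path):
    """Write (input, output) pairs as a two-column CSV; return the number of pairs."""
    count = 0
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Assumption: a header row with these names. The source does not
        # say whether LangDrive expects one.
        writer.writerow(["input", "output"])
        for prompt, completion in pairs:
            # csv.writer quotes cells containing commas or newlines.
            writer.writerow([prompt.strip(), completion.strip()])
            count += 1
    return count


def check_two_columns(path):
    """Return 1-based row numbers that lack exactly two non-empty cells."""
    bad_rows = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        for row_number, row in enumerate(csv.reader(f), start=1):
            if len(row) != 2 or not all(cell.strip() for cell in row):
                bad_rows.append(row_number)
    return bad_rows
```

Running check_two_columns on a file before passing it to langdrive train is a cheap way to catch rows that lost a column during export; an empty list means every row has both cells.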

By default, langdrive train does not deploy your model; instead, it provides a link where you can download the weights. Adding --deploy returns a link to the inference endpoint.
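Once you have an inference endpoint link, querying it might look like the Python sketch below. This rests on assumptions not confirmed by the source: that the endpoint accepts a JSON POST with an "inputs" field and a Bearer token, which is the convention used by Hugging Face's hosted inference API. The function names and the example URL are hypothetical; consult the docs for the actual request format.

```python
import json
import urllib.request


def build_inference_request(endpoint_url, hf_token, prompt):
    """Build (but do not send) a JSON POST request for a text prompt."""
    # Assumption: the endpoint expects {"inputs": "<prompt>"}.
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {hf_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(endpoint_url, hf_token, prompt, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_inference_request(endpoint_url, hf_token, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the payload and headers without touching the network.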

More information on how to ingest simple data using the CLI can be found in the docs.


Contributions

LangDrive is open source and we welcome contributions from the community. To contribute, please make a PR through the "fork and pull request" process.

Join our Discord to keep up to date with the community and roadmap.
