sample-apps
Repository of sample applications for https://vespa.ai, the open big data serving engine
Stars: 396
Vespa is an open-source search and AI engine that provides a unified platform for building and deploying search and AI applications. Vespa sample applications showcase various use cases and features of Vespa, including basic search, recommendation, semantic search, image search, text ranking, e-commerce search, question answering, search-as-you-type, and ML inference serving.
README:
The Vespa sample applications are created to run both self-hosted and on Vespa Cloud.
You can easily deploy the sample applications to Vespa Cloud without changing the files -
just follow the same steps as for
Managed Vector Search using Vespa Cloud,
adding security credentials.
First-time users should go through the getting-started guides first.
Explore the examples for smaller applications that help you get started with a particular feature, and see operations for operational examples.
We try to build each sample application with a standard structure, so you can easily find your way around. This section gives you an overview of this structure.
The standard structure is as follows:
one-sample-app/
├─ README.md
├─ dataset/
│ ├─ a-document.json
│ ├─ another-doc.json
│ ├─ ...
├─ app/
│ ├─ security/
│ ├─ schemas/
│ │ ├─ my_vespa_schema.sd
│ ├─ services.xml
│ ├─ ...
├─ script/
│ ├─ build.sh # when required
│ ├─ ...
There might be other files and folders than the ones listed here, but this should be the bare minimum that you can expect.
Let's analyze the structure:
- README.md - self-explanatory: the README specific to the sample application; we recommend reading it through.
- dataset/ - contains the dataset used for the sample application. This is usually a small set of documents, but it can be larger. Note: Sometimes this directory is missing and is instead generated by the build script (see below), usually because the dataset is too large to be included in the repository.
- app/ - contains the Vespa application package, the actual definition of the Vespa deployment. The bare minimum includes:
  - security/ - where you place your clients.pem certificate for data-plane authentication (refer to our Security Guide).
  - schemas/ - contains the schema files used by the application. This is usually a single file, but there can be multiple.
  - services.xml - contains the services configuration for the application. This is usually a single file.
  - Other files and folders required for the application to run.
- script/ - contains optional scripts used to help build the application package. This is usually a single file, but there can be multiple.
  - build.sh - the main script; if multiple files are present, this is the only one you need to worry about. Note: We recommend referring to the README of the sample application for details on how to run it.
[!NOTE] We are in the process of updating the sample applications to follow the structure described here, but some applications may not be fully updated yet, so please check the README files in each application for details.
Album Recommendations is the intro application to Vespa.
Learn how to configure the schema for simple recommendation and search use cases.
Pyvespa: Hybrid Search - Quickstart and
Pyvespa: Hybrid Search - Quickstart on Vespa Cloud
create a hybrid text search application combining traditional keyword matching with semantic vector search (dense retrieval).
They also demonstrate the Vespa native embedder functionality.
These are intro-level applications for Python users using more advanced Vespa features.
Use
Pyvespa: Authenticating to Vespa Cloud for Vespa Cloud credentials.
Pyvespa: Querying Vespa
is a good start for Python users, exploring how to query Vespa using the Vespa Query Language (YQL).
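For orientation, a minimal query sketch using pyvespa might look like this (endpoint, schema, field, and rank-profile names are assumptions, not taken from the notebook):

```python
from vespa.application import Vespa

# Connect to a running Vespa application (endpoint and port are assumptions).
app = Vespa(url="http://localhost", port=8080)

# Query with the Vespa Query Language (YQL); userQuery() matches the free-text
# query terms against the default fieldset. Schema, field, and rank-profile
# names are illustrative and must match your application package.
response = app.query(
    yql="select * from doc where userQuery()",
    query="what is hybrid search",
    ranking="bm25",
    hits=5,
)

for hit in response.hits:
    print(hit["relevance"], hit["fields"].get("title"))
```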
Pyvespa: Read and write operations
documents ways to feed, get, update, and delete data,
using a context manager to manage resources efficiently and feeding streams of data using feed_iter,
which can feed from streams, Iterables, Lists, and files through generators.
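A rough sketch of that pattern with pyvespa's feed_iterable and a generator (endpoint, schema, and field names are assumptions):

```python
import json
from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)

def doc_generator(path):
    """Yield feed operations lazily so large files never sit in memory."""
    with open(path) as f:
        for line in f:
            doc = json.loads(line)
            yield {"id": doc["id"], "fields": {"title": doc["title"], "body": doc["body"]}}

def callback(response, doc_id):
    # Called per operation; report failures as they happen.
    if not response.is_successful():
        print(f"Failed to feed {doc_id}: {response.get_json()}")

# feed_iterable consumes any iterable of {"id": ..., "fields": ...} dicts.
app.feed_iterable(doc_generator("docs.jsonl"), schema="doc", callback=callback)
```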
Pyvespa: Application packages
is a good intro to the concept of application packages in Vespa.
Try
Pyvespa: Advanced Configuration for Vespa Services configuration.
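For orientation, a minimal application package defined and deployed with pyvespa might look like the following sketch (application, schema, and field names are illustrative):

```python
from vespa.package import ApplicationPackage, Field, FieldSet, RankProfile
from vespa.deployment import VespaDocker

# A minimal package: one schema with two indexed text fields and a BM25 profile.
app_package = ApplicationPackage(name="minimalapp")
app_package.schema.add_fields(
    Field(name="title", type="string", indexing=["index", "summary"]),
    Field(name="body", type="string", indexing=["index", "summary"]),
)
app_package.schema.add_field_set(FieldSet(name="default", fields=["title", "body"]))
app_package.schema.add_rank_profile(
    RankProfile(name="bm25", first_phase="bm25(title) + bm25(body)")
)

# Deploy locally with Docker; pyvespa also supports Vespa Cloud deployment.
app = VespaDocker().deploy(application_package=app_package)
```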
Pyvespa: Examples
is a repository of small snippets and examples, e.g., really simple vector distance search applications.
The
News and Recommendation Tutorial
demonstrates basic search functionality and is a great place to start exploring Vespa features.
It creates a recommendation system where the approximate nearest neighbor search in a shared user/item embedding space
is used to retrieve recommended content for a user.
This app also demonstrates using parent-child relationships.
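A hedged sketch of such an approximate nearest neighbor query through pyvespa (schema, field, tensor, and rank-profile names are assumptions):

```python
from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)

user_embedding = [0.1, -0.3, 0.7]  # in practice, the user vector from your model

# nearestNeighbor retrieves the documents whose 'embedding' field is closest to
# the query tensor 'q'; targetHits controls how many candidates the ANN search
# exposes to ranking. All names here are illustrative.
response = app.query(
    body={
        "yql": "select * from item where {targetHits: 100}nearestNeighbor(embedding, q)",
        "input.query(q)": user_embedding,
        "ranking": "recommendation",
        "hits": 10,
    }
)
```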
The
Text Search Tutorial
demonstrates traditional text search using
BM25/Vespa nativeRank,
and is a good start to using the MS Marco dataset.
There is a growing interest in AI-powered vector representations of unstructured multimodal data
and searching efficiently over these representations.
Managed Vector Search using Vespa Cloud
describes how to unlock the full potential of multimodal AI-powered vector representations using Vespa Cloud.
Vespa Multi-Vector Indexing with HNSW and
Pyvespa: Multi-vector indexing with HNSW
demonstrate how to index multiple vectors per document field for semantic search for longer documents.
These are more advanced than the Hybrid Search examples in the Getting Started section.
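A sketch of what a multi-vector field could look like when defined through pyvespa (the field name and the 384-dimension size are assumptions):

```python
from vespa.package import Field, HNSW

# One document can hold many chunk vectors by combining a mapped dimension
# ('chunk{}') with the dense embedding dimension; the HNSW index then enables
# approximate nearest neighbor search over all chunks.
chunk_embeddings = Field(
    name="chunk_embeddings",
    type="tensor<float>(chunk{}, x[384])",
    indexing=["attribute", "index"],
    ann=HNSW(
        distance_metric="angular",
        max_links_per_node=16,
        neighbors_to_explore_at_insert=200,
    ),
)
```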
Vector Streaming Search
uses vector streaming search for naturally partitioned data, see the
blog post for details.
Multilingual Search with multilingual embeddings
demonstrates multilingual semantic search with multilingual text embedding models.
Simple hybrid search with SPLADE
uses the Vespa splade-embedder for
semantic search using sparse vector representations,
and is a good intro to SPLADE and sparse learned weights for ranking.
Customizing Frozen Data Embeddings in Vespa
demonstrates how to adapt frozen embeddings from foundational embedding models -
see the blog post.
Frozen data embeddings from foundational models are an emerging industry practice
for reducing the complexity of maintaining and versioning embeddings.
The frozen data embeddings are re-used for various tasks, such as classification, search, or recommendations.
Pyvespa: Using Cohere Binary Embeddings in Vespa
demonstrates how to use the Cohere binary vectors with Vespa,
including a re-ranking phase that uses the float query vector version for improved accuracy.
Pyvespa: Billion-scale vector search with Cohere binary embeddings in Vespa
uses the Cohere int8 & binary Embeddings
with a coarse-to-fine search and re-ranking pipeline.
This reduces costs while offering the same retrieval (nDCG) accuracy.
The packed binary vector representation is stored in memory,
with an optional HNSW index using
hamming distance.
The int8 vector representation is stored on disk
using Vespa’s paged option.
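A sketch of that two-tier layout as pyvespa field definitions (names and dimensions are assumptions): packed binary vectors in memory with an optional HNSW index using hamming distance, and the int8 vectors on disk via the paged attribute option.

```python
from vespa.package import Field, HNSW

# Packed binary vectors: 1024 bits packed into 128 int8 values, kept in memory,
# searchable with hamming distance through an HNSW index.
binary_embedding = Field(
    name="binary_embedding",
    type="tensor<int8>(x[128])",
    indexing=["attribute", "index"],
    ann=HNSW(distance_metric="hamming"),
)

# Full int8 vectors used only for re-scoring, stored on disk with 'paged'.
int8_embedding = Field(
    name="int8_embedding",
    type="tensor<int8>(x[1024])",
    indexing=["attribute"],
    attribute=["paged"],
)
```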
Pyvespa: Multilingual Hybrid Search with Cohere binary embeddings and Vespa
demonstrates:
- Building a multilingual search application over a sample of the German split of Wikipedia using binarized Cohere embeddings.
- Indexing multiple binary embeddings per document without having to split the chunks across multiple retrievable units.
- Hybrid search, combining the lexical matching capabilities of Vespa with Cohere binary embeddings.
- Re-scoring the binarized vectors for improved accuracy.
Pyvespa: BGE-M3 - The Mother of all embedding models
demonstrates how to use the BGE-M3 embeddings
and represent all three embedding representations in Vespa.
This code is inspired by the BAAI/bge-m3 README.
Pyvespa: Evaluating retrieval with Snowflake arctic embed
shows how different rank profiles in Vespa can be set up and evaluated.
For the rank profiles that use semantic search,
we will use the small version of Snowflake’s arctic embed model series for generating embeddings.
Pyvespa: Exploring the potential of OpenAI Matryoshka 🪆 embeddings with Vespa
demonstrates the effectiveness of using the recently released (as of January 2024) OpenAI text-embedding-3 embeddings with Vespa.
Specifically, we are interested in the Matryoshka Representation Learning technique used in training,
which lets us "shorten embeddings (i.e., remove some numbers from the end of the sequence) without the embedding losing its concept-representing properties".
This allows us to trade off a small amount of accuracy in exchange for much smaller embedding sizes,
so we can store more documents and search them faster.
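The shortening itself is simple to sketch (the 256-dimension target is an arbitrary choice for illustration): truncate the embedding and re-normalize before indexing or querying.

```python
import numpy as np

def shorten_embedding(vector, dims=256):
    """Keep only the first `dims` values of a Matryoshka-trained embedding
    and re-normalize, trading a little accuracy for a smaller footprint."""
    shortened = np.asarray(vector, dtype=np.float32)[:dims]
    norm = np.linalg.norm(shortened)
    return shortened / norm if norm > 0 else shortened

# Example: shorten a 3072-dim text-embedding-3-large vector to 256 dims.
full = np.random.rand(3072).astype(np.float32)
print(shorten_embedding(full, dims=256).shape)  # (256,)
```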
Pyvespa: Using Mixedbread.ai embedding model with support for binary vectors
shows how to use the mixedbread-ai/mxbai-embed-large-v1 model
with support for binary vectors with Vespa.
The notebook example also includes a re-ranking phase that uses the float query vector version for improved accuracy.
The re-ranking step makes the model perform at 96.45% of the full float version,
with a 32x decrease in storage footprint.
Retrieval Augmented Generation (RAG) in Vespa
is an end-to-end RAG application where all the steps are run within Vespa.
This application focuses on the generation part of RAG,
with a simple text search using BM25.
This application has three versions of an end-to-end RAG application:
- Using an external LLM service to generate the final response.
- Using local LLM inference to generate the final response.
- Deploying to Vespa Cloud and using GPU-accelerated LLM inference to generate the final response. This includes using Vespa Cloud's Secret Store to save the OpenAI API key.
Pyvespa: Building cost-efficient retrieval-augmented personal AI assistants
uses streaming mode
for cost-efficient retrieval for applications that store and retrieve personal data.
This notebook connects a custom LlamaIndex Retriever
with a Vespa app using streaming mode to retrieve personal data.
Pyvespa: Turbocharge RAG with LangChain and Vespa Streaming Mode for Partitioned Data
uses streaming mode
to build cost-efficient RAG applications over naturally sharded data - also available as a blog post:
Turbocharge RAG with LangChain and Vespa Streaming Mode for Sharded Data.
This example features using elementSimilarity
in search results to easily inspect each chunk's closeness to the query embedding.
Also try
Pyvespa: Chat with your pdfs with ColBERT, LangChain, and Vespa -
this demonstrates how you can now use ColBERT ranking natively in Vespa,
which handles the ColBERT embedding process with no custom code.
Pyvespa: Visual PDF RAG with Vespa - ColPali demo application
is an end-to-end demo application for visual retrieval of PDF pages, including a frontend web application -
try vespa-engine-colpali-vespa-visual-retrieval.hf.space for a live demo.
The main goal of the demo is to make it easy to create your own PDF Enterprise Search application using Vespa!
Pyvespa: Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models
demonstrates how to retrieve PDF pages using the embeddings generated by the ColPali model.
ColPali is a powerful Vision Language Model (VLM) that can generate embeddings for images and text.
This notebook uses ColPali to generate embeddings for images of PDF pages and store them in Vespa.
We also store the base64-encoded image of the PDF page and some metadata like title and url.
Pyvespa: Scaling ColPALI (VLM) Retrieval
demonstrates how to represent ColPali in Vespa and to scale to large collections.
Also see the Scaling ColPali to billions of PDFs with Vespa blog post.
Pyvespa: ColPali Ranking Experiments on DocVQA
shows how to reproduce the ColPali results on DocVQA with Vespa.
The dataset consists of PDF documents with questions and answers.
We demonstrate how to binarize the patch embeddings
and replace the float MaxSim scoring
with a hamming-based MaxSim,
without much loss in ranking accuracy, but with a significant speedup (close to 4x) and a 32x reduction in memory (and storage) requirements.
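A minimal numpy sketch of the binarization step (shapes and names are illustrative, not the notebook's exact code):

```python
import numpy as np

def binarize_patches(patch_embeddings):
    """Binarize float patch embeddings (n_patches x dim, dim divisible by 8)
    by thresholding at zero and packing 8 bits into each int8 value, the
    layout that hamming-distance scoring operates on."""
    patches = np.asarray(patch_embeddings, dtype=np.float32)
    bits = (patches > 0).astype(np.uint8)
    return np.packbits(bits, axis=1).view(np.int8)

# 1030 patches of 128-dim float vectors -> 1030 x 16 int8 vectors (32x smaller).
floats = np.random.randn(1030, 128).astype(np.float32)
print(binarize_patches(floats).shape)  # (1030, 16)
```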
Pyvespa: PDF-Retrieval using ColQWen2 (ColPali) with Vespa
is a continuation of the notebooks related to the ColPali models (above) for complex document retrieval,
and demonstrates use of the ColQWen2 model checkpoint.
Billion-Scale Image Search
demonstrates billion-scale image search using a CLIP model
exported in ONNX-format for retrieval.
It features separation of compute from storage and query-time vector similarity de-duping.
It uses PCA to reduce from 768 to 128 dimensions.
Text-video search is a notebook that
downloads a set of videos, converts them from .avi to .mp4, creates CLIP embeddings,
feeds them to Vespa, and lets you query the videos with text using a Streamlit application.
It is a good start for creating a video search application using Vespa!
Video Search and Retrieval with Vespa and TwelveLabs is a notebook
showcasing the use of TwelveLabs state-of-the-art generation and embedding models
for video processing. It demonstrates how to generate rich metadata (including summaries and keywords) for videos
using TwelveLabs' technology, and how to embed video chunks for efficient retrieval. The notebook processes three
sample videos, segments them into chunks, and stores their embeddings along with metadata in Vespa's multi-vector
tensors. You can perform hybrid searches to find specific video scenes based on natural language descriptions.
This serves as an excellent starting point for implementing advanced video retrieval with Vespa!
MS Marco Passage Ranking
shows how to represent state-of-the-art text ranking using Transformer (BERT) models.
It uses the MS Marco passage ranking datasets and features
bi-encoders, cross-encoders, and late-interaction models (ColBERT):
- Simple single-stage sparse retrieval accelerated by the WAND dynamic pruning algorithm with BM25 ranking.
- Dense (vector) search retrieval for efficient candidate retrieval using Vespa's support for approximate nearest neighbor search.
- Re-ranking using the Late contextual interaction over BERT (ColBERT) model.
- Re-ranking using a cross-encoder with cross attention between the query and document terms.
- Multiphase retrieval and ranking combining efficient retrieval (WAND or ANN) with re-ranking stages.
- Using Vespa embedder functionality.
- Hybrid ranking.
With Vespa’s phased ranking capabilities,
doing cross-encoder inference for a subset of documents at a later stage in the ranking pipeline
can be a good trade-off between ranking performance and latency.
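A hedged sketch of such a phased rank profile in pyvespa; the second-phase expression below is a stand-in, whereas the MS Marco app would invoke a cross-encoder model there:

```python
from vespa.package import RankProfile, SecondPhaseRanking

# Phased ranking: a cheap first phase over all matched documents, and a more
# expensive expression re-ranking only the top candidates per content node.
# Field names and the second-phase expression are illustrative.
profile = RankProfile(
    name="phased",
    first_phase="bm25(title) + bm25(body)",
    second_phase=SecondPhaseRanking(
        expression="closeness(field, embedding)",
        rerank_count=100,
    ),
)
```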
Pyvespa: Using Mixedbread.ai cross-encoder for reranking in Vespa.ai
shows how to use the Mixedbread.ai
cross-encoder for global-phase reranking in Vespa.
Pyvespa: Standalone ColBERT with Vespa for end-to-end retrieval and ranking
illustrates using the colbert-ai package to produce token vectors,
instead of using the native Vespa ColBERT embedder.
The guide illustrates how to feed and query using a single passage representation:
- Compress token vectors using binarization compatible with Vespa's unpack_bits used in ranking. This implements the binarization of token-level vectors using numpy.
- Use Vespa hex feed format for binary vectors.
- Query examples.
As a bonus, this also demonstrates how to use ColBERT end-to-end with Vespa for both retrieval and ranking. The retrieval step searches the binary token-level representations using hamming distance. This uses 32 nearestNeighbor operators in the same query, each finding 100 nearest hits in hamming space. Then, the results are re-ranked using the full-blown MaxSim calculation.
ColBERT token-level embeddings:
- Simple hybrid search with ColBERT
  uses a single vector embedding model for retrieval and ColBERT (multi-token vector representation) for re-ranking.
  This semantic search application demonstrates the colbert-embedder
  and the tensor expressions for ColBERT MaxSim.
  It also features reciprocal rank fusion to fuse different rankings.
- Long-Context ColBERT
  demonstrates Long-Context ColBERT (multi-token vector representation) with extended context windows for long-document retrieval,
  as announced in Vespa Long-Context ColBERT.
  The app demonstrates the colbert-embedder
  and the tensor expressions for performing two types of extended ColBERT late-interaction for long-context retrieval.
  This app uses trec-eval for evaluation using nDCG.
- Pyvespa: Standalone ColBERT + Vespa for long-context ranking
  is a guide on how to use the ColBERT package to produce token-level vectors,
  as an alternative to using the native Vespa ColBERT embedder.
  It illustrates how to feed multiple passages per Vespa document (long-context):
  - Compress token vectors using binarization that is compatible with Vespa's unpack_bits.
  - Use Vespa hex feed format for binary vectors with mixed Vespa tensors.
  - How to query Vespa with the ColBERT query tensor representation.
Pyvespa: LightGBM: Training the model with Vespa features
deploys and uses a LightGBM model in a Vespa application.
The tutorial runs through how to:
- Train a LightGBM classification model with variable names supported by Vespa.
- Create Vespa application package files and export them to an application folder.
- Export the trained LightGBM model to the Vespa application folder.
- Deploy the Vespa application using the application folder.
- Feed data to the Vespa application.
- Assert that the LightGBM predictions from the deployed model are correct.
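For illustration, a rough sketch of the train-and-export step under those assumptions (feature names, data, and file locations are illustrative):

```python
import json
import lightgbm as lgb
import numpy as np
import pandas as pd

# Train on features named like Vespa rank features so the deployed model can
# read them directly; the data here is synthetic.
features = ["query(value)", "attribute(popularity)"]
df = pd.DataFrame(np.random.rand(1000, 2), columns=features)
labels = (df["query(value)"] + df["attribute(popularity)"] > 1.0).astype(int)

model = lgb.train(
    {"objective": "binary", "verbosity": -1},
    lgb.Dataset(df, label=labels),
    num_boost_round=10,
)

# Export as JSON into the application package (e.g. app/models/lightgbm_model.json)
# so a rank profile expression such as lightgbm("lightgbm_model.json") can use it.
with open("lightgbm_model.json", "w") as f:
    json.dump(model.dump_model(), f)
```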
Pyvespa: LightGBM: Mapping model features to Vespa features
shows how to deploy a LightGBM model with feature names that do not match Vespa feature names.
In addition to the steps in the app above, this tutorial:
- Trains a LightGBM classification model with generic feature names that will not be available in the Vespa application.
- Creates an application package and includes a mapping from Vespa feature names to LightGBM model feature names.
Pyvespa: Feeding performance
intends to shine some light on the different modes of feeding documents to Vespa, looking at 4 different methods (sketched below):
- Using VespaSync
- Using VespaAsync
- Using feed_iterable()
- Using Vespa CLI
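A minimal sketch of the synchronous path using VespaSync (endpoint and schema name are assumptions); feed_iterable() and the Vespa CLI typically give higher throughput for larger volumes:

```python
from vespa.application import Vespa, VespaSync

app = Vespa(url="http://localhost", port=8080)

docs = [{"id": str(i), "fields": {"title": f"doc {i}"}} for i in range(100)]

# Synchronous feeding over a reused connection; simple and fine for small
# batches, but each operation waits for the previous one to complete.
with VespaSync(app) as sync_app:
    for doc in docs:
        response = sync_app.feed_data_point(
            schema="doc", data_id=doc["id"], fields=doc["fields"]
        )
        if not response.is_successful():
            print("feed failed:", doc["id"])
```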
Use
Feeding to Vespa Cloud
to test feeding using Vespa Cloud.
The
e-commerce application is an end-to-end shopping engine,
using the Amazon product data set.
This use case bundles a front-end application.
It demonstrates building next-generation E-commerce Search using Vespa,
and is a good intro to using the Vespa Cloud CI/CD tests.
Data in e-commerce applications is structured,
so Gradient Boosted Decision Trees (GBDT)
models are popular in this domain.
Try
Vespa Product Ranking for using
learning-to-rank (LTR) techniques (using XGBoost and LightGBM)
for improving product search ranking.
In Vespa, faceting (attribute-based filtering and counting) is called grouping.
Grouping Results
is a quick intro to implementing faceting/grouping in Vespa.
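A hedged grouping query sketch through pyvespa, faceting on a hypothetical brand attribute (schema and field names are assumptions):

```python
from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)

# The grouping expression after '|' buckets the matching products by 'brand'
# and outputs the hit count per bucket, alongside the regular top hits.
response = app.query(
    body={
        "yql": (
            "select * from product where userQuery() "
            "| all(group(brand) each(output(count())))"
        ),
        "query": "running shoes",
        "hits": 10,
    }
)
```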
Recommendations are integral to e-commerce applications.
The
recommendation tutorial is a good starting point.
Finally, search-as-you-type and query suggestions let users quickly create good queries.
Incremental Search shows search-as-you-type functionality,
where for each keystroke of the user, it retrieves matching documents.
It also demonstrates search suggestions (query auto-completion).
Stateless model evaluation demonstrates
using Vespa as a stateless ML model inference server
where Vespa takes care of distributing ML models to multiple serving containers,
offering horizontal scaling and safe deployment.
It features model versioning and a feature processing pipeline,
as well as using custom code in Searchers,
Document Processors and
Request Handlers.
Vespa Documentation Search
is the search application that powers search.vespa.ai -
refer to this for GitHub Actions automation.
This sample app is a good start for automated deployments,
as it has system, staging and production test examples.
It uses the Document API
both for regular PUT operations and for UPDATE with create-if-nonexistent.
It also has Vespa Components
for custom code.
cord19.vespa.ai is a full-featured application,
based on the Covid-19 Open Research Dataset:
- cord-19: frontend
- cord-19-search: search backend
This application uses embeddings to implement "similar documents" search.
Note: Applications with pom.xml are Java/Maven projects and must be built before deployment. Refer to the Developer Guide for more information.
Contribute to the Vespa sample applications.
Alternative AI tools for sample-apps
Similar Open Source Tools
CosmosAIGraph
CosmosAIGraph is an AI-powered graph and RAG implementation of OmniRAG pattern, utilizing Azure Cosmos DB and other sources. It includes presentations, reference application documentation, FAQs, and a reference dataset of Python libraries pre-vectorized. The project focuses on Azure Cosmos DB for NoSQL and Apache Jena implementation for the in-memory RDF graph. It provides DockerHub images, with plans to add RBAC and Microsoft Entra ID/AAD authentication support, update AI model to gpt-4.5, and offer generic graph examples with a graph generation solution.
kdbai-samples
KDB.AI is a time-based vector database that allows developers to build scalable, reliable, and real-time applications by providing advanced search, recommendation, and personalization for Generative AI applications. It supports multiple index types, distance metrics, top-N and metadata filtered retrieval, as well as Python and REST interfaces. The repository contains samples demonstrating various use-cases such as temporal similarity search, document search, image search, recommendation systems, sentiment analysis, and more. KDB.AI integrates with platforms like ChatGPT, Langchain, and LlamaIndex. The setup steps require Unix terminal, Python 3.8+, and pip installed. Users can install necessary Python packages and run Jupyter notebooks to interact with the samples.
PulsarRPA
PulsarRPA is a high-performance, distributed, open-source Robotic Process Automation (RPA) framework designed to handle large-scale RPA tasks with ease. It provides a comprehensive solution for browser automation, web content understanding, and data extraction. PulsarRPA addresses challenges of browser automation and accurate web data extraction from complex and evolving websites. It incorporates innovative technologies like browser rendering, RPA, intelligent scraping, advanced DOM parsing, and distributed architecture to ensure efficient, accurate, and scalable web data extraction. The tool is open-source, customizable, and supports cutting-edge information extraction technology, making it a preferred solution for large-scale web data extraction.
nucliadb
NucliaDB is a robust database that allows storing and searching on unstructured data. It is an out-of-the-box hybrid search database, utilizing vector, full-text and graph indexes. NucliaDB is written in Rust and Python. We designed it to index large datasets and provide multi-tenant support. When utilizing NucliaDB with Nuclia cloud, you are able to use the power of an NLP database without the hassle of data extraction, enrichment and inference. We do all the hard work for you.
llm-search
pyLLMSearch is an advanced RAG system that offers a convenient question-answering system with a simple YAML-based configuration. It enables interaction with multiple collections of local documents, with improvements in document parsing, hybrid search, chat history, deep linking, re-ranking, customizable embeddings, and more. The package is designed to work with custom Large Language Models (LLMs) from OpenAI or installed locally. It supports various document formats, incremental embedding updates, dense and sparse embeddings, multiple embedding models, 'Retrieve and Re-rank' strategy, HyDE (Hypothetical Document Embeddings), multi-querying, chat history, and interaction with embedded documents using different models. It also offers simple CLI and web interfaces, deep linking, offline response saving, and an experimental API.
bocoel
BoCoEL is a tool that leverages Bayesian Optimization to efficiently evaluate large language models by selecting a subset of the corpus for evaluation. It encodes individual entries into embeddings, uses Bayesian optimization to select queries, retrieves from the corpus, and provides easily managed evaluations. The tool aims to reduce computation costs during evaluation with a dynamic budget, supporting models like GPT2, Pythia, and LLAMA through integration with Hugging Face transformers and datasets. BoCoEL offers a modular design and efficient representation of the corpus to enhance evaluation quality.
R1-Searcher
R1-searcher is a tool designed to incentivize the search capability in large reasoning models (LRMs) via reinforcement learning. It enables LRMs to invoke web search and obtain external information during the reasoning process by utilizing a two-stage outcome-supervision reinforcement learning approach. The tool does not require instruction fine-tuning for cold start and is compatible with existing Base LLMs or Chat LLMs. It includes training code, inference code, model checkpoints, and a detailed technical report.
LLMs-from-scratch
This repository contains the code for coding, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch). In _Build a Large Language Model (From Scratch)_, you'll discover how LLMs work from the inside out. In this book, I'll guide you step by step through creating your own LLM, explaining each stage with clear text, diagrams, and examples. The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT.
hi-ml
The Microsoft Health Intelligence Machine Learning Toolbox is a repository that provides low-level and high-level building blocks for Machine Learning / AI researchers and practitioners. It simplifies and streamlines work on deep learning models for healthcare and life sciences by offering tested components such as data loaders, pre-processing tools, deep learning models, and cloud integration utilities. The repository includes two Python packages, 'hi-ml-azure' for helper functions in AzureML, 'hi-ml' for ML components, and 'hi-ml-cpath' for models and workflows related to histopathology images.
NeMo
NeMo Framework is a generative AI framework built for researchers and pytorch developers working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR), and text-to-speech synthesis (TTS). The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia to more easily implement and design new generative AI models by being able to leverage existing code and pretrained models.
graphrag-local-ollama
GraphRAG Local Ollama is a repository that offers an adaptation of Microsoft's GraphRAG, customized to support local models downloaded using Ollama. It enables users to leverage local models with Ollama for large language models (LLMs) and embeddings, eliminating the need for costly OpenAPI models. The repository provides a simple setup process and allows users to perform question answering over private text corpora by building a graph-based text index and generating community summaries for closely-related entities. GraphRAG Local Ollama aims to improve the comprehensiveness and diversity of generated answers for global sensemaking questions over datasets.
NeMo
NVIDIA NeMo Framework is a scalable and cloud-native generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV) domains. It is designed to help you efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints.
burn
Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.
HuggingFists
HuggingFists is a low-code data flow tool that enables convenient use of LLM and HuggingFace models. It provides functionalities similar to Langchain, allowing users to design, debug, and manage data processing workflows, create and schedule workflow jobs, manage resources environment, and handle various data artifact resources. The tool also offers account management for users, allowing centralized management of data source accounts and API accounts. Users can access Hugging Face models through the Inference API or locally deployed models, as well as datasets on Hugging Face. HuggingFists supports breakpoint debugging, branch selection, function calls, workflow variables, and more to assist users in developing complex data processing workflows.
DistiLlama
DistiLlama is a Chrome extension that leverages a locally running Large Language Model (LLM) to perform various tasks, including text summarization, chat, and document analysis. It utilizes Ollama as the locally running LLM instance and LangChain for text summarization. DistiLlama provides a user-friendly interface for interacting with the LLM, allowing users to summarize web pages, chat with documents (including PDFs), and engage in text-based conversations. The extension is easy to install and use, requiring only the installation of Ollama and a few simple steps to set up the environment. DistiLlama offers a range of customization options, including the choice of LLM model and the ability to configure the summarization chain. It also supports multimodal capabilities, allowing users to interact with the LLM through text, voice, and images. DistiLlama is a valuable tool for researchers, students, and professionals who seek to leverage the power of LLMs for various tasks without compromising data privacy.
For similar tasks
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.
jupyter-ai
Jupyter AI connects generative AI with Jupyter notebooks. It provides a user-friendly and powerful way to explore generative AI models in notebooks and improve your productivity in JupyterLab and the Jupyter Notebook. Specifically, Jupyter AI offers: * An `%%ai` magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, Kaggle, VSCode, etc.). * A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. * Support for a wide range of generative model providers, including AI21, Anthropic, AWS, Cohere, Gemini, Hugging Face, NVIDIA, and OpenAI. * Local model support through GPT4All, enabling use of generative AI models on consumer grade machines with ease and privacy.
khoj
Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.
langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).
danswer
Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"
infinity
Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.