Best AI tools for Nearest-neighbors Search
0 - AI tool Sites
20 - Open Source AI Tools
model2vec
Model2Vec is a technique for turning any sentence transformer into a very small static model, reducing model size by 15x and making models up to 500x faster, with a small drop in performance. It outperforms other static embedding models such as GloVe and BPEmb, has `numpy` as its only major dependency, offers fast inference and dataset-free distillation, and is integrated into Sentence Transformers, txtai, and Chonkie. Model2Vec builds its models by passing a vocabulary through a sentence transformer, reducing the dimensionality of the resulting embeddings with PCA, and weighting the embeddings using Zipf weighting. Users can distill their own models or use pre-trained models from the HuggingFace hub, and evaluate them with the provided evaluation package. Model2Vec is licensed under MIT.
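The three steps in the description (embed a vocabulary, reduce with PCA, apply Zipf weighting) can be sketched with plain `numpy`. This is an illustrative sketch and not Model2Vec's actual code: the random matrix stands in for real sentence transformer outputs, the function names `distill_static_embeddings` and `embed` are hypothetical, and the weighting formula `log(1 + rank)` is an assumption about what Zipf weighting means here (it presumes the vocabulary rows are sorted from most to least frequent token).

```python
import numpy as np


def distill_static_embeddings(token_vectors: np.ndarray, pca_dims: int) -> np.ndarray:
    """Reduce and reweight per-token vectors (hypothetical helper).

    token_vectors: (vocab_size, dim) array, rows assumed to be sorted by
    descending token frequency.
    """
    # PCA via SVD on the mean-centered matrix.
    centered = token_vectors - token_vectors.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:pca_dims].T

    # Assumed Zipf-style weighting: rarer tokens (higher rank) get larger weights.
    ranks = np.arange(1, reduced.shape[0] + 1)
    weights = np.log(1 + ranks)
    return reduced * weights[:, None]


def embed(token_ids: list[int], static_embeddings: np.ndarray) -> np.ndarray:
    """Embed a sentence as the mean of its static token vectors."""
    return static_embeddings[token_ids].mean(axis=0)


rng = np.random.default_rng(0)
# Stand-in for the vectors a sentence transformer would produce per vocabulary token.
vocab_vectors = rng.normal(size=(1000, 64))

static = distill_static_embeddings(vocab_vectors, pca_dims=16)
sentence_vec = embed([3, 17, 256], static)
print(static.shape, sentence_vec.shape)
```

Because inference is only a table lookup followed by a mean, no transformer forward pass is needed at query time, which is consistent with the speedups the description claims.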
LLMUnity
LLM for Unity enables seamless integration of Large Language Models (LLMs) within the Unity engine, allowing users to create intelligent characters for immersive player interactions. The tool supports major LLM models, runs locally without internet access, offers fast inference on CPU and GPU, and is easy to set up with a single line of code. It is free for both personal and commercial use, tested on Unity 2021 LTS, 2022 LTS, and 2023. Users can build multiple AI characters efficiently, use remote servers for processing, and customize model settings for text generation.
cuvs
cuVS is a library that contains state-of-the-art implementations of several algorithms for running approximate nearest neighbors and clustering on the GPU. It can be used directly or through the various databases and other libraries that have integrated it. The primary goal of cuVS is to simplify the use of GPUs for vector similarity search and clustering.
vicinity
Vicinity is a lightweight, low-dependency vector store that provides a unified interface for nearest neighbor search with support for different backends and evaluation. It simplifies the process of comparing and evaluating different nearest neighbors packages by offering a simple and intuitive API. Users can easily experiment with various indexing methods and distance metrics to choose the best one for their use case. Vicinity also allows for measuring performance metrics like queries per second and recall.
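To make the two metrics mentioned above concrete, here is a minimal `numpy` sketch of how recall and queries per second can be measured for a nearest-neighbor backend. It does not use Vicinity's API: `exact_top_k` and `recall_at_k` are hypothetical helpers, and the "approximate" backend is simulated by searching only a random half of the vectors.

```python
import time

import numpy as np


def exact_top_k(queries: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Brute-force cosine top-k; returns an array of indices, shape (n_queries, k)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = q @ v.T
    return np.argsort(-sims, axis=1)[:, :k]


def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Fraction of the true top-k neighbors that the approximate search returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size


rng = np.random.default_rng(0)
vectors = rng.normal(size=(2000, 32))
queries = rng.normal(size=(50, 32))
k = 10

# Ground truth from exhaustive search.
exact_ids = exact_top_k(queries, vectors, k)

# Simulated approximate backend: only half of the vectors are searched.
subset = rng.choice(len(vectors), size=len(vectors) // 2, replace=False)
start = time.perf_counter()
approx_local = exact_top_k(queries, vectors[subset], k)
elapsed = max(time.perf_counter() - start, 1e-9)
approx_ids = subset[approx_local]  # map back to original indices

recall = recall_at_k(approx_ids, exact_ids)
qps = len(queries) / elapsed
print(f"recall@{k}: {recall:.2f}, queries per second: {qps:.0f}")
```

Comparing backends then comes down to running the same queries against each one and reading off the trade-off between the two numbers.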
vector-search-class-notes
The 'vector-search-class-notes' repository contains class materials for a course on Long Term Memory in AI, focusing on vector search and databases. The course covers theoretical foundations and practical implementation of vector search applications, algorithms, and systems. It explores the intersection of Artificial Intelligence and Database Management Systems, with topics including text embeddings, image embeddings, low dimensional vector search, dimensionality reduction, approximate nearest neighbor search, clustering, quantization, and graph-based indexes. The repository also includes information on the course syllabus, project details, selected literature, and contributions from industry experts in the field.
lance
Lance is a modern columnar data format optimized for ML workflows and datasets. It offers high-performance random access, vector search, zero-copy automatic versioning, and ecosystem integrations with Apache Arrow, Pandas, Polars, and DuckDB. Lance is designed to address the challenges of the ML development cycle, providing a unified data format for collection, exploration, analytics, feature engineering, training, evaluation, deployment, and monitoring. It aims to reduce data silos and streamline the ML development process.
myscaledb
MyScaleDB is a SQL vector database designed for scalable AI applications, enabling developers to efficiently manage and process massive volumes of data using familiar SQL. It offers fast and efficient vector search, filtered search, and SQL-vector join queries. MyScaleDB is fully SQL-compatible and production-ready for AI applications, providing unmatched performance and scalability through cutting-edge OLAP architecture and advanced vector algorithms. Built on top of ClickHouse, it combines structured and vectorized data management for high accuracy and speed in filtered searches.
Detection-and-Classification-of-Alzheimers-Disease
This tool detects and classifies Alzheimer's Disease (AD) at an early stage using Deep Learning and Machine Learning algorithms, which are further optimized with the Crow Search Algorithm (CSA). Alzheimer's is a fatal disease, and early detection is crucial so that patients can learn of their condition and act to slow its progression. By analyzing MRI scans with Artificial Intelligence technology, the tool classifies patients according to whether they may develop AD in the future. According to the project, combining CSA with ML algorithms was the most effective approach for this purpose.
MyScaleDB
MyScaleDB is a SQL vector database optimized for AI applications, enabling developers to manage and process massive volumes of data efficiently. It offers fast and powerful vector search, filtered search, and SQL-vector join queries, making it fully SQL-compatible. MyScaleDB provides unmatched performance and scalability by leveraging cutting-edge OLAP database architecture and advanced vector algorithms. It is production-ready for AI applications, supporting structured data, text, vector, JSON, geospatial, and time-series data. MyScale Cloud offers fully-managed MyScaleDB with premium features on billion-scale data, making it cost-effective and simpler to use compared to specialized vector databases. Built on top of ClickHouse, MyScaleDB combines structured and vector search efficiently, ensuring high accuracy and performance in filtered search operations.
treds
Treds is a radix-trie-based data structure server that stores keys in sorted order, ensuring fast and efficient retrieval. It offers commands for a key/value store, sorted maps store, list store, set store, hash store, and more. Treds provides unique features such as optimized querying for keys with common prefixes, sorted key/value pairs, and new commands like DELPREFIX, LNGPREFIX, and PPUBLISH. It is designed for high performance, with a single-threaded architecture and an event loop, and uses modified radix trees and doubly linked lists for quick lookup. Treds also supports PubSub functionality and vector store operations for vector search using the HNSW algorithm.
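The benefit of keeping keys in sorted order is that all keys sharing a prefix sit next to each other, so prefix scans and prefix deletes touch one contiguous range. The sketch below illustrates that idea with a sorted Python list and binary search. It is a simplified stand-in, not Treds' radix trie, and `SortedKeyStore` with its method names is hypothetical; only the notion of a delete-by-prefix operation is taken from the description.

```python
import bisect


class SortedKeyStore:
    """Toy key/value store that keeps keys sorted (hypothetical, not Treds' code)."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._values: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def _prefix_range(self, prefix: str) -> tuple[int, int]:
        # Binary search finds the first key >= prefix; every key with that
        # prefix follows contiguously in sorted order.
        lo = bisect.bisect_left(self._keys, prefix)
        hi = lo
        while hi < len(self._keys) and self._keys[hi].startswith(prefix):
            hi += 1
        return lo, hi

    def scan_prefix(self, prefix: str) -> list[tuple[str, str]]:
        lo, hi = self._prefix_range(prefix)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

    def delete_prefix(self, prefix: str) -> int:
        """Remove every key with the given prefix; returns the number removed."""
        lo, hi = self._prefix_range(prefix)
        for k in self._keys[lo:hi]:
            del self._values[k]
        del self._keys[lo:hi]
        return hi - lo


store = SortedKeyStore()
for key in ["user:2", "order:1", "user:1", "user:10"]:
    store.set(key, key.upper())

users = store.scan_prefix("user:")
removed = store.delete_prefix("user:")
remaining = store.scan_prefix("")
print(users, removed, remaining)
```

A radix trie reaches the same contiguous range by walking the prefix character by character instead of binary searching, which avoids the linear-time list insertion this toy version pays on every new key.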
chroma
Chroma is an open-source embedding database that provides a simple, scalable, and feature-rich way to build Python or JavaScript LLM apps with memory. It offers a fully-typed, fully-tested, and fully-documented API that makes it easy to get started and scale your applications. Chroma also integrates with popular tools like LangChain and LlamaIndex, and supports a variety of embedding models, including Sentence Transformers, OpenAI embeddings, and Cohere embeddings. With Chroma, you can easily add documents to your database, query relevant documents with natural language, and compose documents into the context window of an LLM like GPT3 for additional summarization or analysis.
AgroTech-AI
The AgroTech AI platform is a comprehensive web-based tool where users can access various machine learning models for making predictions related to agriculture. It offers solutions for crop management, soil health assessment, pest control, and more. The platform implements machine learning algorithms to provide fertilizer prediction, crop prediction, soil quality prediction, yield prediction, and mushroom edibility prediction.
llm-course
The LLM course is divided into three parts:

1. 🧩 **LLM Fundamentals** covers essential knowledge about mathematics, Python, and neural networks.
2. 🧑🔬 **The LLM Scientist** focuses on building the best possible LLMs using the latest techniques.
3. 👷 **The LLM Engineer** focuses on creating LLM-based applications and deploying them.

For an interactive version of this course, I created two **LLM assistants** that will answer questions and test your knowledge in a personalized way:

* 🤗 **HuggingChat Assistant**: Free version using Mixtral-8x7B.
* 🤖 **ChatGPT Assistant**: Requires a premium account.

## 📝 Notebooks

A list of notebooks and articles related to large language models.

### Tools

| Notebook | Description | Colab |
|----------|-------------|-------|
| 🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod. | Open In Colab |
| 🥱 LazyMergekit | Easily merge models using MergeKit in one click. | Open In Colab |
| 🦎 LazyAxolotl | Fine-tune models in the cloud using Axolotl in one click. | Open In Colab |
| ⚡ AutoQuant | Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click. | Open In Colab |
| 🌳 Model Family Tree | Visualize the family tree of merged models. | Open In Colab |
| 🚀 ZeroSpace | Automatically create a Gradio chat interface using a free ZeroGPU. | Open In Colab |
LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.
Efficient-LLMs-Survey
This repository provides a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from **model-centric**, **data-centric**, and **framework-centric** perspectives, respectively. We hope our survey and this GitHub repository can serve as valuable resources to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.
Awesome-LLMs-on-device
Welcome to the ultimate hub for on-device Large Language Models (LLMs)! This repository is your go-to resource for all things related to LLMs designed for on-device deployment. Whether you're a seasoned researcher, an innovative developer, or an enthusiastic learner, this comprehensive collection of cutting-edge knowledge is your gateway to understanding, leveraging, and contributing to the exciting world of on-device LLMs.
AI-PhD-S24
AI-PhD-S24 is a mono-repo for the PhD course 'AI for Business Research' at CUHK Business School in Spring 2024. The course aims to provide a basic understanding of machine learning and artificial intelligence concepts/methods used in business research, showcase how ML/AI is utilized in business research, and introduce state-of-the-art AI/ML technologies. The course includes scribed lecture notes, class recordings, and covers topics like AI/ML fundamentals, DL, NLP, CV, unsupervised learning, and diffusion models.
raft
RAFT (Reusable Accelerated Functions and Tools) is a C++ header-only template library with an optional shared library that contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
chatWeb
ChatWeb is a tool that can crawl web pages, extract text from PDF, DOCX, and TXT files, and generate an embedded summary. It answers questions about the text content using a chat API and an embedding API based on GPT-3.5. The tool calculates similarity scores between text vectors to generate summaries, performs nearest neighbor searches, and designs prompts to answer user questions. It aims to extract relevant content from text and provide accurate search results based on keywords. ChatWeb supports various modes, languages, and settings, including temperature control and PostgreSQL integration.
3 - OpenAI GPTs
VA: Veterans Benefits Navigator (VBN)
Veterans Benefits Navigator (VBN) is a specialized chatbot designed to guide U.S. veterans through the complexities of VA benefits. It offers tailored, up-to-date information, locates nearest VA facilities, and ensures empathetic, confidential assistance for all benefit-related inquiries.