octopus-v4

AI for all: Build the large graph of the language models

Stars: 97

Visit

The Octopus-v4 project aims to build the world's largest graph of language models, integrating specialized models and training Octopus models to connect nodes efficiently. The project focuses on identifying, training, and connecting specialized models. The repository includes scripts for running the Octopus v4 model, methods for managing the graph, training code for specialized models, and inference code. Environment setup instructions are provided for Linux with NVIDIA GPU. The Octopus v4 model helps users find suitable models for tasks and reformats queries for effective processing. The project leverages Language Large Models for various domains and provides benchmark results. Users are encouraged to train and add specialized models following recommended procedures.

README:

Graph of Language Models

Let's build this graph together! We have tried our best to find the specialized models, but we can definitely do more with your participation!

🔗 Octopus-v4 on Hugging Face
🏆Open LLM Leaderboard for domains

This project aims to build the world's largest graph of language models. To our knowledge, it is the first attempt to construct such a graph. Have a look at our design demo. In this graph, we will integrate many different specialized models and train the respective Octopus models for the edges between the nodes to help distribute and deliver information effectively. We wish to unit all open source language models to deliver the performance that can compete with closed source models.

The project is still in its early stages, and we have only included the very initial Octopus model. However, at Nexa AI, we are committed to dedicating significant time and resources to create a powerful graph of AI models.

Project Scope

The project will mainly focus on the following aspects:

Identifying the specialized models needed and training these models.
Constructing the graph consisting of multiple specialized models as nodes.
Training the Octopus models to connect different nodes efficiently.

The file structure of this GitHub repository is organized as follows:

main.py: This is the primary script for running the Octopus v4 model.
build_graph: Contains methods for constructing and managing the graph of language models. This includes operations such as creating, updating, and deleting nodes and edges.
specialized_models: Here, you'll find the training code along with a tutorial on how to prepare your data and train the specialized models. We provide code based on Hugging Face Transformers TRL library, to facilitate your training process. Feel free to raise any issues or questions you encounter during training.
specialized_models_inference: Here, you can find the inference code for the specialized models. This code is used to work with octopus-v4 model through the graph of language models, the entrance is the specialized_infer.py file.

Environment Setup

We recommend using a Linux environment and assume that you have an NVIDIA GPU when contributing to the project. To set up the project, follow these steps:

conda create -n octopus4 python=3.10
pip3 install torch torchvision torchaudio
pip3 install transformers datasets accelerate peft

Make sure to install PyTorch first, followed by the other packages. We recommend to install torchvision and torchaudio as well since we will introduce multimodal AI agent in the future. Alternatively, you can create a dev environment using our Docker image. For more information on setting up a dev environment, refer to this YouTube video. And you can use our Dockerfile to build the image.

docker build -t octopus4 .
docker run --gpus all -p 8700:8700 octopus4

Otherwise, you can directly pull our docker image

docker pull nexaai/octopus4

Using the Octopus v4 Model

Our initial v4 model is customized for the MMLU benchmark. However, we plan to support real-world use cases in the future. The Octopus v4 model helps you find the most appropriate model to finish your task and reformats your query so that the worker model can process it effectively. In a graph setup, it knows the best neighbor to choose and how to message from one node to another.

Here's an example of the result for Octopus v4 model:

Query: Tell me the result of derivative of x^3 when x is 2?

<nexa_4>('Determine the derivative of the function f(x) = x^3 at the point where x equals 2, and interpret the result within the context of rate of change and tangent slope.')
<nexa_end>

In this use case, <nexa_4> is the special token representing the math GPT. The natural math question is converted into a professional math expression to facilitate better understanding by the worker model. To try our model, you can use python main.py to run the code to try the Octopus v4 model.

The respective models used in our experiments are as follows:

Model Selection

We leverage the latest Language Large Models for a variety of domains. Below is a summary of the chosen models for each category. In cases where no specialized model exists for a subject, we utilize generic models like Llama3-8b. You may consider to add more content to our table below. Nexa AI will create another leaderboard for the specialized model.

Model	Category	Subjects
`jondurbin/bagel-8b-v1.0`	Biology	`college_biology`, `high_school_biology`
`Weyaxi/Einstein-v6.1-Llama3-8B`	Physics	`astronomy`, `college_physics`, `conceptual_physics`, `high_school_physics`
`meta-llama/Meta-Llama-3-8B-Instruct`	Business	`business_ethics`, `management`, `marketing`
`meta-llama/Meta-Llama-3-8B-Instruct`	Chemistry	`college_chemistry`, `high_school_chemistry`
`abacusai/Llama-3-Smaug-8B`	Computer Science	`college_computer_science`, `computer_security`, `high_school_computer_science`, `machine_learning`
`Open-Orca/Mistral-7B-OpenOrca`	Math	`abstract_algebra`, `college_mathematics`, `elementary_mathematics`, `high_school_mathematics`, `high_school_statistics`
`meta-llama/Meta-Llama-3-8B-Instruct`	Economics	`econometrics`, `high_school_macroeconomics`, `high_school_microeconomics`
`AdaptLLM/medicine-chat`	Health	`anatomy`, `clinical_knowledge`, `college_medicine`, `human_aging`, `medical_genetics`, `nutrition`, `professional_medicine`, `virology`
`STEM-AI-mtl/phi-2-electrical-engineering`	Engineering	`electrical_engineering`
`meta-llama/Meta-Llama-3-8B-Instruct`	Philosophy	`formal_logic`, `logical_fallacies`, `moral_disputes`, `moral_scenarios`, `philosophy`, `world_religions`
`microsoft/Phi-3-mini-128k-instruct`	Other	`global_facts`, `miscellaneous`, `professional_accounting`
`meta-llama/Meta-Llama-3-8B-Instruct`	History	`high_school_european_history`, `high_school_us_history`, `high_school_world_history`, `prehistory`
`meta-llama/Meta-Llama-3-8B-Instruct`	Culture	`human_sexuality`, `sociology`
`AdaptLLM/law-chat`	Law	`international_law`, `jurisprudence`, `professional_law`
`meta-llama/Meta-Llama-3-8B-Instruct`	Psychology	`high_school_psychology`, `professional_psychology`

MMLU Benchmark Results (5-shot learning)

Here are the comparative MMLU scores for various models tested under a 5-shot learning setup:

Model	MMLU Score
Octopus-V4	74.6%
GPT-3.5	70.0%
Phi-3-mini-128k-instruct	68.1%
OpenELM-3B	26.7%
Lamma3-8b-instruct	68.4%
Gemma-2b	42.3%
Gemma-7b	64.3%

Domain LLM Leaderboard

Explore our collection of domain-specific large language models (LLMs) or contribute by suggesting new models tailored to specific domains. For detailed information on available models and to engage with our community, please visit our Domain LLM Leaderboard.

Train the Specialized Models

We encourage you to train and add the specialized model list.

For instructions on training specialized models, please refer to the specialized_models directory. We currently support training using Hugging Face TRL, chosen for its convenience and robustness in training specialized models. Future updates will expand support to include LoRA training, training larger models (such as 13B and 70B), distributed training, and more. Stay tuned for these enhancements.

Recommended Training Procedures

To develop your specialized model effectively, we suggest the following steps:

Data Collection and Preparation: Collect a dataset specific to your domain. Process this dataset to ensure it is clean and free from inappropriate content.
Model Training: Train your model using the Supervised Fine-Tuning (SFT) method.
DPO Training: Prepare a dataset for Direct Preference Optimization (DPO), and use the DPO to train your model.

For Tasks:

Click tags to check more tools for each tasks

train models build graph run inference benchmark models develop specialized models

For Jobs:

data scientist machine learning engineer ai researcher natural language processing engineer research scientist

Alternative AI tools for octopus-v4

Similar Open Source Tools

octopus-v4

github

: 97

lingua

Meta Lingua is a minimal and fast LLM training and inference library designed for research. It uses easy-to-modify PyTorch components to experiment with new architectures, losses, and data. The codebase enables end-to-end training, inference, and evaluation, providing tools for speed and stability analysis. The repository contains essential components in the 'lingua' folder and scripts that combine these components in the 'apps' folder. Researchers can modify the provided templates to suit their experiments easily. Meta Lingua aims to lower the barrier to entry for LLM research by offering a lightweight and focused codebase.

github

: 4.4k

llm-foundry

LLM Foundry is a codebase for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. It is designed to be easy-to-use, efficient _and_ flexible, enabling rapid experimentation with the latest techniques. You'll find in this repo: * `llmfoundry/` - source code for models, datasets, callbacks, utilities, etc. * `scripts/` - scripts to run LLM workloads * `data_prep/` - convert text data from original sources to StreamingDataset format * `train/` - train or finetune HuggingFace and MPT models from 125M - 70B parameters * `train/benchmarking` - profile training throughput and MFU * `inference/` - convert models to HuggingFace or ONNX format, and generate responses * `inference/benchmarking` - profile inference latency and throughput * `eval/` - evaluate LLMs on academic (or custom) in-context-learning tasks * `mcli/` - launch any of these workloads using MCLI and the MosaicML platform * `TUTORIAL.md` - a deeper dive into the repo, example workflows, and FAQs

github

: 4.1k

llm_qlora

LLM_QLoRA is a repository for fine-tuning Large Language Models (LLMs) using QLoRA methodology. It provides scripts for training LLMs on custom datasets, pushing models to HuggingFace Hub, and performing inference. Additionally, it includes models trained on HuggingFace Hub, a blog post detailing the QLoRA fine-tuning process, and instructions for converting and quantizing models. The repository also addresses troubleshooting issues related to Python versions and dependencies.

github

: 207

chatgpt-cli

ChatGPT CLI provides a powerful command-line interface for seamless interaction with ChatGPT models via OpenAI and Azure. It features streaming capabilities, extensive configuration options, and supports various modes like streaming, query, and interactive mode. Users can manage thread-based context, sliding window history, and provide custom context from any source. The CLI also offers model and thread listing, advanced configuration options, and supports GPT-4, GPT-3.5-turbo, and Perplexity's models. Installation is available via Homebrew or direct download, and users can configure settings through default values, a config.yaml file, or environment variables.

github

: 661

garak

Garak is a free tool that checks if a Large Language Model (LLM) can be made to fail in a way that is undesirable. It probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. Garak's a free tool. We love developing it and are always interested in adding functionality to support applications.

github

: 1.3k

ai-starter-kit

SambaNova AI Starter Kits is a collection of open-source examples and guides designed to facilitate the deployment of AI-driven use cases for developers and enterprises. The kits cover various categories such as Data Ingestion & Preparation, Model Development & Optimization, Intelligent Information Retrieval, and Advanced AI Capabilities. Users can obtain a free API key using SambaNova Cloud or deploy models using SambaStudio. Most examples are written in Python but can be applied to any programming language. The kits provide resources for tasks like text extraction, fine-tuning embeddings, prompt engineering, question-answering, image search, post-call analysis, and more.

github

: 215

mergekit

Mergekit is a toolkit for merging pre-trained language models. It uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.

github

: 5.5k

garak

Garak is a vulnerability scanner designed for LLMs (Large Language Models) that checks for various weaknesses such as hallucination, data leakage, prompt injection, misinformation, toxicity generation, and jailbreaks. It combines static, dynamic, and adaptive probes to explore vulnerabilities in LLMs. Garak is a free tool developed for red-teaming and assessment purposes, focusing on making LLMs or dialog systems fail. It supports various LLM models and can be used to assess their security and robustness.

github

: 4.2k

log10

Log10 is a one-line Python integration to manage your LLM data. It helps you log both closed and open-source LLM calls, compare and identify the best models and prompts, store feedback for fine-tuning, collect performance metrics such as latency and usage, and perform analytics and monitor compliance for LLM powered applications. Log10 offers various integration methods, including a python LLM library wrapper, the Log10 LLM abstraction, and callbacks, to facilitate its use in both existing production environments and new projects. Pick the one that works best for you. Log10 also provides a copilot that can help you with suggestions on how to optimize your prompt, and a feedback feature that allows you to add feedback to your completions. Additionally, Log10 provides prompt provenance, session tracking and call stack functionality to help debug prompt chains. With Log10, you can use your data and feedback from users to fine-tune custom models with RLHF, and build and deploy more reliable, accurate and efficient self-hosted models. Log10 also supports collaboration, allowing you to create flexible groups to share and collaborate over all of the above features.

github

: 96

assistant

The WhatsApp AI Assistant repository offers a chatbot named Sydney that serves as an AI-powered personal assistant. It utilizes Language Model (LLM) technology to provide various features such as Google/Bing searching, Google Calendar integration, communication capabilities, group chat compatibility, voice message support, basic text reminders, image recognition, and more. Users can interact with Sydney through natural language queries and voice messages. The chatbot can transcribe voice messages using either the Whisper API or a local method. Additionally, Sydney can be used in group chats by mentioning her username or replying to her last message. The repository welcomes contributions in the form of issue reports, pull requests, and requests for new tools. The creators of the project, Veigamann and Luisotee, are open to job opportunities and can be contacted through their GitHub profiles.

github

: 99

ML-Bench

ML-Bench is a tool designed to evaluate large language models and agents for machine learning tasks on repository-level code. It provides functionalities for data preparation, environment setup, usage, API calling, open source model fine-tuning, and inference. Users can clone the repository, load datasets, run ML-LLM-Bench, prepare data, fine-tune models, and perform inference tasks. The tool aims to facilitate the evaluation of language models and agents in the context of machine learning tasks on code repositories.

github

: 344

bolna

Bolna is an open-source platform for building voice-driven conversational applications using large language models (LLMs). It provides a comprehensive set of tools and integrations to handle various aspects of voice-based interactions, including telephony, transcription, LLM-based conversation handling, and text-to-speech synthesis. Bolna simplifies the process of creating voice agents that can perform tasks such as initiating phone calls, transcribing conversations, generating LLM-powered responses, and synthesizing speech. It supports multiple providers for each component, allowing users to customize their setup based on their specific needs. Bolna is designed to be easy to use, with a straightforward local setup process and well-documented APIs. It is also extensible, enabling users to integrate with other telephony providers or add custom functionality.

github

: 205

code2prompt

code2prompt is a command-line tool that converts your codebase into a single LLM prompt with a source tree, prompt templating, and token counting. It automates generating LLM prompts from codebases of any size, customizing prompt generation with Handlebars templates, respecting .gitignore, filtering and excluding files using glob patterns, displaying token count, including Git diff output, copying prompt to clipboard, saving prompt to an output file, excluding files and folders, adding line numbers to source code blocks, and more. It helps streamline the process of creating LLM prompts for code analysis, generation, and other tasks.

github

: 5.1k

LLMBox

LLMBox is a comprehensive library designed for implementing Large Language Models (LLMs) with a focus on a unified training pipeline and comprehensive model evaluation. It serves as a one-stop solution for training and utilizing LLMs, offering flexibility and efficiency in both training and utilization stages. The library supports diverse training strategies, comprehensive datasets, tokenizer vocabulary merging, data construction strategies, parameter efficient fine-tuning, and efficient training methods. For utilization, LLMBox provides comprehensive evaluation on various datasets, in-context learning strategies, chain-of-thought evaluation, evaluation methods, prefix caching for faster inference, support for specific LLM models like vLLM and Flash Attention, and quantization options. The tool is suitable for researchers and developers working with LLMs for natural language processing tasks.

github

: 755

sandbox

Sandbox is an open-source cloud-based code editing environment with custom AI code autocompletion and real-time collaboration. It consists of a frontend built with Next.js, TailwindCSS, Shadcn UI, Clerk, Monaco, and Liveblocks, and a backend with Express, Socket.io, Cloudflare Workers, D1 database, R2 storage, Workers AI, and Drizzle ORM. The backend includes microservices for database, storage, and AI functionalities. Users can run the project locally by setting up environment variables and deploying the containers. Contributions are welcome following the commit convention and structure provided in the repository.

github

: 1.1k

For similar tasks

byteir

The ByteIR Project is a ByteDance model compilation solution. ByteIR includes compiler, runtime, and frontends, and provides an end-to-end model compilation solution. Although all ByteIR components (compiler/runtime/frontends) are together to provide an end-to-end solution, and all under the same umbrella of this repository, each component technically can perform independently. The name, ByteIR, comes from a legacy purpose internally. The ByteIR project is NOT an IR spec definition project. Instead, in most scenarios, ByteIR directly uses several upstream MLIR dialects and Google Mhlo. Most of ByteIR compiler passes are compatible with the selected upstream MLIR dialects and Google Mhlo.

github

: 418

ScandEval

ScandEval is a framework for evaluating pretrained language models on mono- or multilingual language tasks. It provides a unified interface for benchmarking models on a variety of tasks, including sentiment analysis, question answering, and machine translation. ScandEval is designed to be easy to use and extensible, making it a valuable tool for researchers and practitioners alike.

github

: 81

opencompass

OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features include: * Comprehensive support for models and datasets: Pre-support for 20+ HuggingFace and API models, a model evaluation scheme of 70+ datasets with about 400,000 questions, comprehensively evaluating the capabilities of the models in five dimensions. * Efficient distributed evaluation: One line command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours. * Diversified evaluation paradigms: Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to easily stimulate the maximum performance of various models. * Modular design with high extensibility: Want to add new models or datasets, customize an advanced task division strategy, or even support a new cluster management system? Everything about OpenCompass can be easily expanded! * Experiment management and reporting mechanism: Use config files to fully record each experiment, and support real-time reporting of results.

github

: 4.8k

openvino.genai

The GenAI repository contains pipelines that implement image and text generation tasks. The implementation uses OpenVINO capabilities to optimize the pipelines. Each sample covers a family of models and suggests certain modifications to adapt the code to specific needs. It includes the following pipelines: 1. Benchmarking script for large language models 2. Text generation C++ samples that support most popular models like LLaMA 2 3. Stable Diffuison (with LoRA) C++ image generation pipeline 4. Latent Consistency Model (with LoRA) C++ image generation pipeline

github

: 246

GPT4Point

GPT4Point is a unified framework for point-language understanding and generation. It aligns 3D point clouds with language, providing a comprehensive solution for tasks such as 3D captioning and controlled 3D generation. The project includes an automated point-language dataset annotation engine, a novel object-level point cloud benchmark, and a 3D multi-modality model. Users can train and evaluate models using the provided code and datasets, with a focus on improving models' understanding capabilities and facilitating the generation of 3D objects.

github

: 253

octopus-v4

github

: 97

Awesome-LLM-RAG

This repository, Awesome-LLM-RAG, aims to record advanced papers on Retrieval Augmented Generation (RAG) in Large Language Models (LLMs). It serves as a resource hub for researchers interested in promoting their work related to LLM RAG by updating paper information through pull requests. The repository covers various topics such as workshops, tutorials, papers, surveys, benchmarks, retrieval-enhanced LLMs, RAG instruction tuning, RAG in-context learning, RAG embeddings, RAG simulators, RAG search, RAG long-text and memory, RAG evaluation, RAG optimization, and RAG applications.

github

: 733

stm32ai-modelzoo

The STM32 AI model zoo is a collection of reference machine learning models optimized to run on STM32 microcontrollers. It provides a large collection of application-oriented models ready for re-training, scripts for easy retraining from user datasets, pre-trained models on reference datasets, and application code examples generated from user AI models. The project offers training scripts for transfer learning or training custom models from scratch. It includes performances on reference STM32 MCU and MPU for float and quantized models. The project is organized by application, providing step-by-step guides for training and deploying models.

github

: 255

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github

: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

github

: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github

: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675