gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Stars: 11657
Gorilla is a tool that enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke. With Gorilla, you can use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. Gorilla also releases APIBench, the largest collection of APIs, curated and easy to train on!
README:
- Check out our detailed Berkeley Function Calling Leaderboard changelog (Last updated: ) for the latest dataset and model updates to the Berkeley Function Calling Leaderboard!
- [10/04/2024] Introducing the Agent Arena by Gorilla X LMSYS Chatbot Arena! Compare different agents on tasks like search, finance, RAG, and beyond. Explore which models and tools work best for specific tasks through our novel ranking system and community-driven prompt hub.
- [09/21/2024] Announcing BFCL V3: evaluating multi-turn and multi-step function calling capabilities! A new state-based evaluation system tests how models handle complex workflows, sequential functions, and service states.
- [08/20/2024] Released BFCL V2 • Live! The Berkeley Function-Calling Leaderboard now features enterprise-contributed data and real-world scenarios.
- [04/12/2024] Excited to release GoEx, a runtime for LLM-generated actions like code, API calls, and more. It features "post-facto validation" for assessing LLM actions after execution, plus "undo" and "damage confinement" abstractions to manage unintended actions and risks. This paves the way for fully autonomous LLM agents, enhancing interaction between apps and services with the human out of the loop.
- [04/01/2024] Introducing cost and latency metrics into the Berkeley Function Calling Leaderboard!
- [03/15/2024] RAFT: Adapting Language Model to Domain Specific RAG is live!
- [02/26/2024] Berkeley Function Calling Leaderboard is live!
- [02/25/2024] OpenFunctions v2 sets new SoTA for open-source LLMs!
- [11/16/2023] Excited to release Gorilla OpenFunctions!
- [06/29/2023] Released gorilla-cli, LLMs for your CLI!
- [06/06/2023] Released commercially usable, Apache 2.0 licensed Gorilla models!
- [05/30/2023] Provided the CLI interface to chat with Gorilla!
- [05/28/2023] Released Torch Hub and TensorFlow Hub models!
- [05/27/2023] Released the first Gorilla model!
- [05/27/2023] We released the APIZoo contribution guide for community API contributions!
- [05/25/2023] We released the APIBench dataset and the evaluation code of Gorilla!
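The GoEx announcement above pairs "post-facto validation" with "undo". The idea can be illustrated with a toy runtime. This is a minimal sketch, not GoEx's code or API: `ReversibleAction`, `run_with_post_facto_validation`, `make_write`, and the in-memory dict standing in for a filesystem are all hypothetical. Every action runs first; a validator then inspects the resulting state, and if it rejects the outcome (or an action raises), the actions that already completed are undone in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory "filesystem" standing in for a real external service.
State = Dict[str, str]


@dataclass
class ReversibleAction:
    """An LLM-proposed action paired with its compensating undo (hypothetical type)."""
    description: str
    do: Callable[[State], None]
    undo: Callable[[State], None]


def run_with_post_facto_validation(
    actions: List[ReversibleAction],
    state: State,
    validate: Callable[[State], bool],
) -> bool:
    """Execute actions first, validate the resulting state afterwards, undo on rejection."""
    executed: List[ReversibleAction] = []
    try:
        for action in actions:
            action.do(state)
            executed.append(action)
    except Exception:
        # An action failed mid-way: roll back the actions that completed.
        for action in reversed(executed):
            action.undo(state)
        return False
    if validate(state):
        return True
    # The post-facto check rejected the outcome: undo in reverse order.
    for action in reversed(executed):
        action.undo(state)
    return False


def make_write(path: str, text: str) -> ReversibleAction:
    """Build a write action whose undo restores the previous content (or deletes the file)."""
    backup: Dict[str, str] = {}

    def do(state: State) -> None:
        if path in state:
            backup[path] = state[path]
        state[path] = text

    def undo(state: State) -> None:
        if path in backup:
            state[path] = backup[path]
        else:
            state.pop(path, None)

    return ReversibleAction(f"write {path}", do, undo)


if __name__ == "__main__":
    fs: State = {"notes.txt": "keep me"}
    actions = [make_write("test.txt", "x" * 100), make_write("notes.txt", "")]
    # Hypothetical validator: reject any outcome that leaves an empty file behind.
    ok = run_with_post_facto_validation(actions, fs, lambda s: all(s.values()))
    print(ok, fs)  # False {'notes.txt': 'keep me'}
```

Giving each action its own undo, instead of snapshotting the whole state, is a design assumption of this sketch: it mirrors the situation where the affected system (a remote API, a database) cannot simply be copied before execution.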
Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke.
With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. This repository contains inference code for running Gorilla finetuned models, evaluation code for reproducing results from our paper, and APIBench, the largest collection of APIs, curated and easy to train on!
Since our initial release, we've served ~500k requests and witnessed incredible adoption by developers worldwide. The project has expanded to include tools, evaluations, leaderboard, end-to-end finetuning recipes, infrastructure components, and the Gorilla API Store:
Project | Type | Description |
---|---|---|
Gorilla Paper | Model, Fine-tuning, Dataset, Evaluation, Infra | Large Language Model Connected with Massive APIs: • Novel finetuning approach for API invocation • Evaluation on 1,600+ APIs (APIBench) • Retrieval-augmented training for test-time adaptation |
Gorilla OpenFunctions-V2 | Model | Drop-in alternative for function calling, supporting multiple complex data types and parallel execution: • Multiple & parallel function execution with OpenAI-compatible endpoints • Native support for Python, Java, JavaScript, and REST APIs with expanded data types • Function relevance detection to reduce hallucinations • Enhanced RESTful API formatting capabilities • State-of-the-art performance among open-source models |
Berkeley Function Calling Leaderboard (BFCL) | Evaluation, Leaderboard, Function Calling Infra, Dataset | Comprehensive evaluation of function-calling capabilities: • V1: Expert-curated dataset for evaluating single-turn function calling • V2: Enterprise-contributed data for real-world scenarios • V3: Multi-turn & multi-step function calling evaluation • Cost and latency metrics for all models • Interactive API explorer for testing • Community-driven benchmarking platform |
Agent Arena | Evaluation, Leaderboard | Compare LLM agents across models, tools, and frameworks: • Head-to-head agent comparisons with ELO rating system • Framework compatibility testing (LangChain, AutoGPT) • Community-driven evaluation platform • Real-world task performance metrics |
Gorilla Execution Engine (GoEx) | Infra | Runtime for executing LLM-generated actions with safety guarantees: • Post-facto validation for verifying LLM actions after execution • Undo capabilities and damage confinement for risk mitigation • OAuth2 and API key authentication for multiple services • Support for RESTful APIs, databases, and filesystem operations • Docker-based sandboxed execution environment |
Retrieval-Augmented Fine-tuning (RAFT) | Fine-tuning, Model | Fine-tuning LLMs for robust domain-specific retrieval: • Novel fine-tuning recipe for domain-specific RAG • Chain-of-thought answers with direct document quotes • Training with oracle and distractor documents • Improved performance on PubMed, HotpotQA, and Gorilla benchmarks • Efficient adaptation of smaller models for domain QA |
Gorilla CLI | Model, Local CLI Infra | LLMs for your command-line interface: • User-friendly CLI tool supporting ~1500 APIs (Kubernetes, AWS, GCP, etc.) • Natural language command generation with multi-LLM fusion • Privacy-focused with explicit execution approval • Command history and interactive selection interface |
Gorilla API Zoo | Dataset | A community-maintained repository of up-to-date API documentation: • Centralized, searchable index of APIs across domains • Structured documentation format with arguments, versioning, and examples • Community-driven updates to keep pace with API changes • Rich data source for model training and fine-tuning • Enables retrieval-augmented training and inference • Reduces hallucination through up-to-date documentation |
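The API Zoo row mentions retrieval-augmented inference: before the model writes an API call, the most relevant documentation is fetched and added to the prompt. A minimal sketch, assuming a tiny hand-written list of doc strings (`API_DOCS` is hypothetical, and real API Zoo entries carry structured fields) and a plain bag-of-words cosine similarity as a stand-in for a stronger retriever such as BM25 or dense embeddings. The prompt wording in `build_prompt` is also illustrative only.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical API docs; real API Zoo entries include arguments, versions, and examples.
API_DOCS = [
    "torch.hub.load pytorch/vision resnet50 image classification pretrained model",
    "transformers pipeline translation translate English text to French",
    "tensorflow_hub.KerasLayer text embedding universal sentence encoder",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words over alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k docs most similar to the query, best first."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Append the top retrieved doc so the model can ground its API call in it."""
    best = retrieve(query, docs, k=1)[0][1]
    return f"{query}\nUse this API documentation as reference: {best}"


if __name__ == "__main__":
    print(build_prompt("I want to translate English text to French", API_DOCS))
```

Because the documentation is looked up at query time, updating an entry in the index changes what the model sees without retraining, which is the property the API Zoo row credits with reducing hallucination.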
Try Gorilla in your browser:
- Gorilla Colab Demo: Try the base Gorilla model
- Gorilla Gradio Demo: Interactive web interface
- OpenFunctions Colab Demo: Try the latest OpenFunctions model
- OpenFunctions Website Demo: Experiment with function calling
- Berkeley Function Calling Leaderboard: Compare function calling capabilities
- Gorilla CLI: the fastest way to get started
```bash
pip install gorilla-cli
gorilla generate 100 random characters into a file called test.txt
```
Learn more about Gorilla CLI.
- Run Gorilla locally
```bash
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/inference
```
Detailed local setup instructions
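- Use OpenFunctions
```python
# Uses the legacy (pre-1.0) openai SDK interface, e.g. `pip install openai==0.28`;
# openai.ChatCompletion and openai.api_base were replaced by a client-based API in openai>=1.0.
import openai

openai.api_key = "EMPTY"  # placeholder key, as in the original example
openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"

# Define your functions (parameters are described with JSON Schema)
functions = [
    {
        "name": "get_current_weather",
        "description": "Get weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

# Make the API call
completion = openai.ChatCompletion.create(
    model="gorilla-openfunctions-v2",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    functions=functions,
)
print(completion.choices[0].message)
```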
See the OpenFunctions documentation for details.
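The request above only asks the model which function to call; your own code still has to run it. A minimal sketch, assuming the model's reply contains a Python-style call string such as `get_current_weather(location="San Francisco")` (check the OpenFunctions documentation for the exact response format your endpoint returns). It parses the string with Python's `ast` module instead of `eval`, so only functions listed in a whitelist can run and only literal argument values are accepted. `get_current_weather`, `REGISTRY`, and `dispatch` are hypothetical names for illustration.

```python
import ast
from typing import Any, Callable, Dict


def get_current_weather(location: str, unit: str = "fahrenheit") -> str:
    """Hypothetical stub; a real version would query a weather service."""
    return f"72 degrees {unit} in {location}"


# Only functions listed here can be invoked by model output.
REGISTRY: Dict[str, Callable[..., Any]] = {"get_current_weather": get_current_weather}


def dispatch(call_text: str) -> Any:
    """Parse a call like 'f(a="x")' without eval() and run the whitelisted function."""
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {call_text!r}")
    func = REGISTRY.get(node.func.id)
    if func is None:
        raise ValueError(f"unknown function: {node.func.id}")
    if any(kw.arg is None for kw in node.keywords):
        raise ValueError("**kwargs unpacking is not allowed")
    # literal_eval raises ValueError for anything that is not a plain literal
    # (nested calls, variable names, attribute access, ...).
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return func(*args, **kwargs)


if __name__ == "__main__":
    print(dispatch('get_current_weather(location="San Francisco", unit="celsius")'))
    # 72 degrees celsius in San Francisco
```

Parsing rather than evaluating matters here because the call text is model output: a string like `__import__("os").system(...)` is rejected as "not a simple function call" instead of being executed.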
Evaluation & Benchmarking
- Berkeley Function Calling Leaderboard: Compare function calling capabilities
- Agent Arena: Evaluate agent workflows
- Gorilla Paper Evaluation Scripts: Run your own evaluations
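The evaluation resources above compare a model's generated function calls against expected ones. As a toy illustration only (this is not BFCL's evaluator or the paper's evaluation scripts), the sketch below assumes every call uses keyword arguments with literal values, parses both prediction and reference with Python's `ast` module, and counts a prediction correct when the function name and all keyword arguments match exactly, regardless of argument order. `parse_call` and `accuracy` are hypothetical helpers.

```python
import ast
from typing import Dict, List, Optional, Tuple

# (function name, keyword arguments)
Call = Tuple[str, Dict[str, object]]


def parse_call(text: str) -> Optional[Call]:
    """Turn 'f(a=1, b="x")' into ('f', {'a': 1, 'b': 'x'}); return None if unparseable."""
    try:
        node = ast.parse(text.strip(), mode="eval").body
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            return None
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    except (SyntaxError, ValueError, TypeError):
        return None
    return node.func.id, kwargs


def accuracy(predictions: List[str], expected: List[str]) -> float:
    """Fraction of predictions whose name and keyword arguments exactly match the reference."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        return 0.0
    hits = 0
    for pred, gold in zip(predictions, expected):
        parsed = parse_call(pred)
        # Dict equality ignores argument order; an unparseable prediction never counts.
        if parsed is not None and parsed == parse_call(gold):
            hits += 1
    return hits / len(expected)


if __name__ == "__main__":
    expected = ['get_current_weather(location="San Francisco")',
                'get_current_weather(location="Paris", unit="celsius")']
    predictions = ['get_current_weather(location="San Francisco")',
                   'get_current_weather(unit="celsius", location="Berlin")']
    print(accuracy(predictions, expected))  # 0.5
```

Comparing parsed structures rather than raw strings keeps harmless differences (argument order, quote style, whitespace) from being scored as errors.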
Development Tools
- I would like to use Gorilla commercially. Is there going to be an Apache 2.0 licensed version?
Yes! We now have models that you can use commercially without any obligations.
- Can we use Gorilla with other tools like LangChain?
Absolutely! Gorilla is an end-to-end model, specifically tailored to serve correct API calls (tools) without requiring any additional coding. It is designed to work as part of a wider ecosystem and can be flexibly integrated into agentic frameworks and other tools.
LangChain is a versatile developer tool. Its "agents" can swap in any LLM, Gorilla included, making it a highly adaptable solution for various needs.
These tools shine when they work together, each complementing the other's strengths to create a more powerful and comprehensive solution. This is where your contribution can make a difference: we welcome any input that helps refine and enhance these tools.
Check out our blog on How to Use Gorilla: A Step-by-Step Walkthrough to see all the different ways you can integrate Gorilla in your projects.
In the immediate future, we plan to release the following:
- [ ] Multimodal function-calling leaderboard
- [ ] Agentic function-calling leaderboard
- [ ] New batch of user-contributed live function-calling evals
- [ ] BFCL metrics to evaluate contamination
- [ ] OpenFunctions-v3 model to support more languages and multi-turn capability
- [x] Agent Arena to compare LLM agents across models, tools, and frameworks [10/04/2024]
- [x] Multi-turn and multi-step function calling evaluation [09/21/2024]
- [x] User-contributed Live Function Calling Leaderboard [08/20/2024]
- [x] BFCL systems metrics including cost and latency [04/01/2024]
- [x] Gorilla Execution Engine (GoEx): runtime for executing LLM-generated actions with safety guarantees [04/12/2024]
- [x] Berkeley Function Calling Leaderboard (BFCL) for evaluating tool-calling/function-calling models [02/26/2024]
- [x] OpenFunctions-v2 with more languages (Java, JS, Python) and relevance detection [02/26/2024]
- [x] API Zoo Index for easy access to all APIs [02/16/2024]
- [x] OpenFunctions-v1, Apache 2.0, with parallel and multiple function calling [11/16/2023]
- [x] OpenFunctions-v0, Apache 2.0 function calling model [11/16/2023]
- [x] Release a commercially usable, Apache 2.0 licensed Gorilla model [06/05/2023]
- [x] Release weights for all APIs from APIBench [05/28/2023]
- [x] Run Gorilla LLM locally [05/28/2023]
- [x] Release weights for HF model APIs [05/27/2023]
- [x] Hosted Gorilla LLM chat for HF model APIs [05/27/2023]
- [x] Open up the APIZoo for contributions from the community
- [x] Dataset and eval code
Gorilla is Apache 2.0 licensed, making it suitable for both academic and commercial use.
- Join our Discord Community
- Follow us on X
@article{patil2023gorilla,
title={Gorilla: Large Language Model Connected with Massive APIs},
author={Shishir G. Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
year={2023},
journal={arXiv preprint arXiv:2305.15334},
}
Alternative AI tools for gorilla
Similar Open Source Tools
lobe-chat
Lobe Chat is an open-source, modern-design ChatGPT/LLMs UI/Framework. It supports speech synthesis, multi-modal input, and an extensible (function call) plugin system. One-click **FREE** deployment of your private OpenAI ChatGPT/Claude/Gemini/Groq/Ollama chat application.
anything
Anything is an open automation tool built in Rust that aims to rebuild Zapier, enabling local AI to perform a wide range of tasks beyond chat functionalities. The tool focuses on extensibility without sacrificing understandability, allowing users to create custom extensions in Rust or in interpreted languages like Python or TypeScript. It features an embedded SQLite DB, a WYSIWYG editor, event system, cron trigger, HTTP and CLI extensions, with plans for additional extensions like Deno, Python, and Local AI. The tool is designed to be user-friendly, with a file-first state approach, portable triggers, actions, and flows, and a human-centric file and folder naming convention. It does not require Docker, making it easy to run on low-powered devices for 24/7 self-hosting. The event processing is focused on simplicity and visibility, with extensibility through custom extensions and a marketplace for templates, actions, and triggers.
esp-ai
ESP-AI provides a complete AI conversation solution for your development board, including IAT+LLM+TTS integration solutions for ESP32 series development boards. It can be injected into projects without affecting existing ones. By providing keys from platforms like iFlytek, Jiling, and local services, you can run the services without worrying about interactions between services or between development boards and services. The project's server-side code is based on Node.js, and the hardware code is based on Arduino IDE.
Flare
Flare is an open-source AI-powered decentralized social network client for Android/iOS/macOS, consolidating multiple social networks into one platform. It allows cross-posting content, ensures privacy, and plans to implement features like mixed timeline, AI-powered functions, and support for various platforms. The project is in active development and aims to provide a seamless social networking experience for users.
gptel
GPTel is a simple Large Language Model chat client for Emacs, with support for multiple models and backends. It's async and fast, streams responses, and interacts with LLMs from anywhere in Emacs. LLM responses are in Markdown or Org markup. Supports conversations and multiple independent sessions. Chats can be saved as regular Markdown/Org/Text files and resumed later. You can go back and edit your previous prompts or LLM responses when continuing a conversation. These will be fed back to the model. Don't like gptel's workflow? Use it to create your own for any supported model/backend with a simple API.
core
OpenSumi is a framework designed to help users quickly build AI Native IDE products. It provides a set of tools and templates for creating Cloud IDEs, Desktop IDEs based on Electron, CodeBlitz web IDE Framework, Lite Web IDE on the Browser, and Mini-App-like IDE. The framework also offers documentation for users to refer to and a detailed guide on contributing to the project. OpenSumi encourages contributions from the community and provides a platform for users to report bugs, contribute code, or improve documentation. The project is licensed under the MIT license and contains third-party code under other open source licenses.
LMOps
LMOps is a research initiative focusing on fundamental research and technology for building AI products with foundation models, particularly enabling AI capabilities with Large Language Models (LLMs) and Generative AI models. The project explores various aspects such as prompt optimization, longer context handling, LLM alignment, acceleration of LLMs, LLM customization, and understanding in-context learning. It also includes tools like Promptist for automatic prompt optimization, Structured Prompting for efficient long-sequence prompts consumption, and X-Prompt for extensible prompts beyond natural language. Additionally, LLMA accelerators are developed to speed up LLM inference by referencing and copying text spans from documents. The project aims to advance technologies that facilitate prompting language models and enhance the performance of LLMs in various scenarios.
LakeSoul
LakeSoul is a cloud-native Lakehouse framework that supports scalable metadata management, ACID transactions, efficient and flexible upsert operation, schema evolution, and unified streaming & batch processing. It supports multiple computing engines like Spark, Flink, Presto, and PyTorch, and computing modes such as batch, stream, MPP, and AI. LakeSoul scales metadata management and achieves ACID control by using PostgreSQL. It provides features like automatic compaction, table lifecycle maintenance, redundant data cleaning, and permission isolation for metadata.
LLMGA
LLMGA (Multimodal Large Language Model-based Generation Assistant) is a tool that leverages Large Language Models (LLMs) to assist users in image generation and editing. It provides detailed language generation prompts for precise control over Stable Diffusion (SD), resulting in more intricate and precise content in generated images. The tool curates a dataset for prompt refinement, similar image generation, inpainting & outpainting, and visual question answering. It offers a two-stage training scheme to optimize SD alignment and a reference-based restoration network to alleviate texture, brightness, and contrast disparities in image editing. LLMGA shows promising generative capabilities and enables wider applications in an interactive manner.
efficient-transformers
Efficient Transformers Library provides reimplemented blocks of Large Language Models (LLMs) to make models functional and highly performant on Qualcomm Cloud AI 100. It includes graph transformations, handling for under-flows and overflows, patcher modules, exporter module, sample applications, and unit test templates. The library supports seamless inference on pre-trained LLMs with documentation for model optimization and deployment. Contributions and suggestions are welcome, with a focus on testing changes for model support and common utilities.
awesome-deliberative-prompting
The 'awesome-deliberative-prompting' repository focuses on how to ask Large Language Models (LLMs) to produce reliable reasoning and make reason-responsive decisions through deliberative prompting. It includes success stories, prompting patterns and strategies, multi-agent deliberation, reflection and meta-cognition, text generation techniques, self-correction methods, reasoning analytics, limitations, failures, puzzles, datasets, tools, and other resources related to deliberative prompting. The repository provides a comprehensive overview of research, techniques, and tools for enhancing reasoning capabilities of LLMs.
Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
This repository is a collection of papers and resources related to recommendation systems, focusing on foundation models, transferable recommender systems, large language models, and multimodal recommender systems. It explores questions such as the necessity of ID embeddings, the shift from matching to generating paradigms, and the future of multimodal recommender systems. The papers cover various aspects of recommendation systems, including pretraining, user representation, dataset benchmarks, and evaluation methods. The repository aims to provide insights and advancements in the field of recommendation systems through literature reviews, surveys, and empirical studies.
twelvet
Twelvet is a permission management system based on Spring Cloud Alibaba that serves as a framework for rapid development. It is a scaffolding framework based on microservices architecture, aiming to reduce duplication of business code and provide a common core business code for both microservices and monoliths. It is designed for learning microservices concepts and development, suitable for website management, CMS, CRM, OA, and other system development. The system is intended to quickly meet business needs, improve user experience, and save time by incubating practical functional points in lightweight, highly portable functional plugins.
Crypto-Nft-Airdrop-Tool
Crypto-Nft-Airdrop-Tool is a Python tool designed for conducting airdrops of NFTs in the crypto space. It provides functionality for distributing NFTs to a specified audience efficiently. The tool is compatible with Windows platform and requires Python 3. Users can easily manage and execute airdrop campaigns using this tool, enhancing their engagement with the NFT community. The tool simplifies the process of distributing NFTs and ensures a seamless experience for both creators and recipients.
For similar tasks
one-click-llms
The one-click-llms repository provides templates for quickly setting up an API for language models. It includes advanced inferencing scripts for function calling and offers various models for text generation and fine-tuning tasks. Users can choose between Runpod and Vast.AI for different GPU configurations, with recommendations for optimal performance. The repository also supports Trelis Research and offers templates for different model sizes and types, including multi-modal APIs and chat models.
awesome-llm-json
This repository is an awesome list dedicated to resources for using Large Language Models (LLMs) to generate JSON or other structured outputs. It includes terminology explanations, hosted and local models, Python libraries, blog articles, videos, Jupyter notebooks, and leaderboards related to LLMs and JSON generation. The repository covers various aspects such as function calling, JSON mode, guided generation, and tool usage with different providers and models.
ai-devices
AI Devices Template is a project that serves as an AI-powered voice assistant utilizing various AI models and services to provide intelligent responses to user queries. It supports voice input, transcription, text-to-speech, image processing, and function calling with conditionally rendered UI components. The project includes customizable UI settings, optional rate limiting using Upstash, and optional tracing with Langchain's LangSmith for function execution. Users can clone the repository, install dependencies, add API keys, start the development server, and deploy the application. Configuration settings can be modified in `app/config.tsx` to adjust settings and configurations for the AI-powered voice assistant.
ragtacts
Ragtacts is a Clojure library that allows users to easily interact with Large Language Models (LLMs) such as OpenAI's GPT-4. Users can ask questions to LLMs, create question templates, call Clojure functions in natural language, and utilize vector databases for more accurate answers. Ragtacts also supports RAG (Retrieval-Augmented Generation) method for enhancing LLM output by incorporating external data. Users can use Ragtacts as a CLI tool, API server, or through a RAG Playground for interactive querying.
DelphiOpenAI
Delphi OpenAI API is an unofficial library providing Delphi implementation over OpenAI public API. It allows users to access various models, make completions, chat conversations, generate images, and call functions using OpenAI service. The library aims to facilitate tasks such as content generation, semantic search, and classification through AI models. Users can fine-tune models, work with natural language processing, and apply reinforcement learning methods for diverse applications.
token.js
Token.js is a TypeScript SDK that integrates with over 200 LLMs from 10 providers using OpenAI's format. It allows users to call LLMs, supports tools, JSON outputs, image inputs, and streaming, all running on the client side without the need for a proxy server. The tool is free and open source under the MIT license.
LLMFlex
LLMFlex is a python package designed for developing AI applications with local Large Language Models (LLMs). It provides classes to load LLM models, embedding models, and vector databases to create AI-powered solutions with prompt engineering and RAG techniques. The package supports multiple LLMs with different generation configurations, embedding toolkits, vector databases, chat memories, prompt templates, custom tools, and a chatbot frontend interface. Users can easily create LLMs, load embeddings toolkit, use tools, chat with models in a Streamlit web app, and serve an OpenAI API with a GGUF model. LLMFlex aims to offer a simple interface for developers to work with LLMs and build private AI solutions using local resources.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks, etc.) that can be deployed in a customer's subscription using the CAPE tool within a few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.