
gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Stars: 11888

Gorilla is a tool that enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke. With Gorilla, you can use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. Gorilla also releases APIBench, the largest collection of APIs, curated and easy to train on!
README:
- 📢 Check out our detailed Berkeley Function Calling Leaderboard changelog (Last updated: ) for the latest dataset / model updates to the Berkeley Function Calling Leaderboard!
- 🎯 [10/04/2024] Introducing the Agent Arena by Gorilla X LMSYS Chatbot Arena! Compare different agents on tasks like search, finance, RAG, and beyond. Explore which models and tools work best for specific tasks through our novel ranking system and community-driven prompt hub. [Blog] [Arena] [Leaderboard] [Dataset] [Tweet]
- 📣 [09/21/2024] Announcing BFCL V3 - evaluating multi-turn and multi-step function-calling capabilities! A new state-based evaluation system tests models on handling complex workflows, sequential functions, and service states. [Blog] [Leaderboard] [Code] [Tweet]
- 🚀 [08/20/2024] Released BFCL V2 • Live! The Berkeley Function Calling Leaderboard now features enterprise-contributed data and real-world scenarios. [Blog] [Live Leaderboard] [V2 Categories Leaderboard] [Tweet]
- ⚡️ [04/12/2024] Excited to release GoEx - a runtime for LLM-generated actions such as code and API calls. It features "post-facto validation" for assessing LLM actions after execution, plus "undo" and "damage confinement" abstractions for managing unintended actions and risks. This paves the way for fully autonomous LLM agents, enhancing interaction between apps and services with humans out of the loop. [Blog] [Code] [Paper] [Tweet]
- ⏰ [04/01/2024] Introducing cost and latency metrics in the Berkeley Function Calling Leaderboard!
- 🚀 [03/15/2024] RAFT: Adapting Language Model to Domain Specific RAG is live! [MSFT-Meta blog] [Berkeley blog]
- 🏆 [02/26/2024] The Berkeley Function Calling Leaderboard is live!
- 🎯 [02/25/2024] OpenFunctions v2 sets a new SoTA for open-source LLMs!
- 🔥 [11/16/2023] Excited to release Gorilla OpenFunctions!
- 💻 [06/29/2023] Released gorilla-cli, LLMs for your CLI!
- 🟢 [06/06/2023] Released commercially usable, Apache 2.0 licensed Gorilla models!
- 🚀 [05/30/2023] Provided a CLI interface to chat with Gorilla!
- 🚀 [05/28/2023] Released Torch Hub and TensorFlow Hub models!
- 🚀 [05/27/2023] Released the first Gorilla model, also on 🤗!
- 🔥 [05/27/2023] Released the APIZoo contribution guide for community API contributions!
- 🔥 [05/25/2023] Released the APIBench dataset and the evaluation code for Gorilla!
Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke.
With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. This repository contains inference code for running Gorilla finetuned models, evaluation code for reproducing results from our paper, and APIBench, the largest collection of APIs, curated and easy to train on!
Since our initial release, we've served ~500k requests and seen incredible adoption by developers worldwide. The project has expanded to include tools, evaluations, a leaderboard, end-to-end finetuning recipes, infrastructure components, and the Gorilla API Store:
| Project | Type | Description |
|---|---|---|
| Gorilla Paper | 🤖 Model 📝 Fine-tuning 📚 Dataset 📊 Evaluation 🔧 Infra | Large Language Model Connected with Massive APIs • Novel finetuning approach for API invocation • Evaluation on 1,600+ APIs (APIBench) • Retrieval-augmented training for test-time adaptation |
| Gorilla OpenFunctions-V2 | 🤖 Model | Drop-in alternative for function calling, supporting multiple complex data types and parallel execution • Multiple & parallel function execution with OpenAI-compatible endpoints • Native support for Python, Java, JavaScript, and REST APIs with expanded data types • Function relevance detection to reduce hallucinations • Enhanced RESTful API formatting capabilities • State-of-the-art performance among open-source models |
| Berkeley Function Calling Leaderboard (BFCL) | 📊 Evaluation 🏆 Leaderboard 🔧 Function Calling Infra 📚 Dataset | Comprehensive evaluation of function-calling capabilities • V1: Expert-curated dataset for evaluating single-turn function calling • V2: Enterprise-contributed data for real-world scenarios • V3: Multi-turn & multi-step function-calling evaluation • Cost and latency metrics for all models • Interactive API explorer for testing • Community-driven benchmarking platform |
| Agent Arena | 📊 Evaluation 🏆 Leaderboard | Compare LLM agents across models, tools, and frameworks • Head-to-head agent comparisons with an ELO rating system • Framework compatibility testing (LangChain, AutoGPT) • Community-driven evaluation platform • Real-world task performance metrics |
| Gorilla Execution Engine (GoEx) | 🔧 Infra | Runtime for executing LLM-generated actions with safety guarantees • Post-facto validation for verifying LLM actions after execution • Undo capabilities and damage confinement for risk mitigation • OAuth2 and API-key authentication for multiple services • Support for RESTful APIs, databases, and filesystem operations • Docker-based sandboxed execution environment |
| Retrieval-Augmented Fine-tuning (RAFT) | 📝 Fine-tuning 🤖 Model | Fine-tuning LLMs for robust domain-specific retrieval • Novel fine-tuning recipe for domain-specific RAG • Chain-of-thought answers with direct document quotes • Training with oracle and distractor documents • Improved performance on PubMed, HotpotQA, and Gorilla benchmarks • Efficient adaptation of smaller models for domain QA |
| Gorilla CLI | 🤖 Model 🔧 Local CLI Infra | LLMs for your command-line interface • User-friendly CLI tool supporting ~1,500 APIs (Kubernetes, AWS, GCP, etc.) • Natural-language command generation with multi-LLM fusion • Privacy-focused with explicit execution approval • Command history and interactive selection interface |
| Gorilla API Zoo | 📚 Dataset | A community-maintained repository of up-to-date API documentation • Centralized, searchable index of APIs across domains • Structured documentation format with arguments, versioning, and examples • Community-driven updates to keep pace with API changes • Rich data source for model training and fine-tuning • Enables retrieval-augmented training and inference • Reduces hallucination through up-to-date documentation |
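Structured entries like the API Zoo's lend themselves to simple retrieval-augmented lookups. The sketch below is illustrative only: the entry fields and the keyword retriever are hypothetical stand-ins, not the actual APIZoo schema or Gorilla's retriever.

```python
# Hypothetical APIZoo-style entries; field names are illustrative,
# not the project's actual schema.
api_zoo = [
    {
        "name": "torchhub.load",
        "domain": "vision",
        "version": "2.0",
        "description": "Load a pretrained model from Torch Hub",
        "arguments": {"repo": "str", "model": "str"},
        "example": "torch.hub.load('pytorch/vision', 'resnet50')",
    },
    {
        "name": "requests.get",
        "domain": "web",
        "version": "2.31",
        "description": "Send an HTTP GET request to a URL",
        "arguments": {"url": "str", "params": "dict"},
        "example": "requests.get('https://api.example.com')",
    },
]

def retrieve(query: str, docs: list) -> dict:
    """Naive keyword-overlap retriever: pick the doc whose description
    shares the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d["description"].lower().split())))

best = retrieve("load a pretrained vision model", api_zoo)
print(best["name"])  # torchhub.load
```

In the retrieval-augmented setting described above, the retrieved entry's description and example would be prepended to the model's prompt so it generates calls against current documentation rather than stale training data.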
Try Gorilla in your browser:
- 🚀 Gorilla Colab Demo: Try the base Gorilla model
- 🌐 Gorilla Gradio Demo: Interactive web interface
- 🔥 OpenFunctions Colab Demo: Try the latest OpenFunctions model
- 🎯 OpenFunctions Website Demo: Experiment with function calling
- 📊 Berkeley Function Calling Leaderboard: Compare function calling capabilities
- Gorilla CLI - the fastest way to get started:

```shell
pip install gorilla-cli
gorilla generate 100 random characters into a file called test.txt
```

Learn more about Gorilla CLI →
- Run Gorilla locally:

```shell
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/inference
```

Detailed local setup instructions →
- Use OpenFunctions (via the hosted OpenAI-compatible endpoint):

```python
import openai  # note: uses the legacy openai<1.0 SDK interface

openai.api_key = "EMPTY"
openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"

# Define your functions
functions = [{
    "name": "get_current_weather",
    "description": "Get weather in a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location"]
    }
}]

# Make the API call
completion = openai.ChatCompletion.create(
    model="gorilla-openfunctions-v2",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    functions=functions
)
```
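Models in the Gorilla family return the chosen API call as a string. Before executing it, a small parser can turn that string into a function name and keyword arguments. The helper below is a sketch using only the standard library, not part of the Gorilla API; restricting arguments to literals also avoids executing arbitrary generated code.

```python
import ast

def parse_call(call_str: str):
    """Parse a generated call string such as
    get_current_weather(location="San Francisco", unit="celsius")
    into (function_name, kwargs). Only literal keyword arguments are
    accepted, which guards against evaluating arbitrary expressions."""
    node = ast.parse(call_str, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    name = ast.unparse(node.func)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs

name, kwargs = parse_call('get_current_weather(location="San Francisco", unit="celsius")')
print(name, kwargs)  # get_current_weather {'location': 'San Francisco', 'unit': 'celsius'}
```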
📊 Evaluation & Benchmarking
- Berkeley Function Calling Leaderboard: Compare function calling capabilities
- Agent Arena: Evaluate agent workflows
- Gorilla Paper Evaluation Scripts: Run your own evaluations
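To give a flavor of the AST-style matching that such function-calling evaluations rely on, the sketch below treats two call strings as equivalent when they name the same function and bind the same keyword arguments, regardless of argument order. This is a deliberate simplification, not BFCL's actual checker.

```python
import ast

def calls_match(generated: str, expected: str) -> bool:
    """Compare two call strings structurally: same function name and the
    same keyword-argument bindings, ignoring argument order."""
    def normalize(src: str):
        call = ast.parse(src, mode="eval").body
        name = ast.unparse(call.func)
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        return name, kwargs
    return normalize(generated) == normalize(expected)

print(calls_match(
    'get_weather(unit="celsius", location="Paris")',
    'get_weather(location="Paris", unit="celsius")',
))  # True
```

Structural comparison is what lets an evaluator accept any argument ordering while still rejecting a wrong function name or a wrong argument value.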
❓ FAQ
- I would like to use Gorilla commercially. Is there going to be an Apache 2.0 licensed version?
Yes! We now have models that you can use commercially without any obligations.
- Can we use Gorilla with other tools like LangChain?
Absolutely! You've highlighted a great aspect of our tools. Gorilla is an end-to-end model, specifically tailored to serve correct API calls (tools) without requiring any additional coding. It's designed to work as part of a wider ecosystem and can be flexibly integrated within agentic frameworks and other tools.
LangChain is a versatile developer tool. Its "agents" can swap in any LLM, Gorilla included, making it a highly adaptable solution for various needs.
These tools truly shine when they work together, complementing each other's strengths to create a more powerful and comprehensive solution. This is where your contribution can make a difference: we enthusiastically welcome any input to further refine and enhance these tools.
Check out our blog on How to Use Gorilla: A Step-by-Step Walkthrough to see all the different ways you can integrate Gorilla in your projects.
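One common integration pattern is an agent loop: the model proposes an API call, and the framework resolves it against a tool registry and executes it. The sketch below stubs out the model with a canned response (in practice the string would come from a Gorilla endpoint); the tool and function names are hypothetical.

```python
import ast

# Registry mapping tool names to callables; entries are illustrative.
TOOLS = {
    "get_current_weather": lambda location, unit="celsius": f"22 {unit} in {location}",
}

def fake_model(query: str) -> str:
    # Stand-in for a Gorilla completion mapping a query to an API call.
    return 'get_current_weather(location="San Francisco")'

def run_agent(query: str) -> str:
    call = ast.parse(fake_model(query), mode="eval").body
    fn = TOOLS[ast.unparse(call.func)]  # look up the tool by name
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return fn(**kwargs)                 # execute with model-chosen arguments

print(run_agent("What's the weather in SF?"))  # 22 celsius in San Francisco
```

The registry lookup is the seam where a framework like LangChain would plug in: its agent supplies the query, and any call the model proposes is dispatched only if a matching tool exists.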
In the immediate future, we plan to release the following:
- [ ] Multimodal function-calling leaderboard
- [ ] Agentic function-calling leaderboard
- [ ] New batch of user-contributed live function-calling evals
- [ ] BFCL metrics to evaluate contamination
- [ ] Openfunctions-v3 model to support more languages and multi-turn capability
- [x] Agent Arena to compare LLM agents across models, tools, and frameworks [10/04/2024]
- [x] Multi-turn and multi-step function calling evaluation [09/21/2024]
- [x] User contributed Live Function Calling Leaderboard [08/20/2024]
- [x] BFCL systems metrics including cost and latency [04/01/2024]
- [x] Gorilla Execution Engine (GoEx) - Runtime for executing LLM-generated actions with safety guarantees [04/12/2024]
- [x] Berkeley Function Calling leaderboard (BFCL) for evaluating tool-calling/function-calling models [02/26/2024]
- [x] Openfunctions-v2 with more languages (Java, JS, Python), relevance detection [02/26/2024]
- [x] API Zoo Index for easy access to all APIs [02/16/2024]
- [x] Openfunctions-v1, Apache 2.0, with parallel and multiple function calling [11/16/2023]
- [x] Openfunctions-v0, Apache 2.0 function calling model [11/16/2023]
- [x] Release a commercially usable, Apache 2.0 licensed Gorilla model [06/05/2023]
- [x] Release weights for all APIs from APIBench [05/28/2023]
- [x] Run Gorilla LLM locally [05/28/2023]
- [x] Release weights for HF model APIs [05/27/2023]
- [x] Hosted Gorilla LLM chat for HF model APIs [05/27/2023]
- [x] Opening up the APIZoo for contributions from community
- [x] Dataset and Eval Code
Gorilla is Apache 2.0 licensed, making it suitable for both academic and commercial use.
- 💬 Join our Discord Community
- 🐦 Follow us on X
```bibtex
@article{patil2023gorilla,
  title={Gorilla: Large Language Model Connected with Massive APIs},
  author={Shishir G. Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
  year={2023},
  journal={arXiv preprint arXiv:2305.15334},
}
```
Alternative AI tools for gorilla
Similar Open Source Tools


read-frog
Read-frog is a powerful text analysis tool designed to help users extract valuable insights from text data. It offers a wide range of features including sentiment analysis, keyword extraction, entity recognition, and text summarization. With its user-friendly interface and robust algorithms, Read-frog is suitable for both beginners and advanced users looking to analyze text data for various purposes such as market research, social media monitoring, and content optimization. Whether you are a data scientist, marketer, researcher, or student, Read-frog can streamline your text analysis workflow and provide actionable insights to drive decision-making and enhance productivity.

anything
Anything is an open automation tool built in Rust that aims to rebuild Zapier, enabling local AI to perform a wide range of tasks beyond chat functionalities. The tool focuses on extensibility without sacrificing understandability, allowing users to create custom extensions in Rust or other interpreted languages like Python or Typescript. It features an embedded SQLite DB, a WYSIWYG editor, event system, cron trigger, HTTP and CLI extensions, with plans for additional extensions like Deno, Python, and Local AI. The tool is designed to be user-friendly, with a file-first state approach, portable triggers, actions, and flows, and a human-centric file and folder naming convention. It does not require Docker, making it easy to run on low-powered devices for 24/7 self-hosting. The event processing is focused on simplicity and visibility, with extensibility through custom extensions and a marketplace for templates, actions, and triggers.

transformer-tricks
A collection of tricks to simplify and speed up transformer models by removing parts from neural networks. Includes Flash normalization, slim attention, matrix-shrink, precomputing the first layer, and removing weights from skipless transformers. Follows recent trends in neural network optimization.

esp-ai
ESP-AI provides a complete AI conversation solution for your development board, including IAT+LLM+TTS integration solutions for ESP32 series development boards. It can be injected into projects without affecting existing ones. By providing keys from platforms like iFlytek, Jiling, and local services, you can run the services without worrying about interactions between services or between development boards and services. The project's server-side code is based on Node.js, and the hardware code is based on Arduino IDE.

Flare
Flare is an open-source AI-powered decentralized social network client for Android/iOS/macOS, consolidating multiple social networks into one platform. It allows cross-posting content, ensures privacy, and plans to implement features like mixed timeline, AI-powered functions, and support for various platforms. The project is in active development and aims to provide a seamless social networking experience for users.

gptel
GPTel is a simple Large Language Model chat client for Emacs, with support for multiple models and backends. It's async and fast, streams responses, and interacts with LLMs from anywhere in Emacs. LLM responses are in Markdown or Org markup. Supports conversations and multiple independent sessions. Chats can be saved as regular Markdown/Org/Text files and resumed later. You can go back and edit your previous prompts or LLM responses when continuing a conversation. These will be fed back to the model. Don't like gptel's workflow? Use it to create your own for any supported model/backend with a simple API.

awesome-weather-models
A catalogue and categorization of AI-based weather forecasting models. This page provides a catalogue and categorization of AI-based weather forecasting models to enable discovery and comparison of different available model options. The weather models are categorized based on metadata found in the JSON schema specification. The table includes information such as the name of the weather model, the organization that developed it, operational data availability, open-source status, and links for further details.

traceroot
TraceRoot is a tool that helps engineers debug production issues 10× faster using AI-powered analysis of traces, logs, and code context. It accelerates the debugging process with AI-powered insights, integrates seamlessly into the development workflow, provides real-time trace and log analysis, code context understanding, and intelligent assistance. Features include ease of use, LLM flexibility, distributed services, AI debugging interface, and integration support. Users can get started with TraceRoot Cloud for a 7-day trial or self-host the tool. SDKs are available for Python and JavaScript/TypeScript.

core
OpenSumi is a framework designed to help users quickly build AI Native IDE products. It provides a set of tools and templates for creating Cloud IDEs, Desktop IDEs based on Electron, CodeBlitz web IDE Framework, Lite Web IDE on the Browser, and Mini-App liked IDE. The framework also offers documentation for users to refer to and a detailed guide on contributing to the project. OpenSumi encourages contributions from the community and provides a platform for users to report bugs, contribute code, or improve documentation. The project is licensed under the MIT license and contains third-party code under other open source licenses.

LakeSoul
LakeSoul is a cloud-native Lakehouse framework that supports scalable metadata management, ACID transactions, efficient and flexible upsert operation, schema evolution, and unified streaming & batch processing. It supports multiple computing engines like Spark, Flink, Presto, and PyTorch, and computing modes such as batch, stream, MPP, and AI. LakeSoul scales metadata management and achieves ACID control by using PostgreSQL. It provides features like automatic compaction, table lifecycle maintenance, redundant data cleaning, and permission isolation for metadata.

efficient-transformers
Efficient Transformers Library provides reimplemented blocks of Large Language Models (LLMs) to make models functional and highly performant on Qualcomm Cloud AI 100. It includes graph transformations, handling for under-flows and overflows, patcher modules, exporter module, sample applications, and unit test templates. The library supports seamless inference on pre-trained LLMs with documentation for model optimization and deployment. Contributions and suggestions are welcome, with a focus on testing changes for model support and common utilities.

Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
This repository is a collection of papers and resources related to recommendation systems, focusing on foundation models, transferable recommender systems, large language models, and multimodal recommender systems. It explores questions such as the necessity of ID embeddings, the shift from matching to generating paradigms, and the future of multimodal recommender systems. The papers cover various aspects of recommendation systems, including pretraining, user representation, dataset benchmarks, and evaluation methods. The repository aims to provide insights and advancements in the field of recommendation systems through literature reviews, surveys, and empirical studies.

LLMRec
LLMRec is a PyTorch implementation for the WSDM 2024 paper 'Large Language Models with Graph Augmentation for Recommendation'. It is a novel framework that enhances recommenders by applying LLM-based graph augmentation strategies to recommendation systems. The tool aims to make the most of content within online platforms to augment interaction graphs by reinforcing u-i interactive edges, enhancing item node attributes, and conducting user node profiling from a natural language perspective.

dom-to-semantic-markdown
DOM to Semantic Markdown is a tool that converts HTML DOM to Semantic Markdown for use in Large Language Models (LLMs). It maximizes semantic information, token efficiency, and preserves metadata to enhance LLMs' processing capabilities. The tool captures rich web content structure, including semantic tags, image metadata, table structures, and link destinations. It offers customizable conversion options and supports both browser and Node.js environments.
For similar tasks


one-click-llms
The one-click-llms repository provides templates for quickly setting up an API for language models. It includes advanced inferencing scripts for function calling and offers various models for text generation and fine-tuning tasks. Users can choose between Runpod and Vast.AI for different GPU configurations, with recommendations for optimal performance. The repository also supports Trelis Research and offers templates for different model sizes and types, including multi-modal APIs and chat models.

awesome-llm-json
This repository is an awesome list dedicated to resources for using Large Language Models (LLMs) to generate JSON or other structured outputs. It includes terminology explanations, hosted and local models, Python libraries, blog articles, videos, Jupyter notebooks, and leaderboards related to LLMs and JSON generation. The repository covers various aspects such as function calling, JSON mode, guided generation, and tool usage with different providers and models.

ai-devices
AI Devices Template is a project that serves as an AI-powered voice assistant utilizing various AI models and services to provide intelligent responses to user queries. It supports voice input, transcription, text-to-speech, image processing, and function calling with conditionally rendered UI components. The project includes customizable UI settings, optional rate limiting using Upstash, and optional tracing with Langchain's LangSmith for function execution. Users can clone the repository, install dependencies, add API keys, start the development server, and deploy the application. Configuration settings can be modified in `app/config.tsx` to adjust settings and configurations for the AI-powered voice assistant.

ragtacts
Ragtacts is a Clojure library that allows users to easily interact with Large Language Models (LLMs) such as OpenAI's GPT-4. Users can ask questions to LLMs, create question templates, call Clojure functions in natural language, and utilize vector databases for more accurate answers. Ragtacts also supports RAG (Retrieval-Augmented Generation) method for enhancing LLM output by incorporating external data. Users can use Ragtacts as a CLI tool, API server, or through a RAG Playground for interactive querying.

DelphiOpenAI
Delphi OpenAI API is an unofficial library providing Delphi implementation over OpenAI public API. It allows users to access various models, make completions, chat conversations, generate images, and call functions using OpenAI service. The library aims to facilitate tasks such as content generation, semantic search, and classification through AI models. Users can fine-tune models, work with natural language processing, and apply reinforcement learning methods for diverse applications.

token.js
Token.js is a TypeScript SDK that integrates with over 200 LLMs from 10 providers using OpenAI's format. It allows users to call LLMs, supports tools, JSON outputs, image inputs, and streaming, all running on the client side without the need for a proxy server. The tool is free and open source under the MIT license.

osaurus
Osaurus is a native, Apple Silicon-only local LLM server built on Apple's MLX for maximum performance on M‑series chips. It is a SwiftUI app + SwiftNIO server with OpenAI‑compatible and Ollama‑compatible endpoints. The tool supports native MLX text generation, model management, streaming and non‑streaming chat completions, OpenAI‑compatible function calling, real-time system resource monitoring, and path normalization for API compatibility. Osaurus is designed for macOS 15.5+ and Apple Silicon (M1 or newer) with Xcode 16.4+ required for building from source.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.