
LocalLLMClient
Swift local LLM client for iOS, macOS, Linux
Stars: 82

LocalLLMClient is a Swift package for interacting with local Large Language Models (LLMs) on Apple platforms. It supports GGUF and MLX models as well as the FoundationModels framework, and provides a streaming API, multimodal input, and tool calling. Users can integrate the package to run a variety of models for text generation and processing, and it also exposes lower-level APIs for fine-grained control and multimodal image processing. LocalLLMClient is experimental and its API is subject to change; it supports iOS, macOS, and Linux.
README:
A Swift package to interact with local Large Language Models (LLMs) on Apple platforms.
Demo / Multimodal
Demo videos (not reproduced here): MobileVLM-3B via llama.cpp and Qwen2.5 VL 3B via MLX, both running on an iPhone 16 Pro.
[!IMPORTANT] This project is still experimental. The API is subject to change.
[!TIP] To run larger models more reliably, consider adding the com.apple.developer.kernel.increased-memory-limit entitlement to your app.
- Support for GGUF / MLX models / FoundationModels framework
- Support for iOS, macOS and Linux
- Streaming API
- Multimodal (experimental)
- Tool calling (experimental)
Add the following dependency to your Package.swift file:
dependencies: [
    .package(url: "https://github.com/tattn/LocalLLMClient.git", branch: "main")
]
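If you prefer to see the dependency in the context of a full manifest, the sketch below shows one way a consuming target might wire it up. It is illustrative only: the product names are assumptions inferred from the import statements used later in this README (LocalLLMClientLlama, LocalLLMClientMLX), so check the package's own Package.swift for the exact names.

// swift-tools-version: 6.0
// Sketch of a consuming package manifest. Product names are assumptions
// inferred from the imports in this README; verify them against the package.
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v16), .macOS(.v14)],
    dependencies: [
        .package(url: "https://github.com/tattn/LocalLLMClient.git", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                .product(name: "LocalLLMClient", package: "LocalLLMClient"),
                .product(name: "LocalLLMClientLlama", package: "LocalLLMClient"),
                .product(name: "LocalLLMClientMLX", package: "LocalLLMClient")
            ]
        )
    ]
)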
The API documentation is available here.
import LocalLLMClient
import LocalLLMClientLlama

let session = LLMSession(model: .llama(
    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",
    model: "gemma-3-4B-it-QAT-Q4_0.gguf"
))

print(try await session.respond(to: "Tell me a joke."))

for try await text in session.streamResponse(to: "Write a story about cats.") {
    print(text, terminator: "")
}
Using llama.cpp
import LocalLLMClient
import LocalLLMClientLlama

// Create a model
let model = LLMSession.DownloadModel.llama(
    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",
    model: "gemma-3-4B-it-QAT-Q4_0.gguf",
    parameter: .init(
        temperature: 0.7, // Randomness (0.0 to 1.0)
        topK: 40, // Top-K sampling
        topP: 0.9, // Top-P (nucleus) sampling
        options: .init(responseFormat: .json) // Response format
    )
)

// You can track download progress
try await model.downloadModel { progress in
    print("Download progress: \(progress)")
}

// Create a session with the downloaded model
let session = LLMSession(model: model)

// Generate a response with a specific prompt
let response = try await session.respond(to: """
Create the beginning of a synopsis for an epic story with a cat as the main character.
Format it in JSON, as shown below.
{
"title": "<title>",
"content": "<content>",
}
""")
print(response)

// You can also add system messages before asking questions
session.messages = [.system("You are a helpful assistant.")]
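If you are surfacing the download in an app, the same downloadModel progress callback can drive a UI. The following is a minimal SwiftUI sketch, not part of the package's documented API: it assumes the callback reports a fractional Double and that the model's parameter defaults are acceptable, so treat it as illustrative.

import SwiftUI
import LocalLLMClient
import LocalLLMClientLlama

@MainActor
final class ModelLoader: ObservableObject {
    @Published var progress: Double = 0 // assumes the callback reports a fraction in 0...1
    @Published var session: LLMSession?

    func load() async throws {
        let model = LLMSession.DownloadModel.llama(
            id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",
            model: "gemma-3-4B-it-QAT-Q4_0.gguf"
        )
        try await model.downloadModel { [weak self] progress in
            // Hop back to the main actor to publish UI state
            Task { @MainActor in self?.progress = progress }
        }
        session = LLMSession(model: model)
    }
}

struct ModelDownloadView: View {
    @StateObject private var loader = ModelLoader()

    var body: some View {
        ProgressView("Downloading model…", value: loader.progress)
            .padding()
            .task { try? await loader.load() }
    }
}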
Using Apple MLX
import LocalLLMClient
import LocalLLMClientMLX

// Create a model
let model = LLMSession.DownloadModel.mlx(
    id: "mlx-community/Qwen3-1.7B-4bit",
    parameter: .init(
        temperature: 0.7, // Randomness (0.0 to 1.0)
        topP: 0.9 // Top-P (nucleus) sampling
    )
)

// You can track download progress
try await model.downloadModel { progress in
    print("Download progress: \(progress)")
}

// Create a session with the downloaded model
let session = LLMSession(model: model)

// Generate text with system and user messages
session.messages = [.system("You are a helpful assistant.")]
let response = try await session.respond(to: "Tell me a story about a cat.")
print(response)
Using Apple FoundationModels
import LocalLLMClient
import LocalLLMClientFoundationModels

// Available on iOS 26.0+ / macOS 26.0+ and requires Apple Intelligence
let session = LLMSession(model: .foundationModels(
    // Use the system's default model
    model: .default,
    // Configure generation options
    parameter: .init(
        temperature: 0.7
    )
))

// Generate a response with a specific prompt
let response = try await session.respond(to: "Tell me a short story about a clever fox.")
print(response)
LocalLLMClient supports tool calling for integration with external systems.
[!IMPORTANT] Tool calling is only available with models that support it, and model compatibility differs between backends. Make sure your chosen model explicitly supports tool calling before using this feature.
Using tool calling
import LocalLLMClient
import LocalLLMClientLlama

@Tool("get_weather")
struct GetWeatherTool {
    let description = "Get the current weather in a given location"

    @ToolArguments
    struct Arguments {
        @ToolArgument("The city and state, e.g. San Francisco, CA")
        var location: String
        @ToolArgument("Temperature unit")
        var unit: Unit?

        @ToolArgumentEnum
        enum Unit: String {
            case celsius
            case fahrenheit
        }
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        // In a real implementation, this would call a weather API
        let temp = arguments.unit == .celsius ? "22°C" : "72°F"
        return ToolOutput([
            "location": arguments.location,
            "temperature": temp,
            "condition": "sunny"
        ])
    }
}

// Create the tool
let weatherTool = GetWeatherTool()

// Create a session with a model that supports tool calling and register tools
let session = LLMSession(
    model: .llama(
        id: "Qwen/Qwen2.5-1.5B-Instruct-GGUF",
        model: "qwen2.5-1.5b-instruct-q4_k_m.gguf"
    ),
    tools: [weatherTool]
)

// Ask a question that requires tool use
let response = try await session.respond(to: "What's the weather like in Tokyo?")
print(response)
// The model will automatically call the weather tool and include the result in its response
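Because the session also holds the conversation (see the messages property above), a follow-up question can build on the tool result, and the tools parameter takes an array so several tools can be registered at once. A minimal sketch under those assumptions, continuing from the session created above:

// Assumes the session retains the earlier question, tool call, and tool result
// in its conversation state.
let followUp = try await session.respond(to: "Should I pack an umbrella for that weather?")
print(followUp)

// The tools parameter is an array, so additional @Tool types can be registered
// alongside GetWeatherTool on a fresh session.
let multiToolSession = LLMSession(
    model: .llama(
        id: "Qwen/Qwen2.5-1.5B-Instruct-GGUF",
        model: "qwen2.5-1.5b-instruct-q4_k_m.gguf"
    ),
    tools: [GetWeatherTool()] // append further tools here
)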
LocalLLMClient also supports multimodal models for processing images.
Using with llama.cpp
import LocalLLMClient
import LocalLLMClientLlama

// Create a session with a multimodal model
let session = LLMSession(model: .llama(
    id: "ggml-org/gemma-3-4b-it-GGUF",
    model: "gemma-3-4b-it-Q8_0.gguf",
    mmproj: "mmproj-model-f16.gguf"
))

// Ask a question about an image
let response = try await session.respond(
    to: "What's in this image?",
    attachments: [.image(.init(resource: .yourImage))]
)
print(response)

// You can also stream the response
for try await text in session.streamResponse(
    to: "Describe this image in detail",
    attachments: [.image(.init(resource: .yourImage))]
) {
    print(text, terminator: "")
}
Using with Apple MLX
import LocalLLMClient
import LocalLLMClientMLX

// Create a session with a multimodal model
let session = LLMSession(model: .mlx(
    id: "mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit"
))

// Ask a question about an image
let response = try await session.respond(
    to: "What's in this image?",
    attachments: [.image(.init(resource: .yourImage))]
)
print(response)
For more advanced control over model loading and inference, you can use the LocalLLMClient APIs directly.
Using with llama.cpp
import LocalLLMClient
import LocalLLMClientLlama
import LocalLLMClientUtility

// Download model from Hugging Face (Gemma 3)
let ggufName = "gemma-3-4B-it-QAT-Q4_0.gguf"
let downloader = FileDownloader(source: .huggingFace(
    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",
    globs: [ggufName]
))
try await downloader.download { print("Progress: \($0)") }

// Initialize a client with the downloaded model
let modelURL = downloader.destination.appending(component: ggufName)
let client = try await LocalLLMClient.llama(url: modelURL, parameter: .init(
    context: 4096, // Context size
    temperature: 0.7, // Randomness (0.0 to 1.0)
    topK: 40, // Top-K sampling
    topP: 0.9, // Top-P (nucleus) sampling
    options: .init(responseFormat: .json) // Response format
))

let prompt = """
Create the beginning of a synopsis for an epic story with a cat as the main character.
Format it in JSON, as shown below.
{
"title": "<title>",
"content": "<content>",
}
"""

// Generate text
let input = LLMInput.chat([
    .system("You are a helpful assistant."),
    .user(prompt)
])

for try await text in try await client.textStream(from: input) {
    print(text, terminator: "")
}
Using with Apple MLX
import LocalLLMClient
import LocalLLMClientMLX
import LocalLLMClientUtility

// Download model from Hugging Face
let downloader = FileDownloader(
    source: .huggingFace(id: "mlx-community/Qwen3-1.7B-4bit", globs: .mlx)
)
try await downloader.download { print("Progress: \($0)") }

// Initialize a client with the downloaded model
let client = try await LocalLLMClient.mlx(url: downloader.destination, parameter: .init(
    temperature: 0.7, // Randomness (0.0 to 1.0)
    topP: 0.9 // Top-P (nucleus) sampling
))

// Generate text
let input = LLMInput.chat([
    .system("You are a helpful assistant."),
    .user("Tell me a story about a cat.")
])

for try await text in try await client.textStream(from: input) {
    print(text, terminator: "")
}
Using with Apple FoundationModels
import LocalLLMClient
import LocalLLMClientFoundationModels

// Available on iOS 26.0+ / macOS 26.0+ and requires Apple Intelligence
let client = try await LocalLLMClient.foundationModels(
    // Use the system's default model
    model: .default,
    // Configure generation options
    parameter: .init(
        temperature: 0.7
    )
)

// Generate text
let input = LLMInput.chat([
    .system("You are a helpful assistant."),
    .user("Tell me a short story about a clever fox.")
])

for try await text in try await client.textStream(from: input) {
    print(text, terminator: "")
}
Advanced Multimodal with llama.cpp
import LocalLLMClient
import LocalLLMClientLlama
import LocalLLMClientUtility

// Download model from Hugging Face (Gemma 3)
let model = "gemma-3-4b-it-Q8_0.gguf"
let mmproj = "mmproj-model-f16.gguf"
let downloader = FileDownloader(
    source: .huggingFace(id: "ggml-org/gemma-3-4b-it-GGUF", globs: [model, mmproj])
)
try await downloader.download { print("Download: \($0)") }

// Initialize a client with the downloaded model
let client = try await LocalLLMClient.llama(
    url: downloader.destination.appending(component: model),
    mmprojURL: downloader.destination.appending(component: mmproj)
)

let input = LLMInput.chat([
    .user("What's in this image?", attachments: [.image(.init(resource: .yourImage))])
])

// Generate text without streaming
print(try await client.generateText(from: input))
Advanced Multimodal with Apple MLX
import LocalLLMClient
import LocalLLMClientMLX
import LocalLLMClientUtility

// Download model from Hugging Face (Qwen2.5 VL)
let downloader = FileDownloader(source: .huggingFace(
    id: "mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit",
    globs: .mlx
))
try await downloader.download { print("Progress: \($0)") }

let client = try await LocalLLMClient.mlx(url: downloader.destination)

let input = LLMInput.chat([
    .user("What's in this image?", attachments: [.image(.init(resource: .yourImage))])
])

// Generate text without streaming
print(try await client.generateText(from: input))
You can use LocalLLMClient directly from the terminal using the command line tool:
# Run using llama.cpp
swift run LocalLLMCLI --model /path/to/your/model.gguf "Your prompt here"
# Run using MLX
./scripts/run_mlx.sh --model https://huggingface.co/mlx-community/Qwen3-1.7B-4bit "Your prompt here"
- LLaMA 3
- Gemma 3 / 2
- Qwen 3 / 2
- Phi 4
- Models compatible with the llama.cpp backend
- Models compatible with the MLX backend
If you have a model that works, please open an issue or PR to add it to the list.
- iOS 16.0+ / macOS 14.0+
- Xcode 16.0+
This package uses llama.cpp, Apple's MLX, and the Foundation Models framework for model inference.
Alternative AI tools for LocalLLMClient
Similar Open Source Tools


chatluna
Chatluna is a machine learning model plugin that provides chat services with large language models. It is highly extensible, supports multiple output formats, and offers features like custom conversation presets, rate limiting, and context awareness. Users can deploy Chatluna under Koishi without additional configuration. The plugin supports various models/platforms like OpenAI, Azure OpenAI, Google Gemini, and more. It also provides preset customization using YAML files and allows for easy forking and development within Koishi projects. However, the project lacks web UI, HTTP server, and project documentation, inviting contributions from the community.

ai21-python
The AI21 Labs Python SDK is a comprehensive tool for interacting with the AI21 API. It provides functionalities for chat completions, conversational RAG, token counting, error handling, and support for various cloud providers like AWS, Azure, and Vertex. The SDK offers both synchronous and asynchronous usage, along with detailed examples and documentation. Users can quickly get started with the SDK to leverage AI21's powerful models for various natural language processing tasks.

koog
Koog is a Kotlin-based framework for building and running AI agents entirely in idiomatic Kotlin. It allows users to create agents that interact with tools, handle complex workflows, and communicate with users. Key features include pure Kotlin implementation, MCP integration, embedding capabilities, custom tool creation, ready-to-use components, intelligent history compression, powerful streaming API, persistent agent memory, comprehensive tracing, flexible graph workflows, modular feature system, scalable architecture, and multiplatform support.

arcade-ai
Arcade AI is a developer-focused tooling and API platform designed to enhance the capabilities of LLM applications and agents. It simplifies the process of connecting agentic applications with user data and services, allowing developers to concentrate on building their applications. The platform offers prebuilt toolkits for interacting with various services, supports multiple authentication providers, and provides access to different language models. Users can also create custom toolkits and evaluate their tools using Arcade AI. Contributions are welcome, and self-hosting is possible with the provided documentation.

tools
Strands Agents Tools is a community-driven project that provides a powerful set of tools for your agents to use. It bridges the gap between large language models and practical applications by offering ready-to-use tools for file operations, system execution, API interactions, mathematical operations, and more. The tools cover a wide range of functionalities including file operations, shell integration, memory storage, web infrastructure, HTTP client, Slack client, Python execution, mathematical tools, AWS integration, image and video processing, audio output, environment management, task scheduling, advanced reasoning, swarm intelligence, dynamic MCP client, parallel tool execution, browser automation, diagram creation, RSS feed management, and computer automation.

llm
The 'llm' package for Emacs provides an interface for interacting with Large Language Models (LLMs). It abstracts functionality to a higher level, concealing API variations and ensuring compatibility with various LLMs. Users can set up providers like OpenAI, Gemini, Vertex, Claude, Ollama, GPT4All, and a fake client for testing. The package allows for chat interactions, embeddings, token counting, and function calling. It also offers advanced prompt creation and logging capabilities. Users can handle conversations, create prompts with placeholders, and contribute by creating providers.

BentoVLLM
BentoVLLM is an example project demonstrating how to serve and deploy open-source Large Language Models using vLLM, a high-throughput and memory-efficient inference engine. It provides a basis for advanced code customization, such as custom models, inference logic, or vLLM options. The project allows for simple LLM hosting with OpenAI compatible endpoints without the need to write any code. Users can interact with the server using Swagger UI or other methods, and the service can be deployed to BentoCloud for better management and scalability. Additionally, the repository includes integration examples for different LLM models and tools.

cellm
Cellm is an Excel extension that allows users to leverage Large Language Models (LLMs) like ChatGPT within cell formulas. It enables users to extract AI responses to text ranges, making it useful for automating repetitive tasks that involve data processing and analysis. Cellm supports various models from Anthropic, Mistral, OpenAI, and Google, as well as locally hosted models via Llamafiles, Ollama, or vLLM. The tool is designed to simplify the integration of AI capabilities into Excel for tasks such as text classification, data cleaning, content summarization, entity extraction, and more.

mcp-context-forge
MCP Context Forge is a powerful tool for generating context-aware data for machine learning models. It provides functionalities to create diverse datasets with contextual information, enhancing the performance of AI algorithms. The tool supports various data formats and allows users to customize the context generation process easily. With MCP Context Forge, users can efficiently prepare training data for tasks requiring contextual understanding, such as sentiment analysis, recommendation systems, and natural language processing.

pdr_ai_v2
pdr_ai_v2 is a Python library for implementing machine learning algorithms and models. It provides a wide range of tools and functionalities for data preprocessing, model training, evaluation, and deployment. The library is designed to be user-friendly and efficient, making it suitable for both beginners and experienced data scientists. With pdr_ai_v2, users can easily build and deploy machine learning models for various applications, such as classification, regression, clustering, and more.

SpecForge
SpecForge is a powerful tool for generating API specifications from code. It helps developers to easily create and maintain accurate API documentation by extracting information directly from the codebase. With SpecForge, users can streamline the process of documenting APIs, ensuring consistency and reducing manual effort. The tool supports various programming languages and frameworks, making it versatile and adaptable to different development environments. By automating the generation of API specifications, SpecForge enhances collaboration between developers and stakeholders, improving overall project efficiency and quality.

baibot
Baibot is a versatile chatbot framework designed to simplify the process of creating and deploying chatbots. It provides a user-friendly interface for building custom chatbots with various functionalities such as natural language processing, conversation flow management, and integration with external APIs. Baibot is highly customizable and can be easily extended to suit different use cases and industries. With Baibot, developers can quickly create intelligent chatbots that can interact with users in a seamless and engaging manner, enhancing user experience and automating customer support processes.

langfuse-docs
Langfuse Docs is a repository for langfuse.com, built on Nextra. It provides guidelines for contributing to the documentation using GitHub Codespaces and local development setup. The repository includes Python cookbooks in Jupyter notebooks format, which are converted to markdown for rendering on the site. It also covers media management for images, videos, and gifs. The stack includes Nextra, Next.js, shadcn/ui, and Tailwind CSS. Additionally, there is a bundle analysis feature to analyze the production build bundle size using @next/bundle-analyzer.

DelhiLM
DelhiLM is a natural language processing tool for building and training language models. It provides a user-friendly interface for text processing tasks such as tokenization, lemmatization, and language model training. With DelhiLM, users can easily preprocess text data and train custom language models for various NLP applications. The tool supports different languages and allows for fine-tuning pre-trained models to suit specific needs. DelhiLM is designed to be flexible, efficient, and easy to use for both beginners and experienced NLP practitioners.

hujiang_dictionary
Hujiang Dictionary is a tool that provides translation services between Japanese, Chinese, and English. It supports various translation modes such as Japanese to Chinese, Chinese to Japanese, English to Japanese, and more. The tool utilizes cloud services like Telegram, Lambda, and Cloudflare Workers for different deployment options. Users can interact with the tool via a command-line interface (CLI) to perform translations and access online resources like weblio and Google Translate. Additionally, the tool offers a Telegram bot for users to access translation services conveniently. The tool also supports setting up and managing databases for storing translation data.
For similar tasks

LLM-Tool-Survey
This repository contains a collection of papers related to tool learning with large language models (LLMs). The papers are organized according to the survey paper 'Tool Learning with Large Language Models: A Survey'. The survey focuses on the benefits and implementation of tool learning with LLMs, covering aspects such as task planning, tool selection, tool calling, response generation, benchmarks, evaluation, challenges, and future directions in the field. It aims to provide a comprehensive understanding of tool learning with LLMs and inspire further exploration in this emerging area.

tool-ahead-of-time
Tool-Ahead-of-Time (TAoT) is a Python package that enables tool calling for any model available through Langchain's ChatOpenAI library, even before official support is provided. It reformats model output into a JSON parser for tool calling. The package supports OpenAI and non-OpenAI models, following LangChain's syntax for tool calling. Users can start using the tool without waiting for official support, providing a more robust solution for tool calling.

mcphub.nvim
MCPHub.nvim is a powerful Neovim plugin that integrates MCP (Model Context Protocol) servers into your workflow. It offers a centralized config file for managing servers and tools, with an intuitive UI for testing resources. Ideal for LLM integration, it provides programmatic API access and interactive testing through the `:MCPHub` command.

go-utcp
The Universal Tool Calling Protocol (UTCP) is a modern, flexible, and scalable standard for defining and interacting with tools across various communication protocols. It emphasizes scalability, interoperability, and ease of use. It provides built-in transports for HTTP, CLI, Server-Sent Events, streaming HTTP, GraphQL, MCP, and UDP. Users can use the library to construct a client and call tools using the available transports. The library also includes utilities for variable substitution, in-memory repository for storing providers and tools, and OpenAPI conversion to UTCP manuals.

utcp-specification
The Universal Tool Calling Protocol (UTCP) Specification repository contains the official documentation for a modern and scalable standard that enables AI systems and clients to discover and interact with tools across different communication protocols. It defines tool discovery mechanisms, call formats, provider configuration, authentication methods, and response handling.

ailoy
Ailoy is a lightweight library for building AI applications such as agent systems or RAG pipelines with ease. It enables AI features effortlessly, supporting AI models locally or via cloud APIs, multi-turn conversation, system message customization, reasoning-based workflows, tool calling capabilities, and built-in vector store support. It also supports running native-equivalent functionality in web browsers using WASM. The library is in early development stages and provides examples in the `examples` directory for inspiration on building applications with Agents.


ai-sdk-cpp
The AI SDK CPP is a modern C++ toolkit that provides a unified, easy-to-use API for building AI-powered applications with popular model providers like OpenAI and Anthropic. It bridges the gap for C++ developers by offering a clean, expressive codebase with minimal dependencies. The toolkit supports text generation, streaming content, multi-turn conversations, error handling, tool calling, async tool execution, and configurable retries. Future updates will include additional providers, text embeddings, and image generation models. The project also includes a patched version of nlohmann/json for improved thread safety and consistent behavior in multi-threaded environments.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.