
langchain-swift
🚀 LangChain for Swift. Optimized for iOS, macOS, watchOS (partial support), and visionOS (beta).
Stars: 369

README:
🚀 LangChain for Swift. Optimized for iOS, macOS, watchOS (partial support), and visionOS (beta).
This is a pure client library; no server is required.
- Set the required keys before using this library:
LC.initSet([
    "NOTION_API_KEY": "xx",
    "NOTION_ROOT_NODE_ID": "xx",
    "OPENAI_API_KEY": "xx",
    "OPENAI_API_BASE": "xx",
])
- Set variables such as OPENAI_API_KEY or OPENAI_API_BASE as needed (see the sketch after this list). For example:
OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=xxx
SUPABASE_URL=xxx
SUPABASE_KEY=xxx
SERPER_API_KEY=xxx
HF_API_KEY=xxx
BAIDU_OCR_AK=xxx
BAIDU_OCR_SK=xxx
BAIDU_LLM_AK=xxx
BAIDU_LLM_SK=xxx
CHATGLM_API_KEY=xxx
OPENWEATHER_API_KEY=xxx
LLAMA2_API_KEY=xxx
GOOGLEAI_API_KEY=xxx
LMSTUDIO_URL=xxx
OLLAMA_URL=xxx
OLLAMA_MODEL=xxx
NOTION_API_KEY=xxx
NOTION_ROOT_NODE_ID=xxx
BILIBILI_SESSION=xxx
BILIBILI_JCT=xxx
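These variables are consumed through LC.initSet. Below is a minimal bootstrap sketch; it assumes the module is imported as LangChain and that LC.initSet should run once before any LLM, chain, or tool is created. The function name and placeholder values are illustrative only.
import LangChain  // assumption: the package's library product name

func bootstrapLangChain() {
    // Call once at app startup, before constructing any LLM, chain, or tool.
    // Values are placeholders; load real secrets from your own secure storage,
    // not from source control.
    LC.initSet([
        "OPENAI_API_KEY": "sk-xxx",
        "OPENAI_API_BASE": "xxx",
        "SERPER_API_KEY": "xxx",
    ])
}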
🔥 Local Model
Please use the 'local' branch, because of its additional project dependencies. Model here
.package(url: "https://github.com/buhe/langchain-swift", .branch("local"))
Code
Task {
    if let modelPath = Bundle.main.path(forResource: "stablelm-3b-4e1t-Q4_K_M", ofType: "txt") {
        let local = Local(inference: .GPTNeox_gguf, modelPath: modelPath, useMetal: true)
        let r = await local.generate(text: "hi")
        print("🥰\(r!.llm_output!)")
    } else {
        print("⚠️ model not found")
    }
}
💬 Chatbots
Code
let template = """
Assistant is a large language model trained by OpenAI.
Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.
Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.
Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.
{history}
Human: {human_input}
Assistant:
"""
let prompt = PromptTemplate(input_variables: ["history", "human_input"], partial_variable: [:], template: template)
let chatgpt_chain = LLMChain(
    llm: OpenAI(),
    prompt: prompt,
    memory: ConversationBufferWindowMemory()
)
Task(priority: .background) {
    var input = "I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd."
    var res = await chatgpt_chain.predict(args: ["human_input": input])
    print(input)
    print("🌈:" + res!)
    input = "ls ~"
    res = await chatgpt_chain.predict(args: ["human_input": input])
    print(input)
    print("🌈:" + res!)
}
Log
I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.
🌈:
/home/user
ls ~
🌈:
Desktop Documents Downloads Music Pictures Public Templates Videos
❓ QA bot
The schema at main/Sources/LangChain/vectorstores/supabase/supabase.sql is required.
ref: https://supabase.com/docs/guides/database/extensions/pgvector
Code
Task(priority: .background) {
    let loader = TextLoader(file_path: "state_of_the_union.txt")
    let documents = await loader.load()
    let text_splitter = CharacterTextSplitter(chunk_size: 1000, chunk_overlap: 0)
    let embeddings = OpenAIEmbeddings()
    let s = Supabase(embeddings: embeddings)
    for text in documents {
        let docs = text_splitter.split_text(text: text.page_content)
        for doc in docs {
            await s.addText(text: doc)
        }
    }
    let m = await s.similaritySearch(query: "What did the president say about Ketanji Brown Jackson", k: 1)
    print("Q🖥️:What did the president say about Ketanji Brown Jackson")
    print("A🚀:\(m)")
}
Log
Q🖥️:What did the president say about Ketanji Brown Jackson
A🚀:[LangChain.MatchedModel(content: Optional("In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. We cannot let this happen. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. "), similarity: 0.8024642)]
📄 Retriever
Code
Task(priority: .background) {
    let retriever = WikipediaRetriever()
    let qa = ConversationalRetrievalChain(retriver: retriever, llm: OpenAI())
    let questions = [
        "What is Apify?",
        "When the Monument to the Martyrs of the 1830 Revolution was created?",
        "What is the Abhayagiri Vihāra?"
    ]
    var chat_history: [(String, String)] = []
    for question in questions {
        let result = await qa.predict(args: ["question": question, "chat_history": ConversationalRetrievalChain.get_chat_history(chat_history: chat_history)])
        chat_history.append((question, result!))
        print("⚠️**Question**: \(question)")
        print("✅**Answer**: \(result!)")
    }
}
Log
⚠️**Question**: What is Apify?
✅**Answer**: Apify refers to a web scraping and automation platform.
read(descriptor:pointer:size:): Connection reset by peer (errno: 54)
⚠️**Question**: When the Monument to the Martyrs of the 1830 Revolution was created?
✅**Answer**: The Monument to the Martyrs of the 1830 Revolution was created in 1906.
⚠️**Question**: What is the Abhayagiri Vihāra?
✅**Answer**: The term "Abhayagiri Vihāra" refers to a Buddhist monastery in ancient Sri Lanka.
🤖 Agent
Code
let agent = initialize_agent(llm: OpenAI(), tools: [WeatherTool()])
Task(priority: .background) {
    let res = await agent.run(args: "Query the weather of this week")
    switch res {
    case Parsed.str(let str):
        print("🌈:" + str)
    default: break
    }
}
Log
🌈: The weather for this week is sunny.
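Custom tools can be passed to initialize_agent alongside the built-ins. A hypothetical sketch follows; the BaseTool superclass and the name()/description()/_run(args:) overrides are assumptions about the library's tool interface, so check the Tools sources for the actual requirements before copying this.
class CannedWeatherTool: BaseTool {
    // Hypothetical overrides: verify the real BaseTool requirements in Sources/LangChain/tools.
    override func name() -> String { "CannedWeather" }
    override func description() -> String { "Useful for answering questions about this week's weather" }
    override func _run(args: String) async throws -> String {
        // Return a fixed answer instead of calling a real weather API.
        "The weather for this week is sunny."
    }
}

let agent = initialize_agent(llm: OpenAI(), tools: [CannedWeatherTool()])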
📡 Router
let physics_template = """
You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.
Here is a question:
{input}
"""
let math_template = """
You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.
Here is a question:
{input}
"""
let prompt_infos = [
    [
        "name": "physics",
        "description": "Good for answering questions about physics",
        "prompt_template": physics_template,
    ],
    [
        "name": "math",
        "description": "Good for answering math questions",
        "prompt_template": math_template,
    ]
]
let llm = OpenAI()
var destination_chains: [String: DefaultChain] = [:]
for p_info in prompt_infos {
    let name = p_info["name"]!
    let prompt_template = p_info["prompt_template"]!
    let prompt = PromptTemplate(input_variables: ["input"], partial_variable: [:], template: prompt_template)
    let chain = LLMChain(llm: llm, prompt: prompt, parser: StrOutputParser())
    destination_chains[name] = chain
}
let default_prompt = PromptTemplate(input_variables: [], partial_variable: [:], template: "")
let default_chain = LLMChain(llm: llm, prompt: default_prompt, parser: StrOutputParser())
let destinations = prompt_infos.map {
    "\($0["name"]!): \($0["description"]!)"
}
let destinations_str = destinations.joined(separator: "\n")
let router_template = MultiPromptRouter.formatDestinations(destinations: destinations_str)
let router_prompt = PromptTemplate(input_variables: ["input"], partial_variable: [:], template: router_template)
let llmChain = LLMChain(llm: llm, prompt: router_prompt, parser: RouterOutputParser())
let router_chain = LLMRouterChain(llmChain: llmChain)
let chain = MultiRouteChain(router_chain: router_chain, destination_chains: destination_chains, default_chain: default_chain)
Task(priority: .background) {
    print("💁🏻♂️", await chain.run(args: "What is black body radiation?"))
}
Log
router text: {
"destination": "physics",
"next_inputs": "What is black body radiation?"
}
💁🏻♂️ str("Black body radiation refers to the electromagnetic radiation emitted by an object that absorbs all incident radiation and reflects or transmits none. It is an idealized concept used in physics to understand the behavior of objects that emit and absorb radiation. \n\nAccording to Planck\'s law, the intensity and spectrum of black body radiation depend on the temperature of the object. As the temperature increases, the peak intensity of the radiation shifts to shorter wavelengths, resulting in a change in color from red to orange, yellow, white, and eventually blue.\n\nBlack body radiation is important in various fields of physics, such as astrophysics, where it helps explain the emission of radiation from stars and other celestial bodies. It also plays a crucial role in understanding the behavior of objects at high temperatures, such as in industrial processes or the study of the early universe.\n\nHowever, it\'s worth noting that while I strive to provide accurate and concise explanations, there may be more intricate details or specific mathematical formulations related to black body radiation that I haven\'t covered.")
ObjectOutputParser
// Demo types for structured output; their definitions are assumed here to keep the example self-contained.
struct Unit { let num: Int }
struct Book { let title: String; let content: String; let unit: Unit }

let demo = Book(title: "a", content: "b", unit: Unit(num: 1))
var parser = ObjectOutputParser(demo: demo)
let llm = OpenAI()
let t = PromptTemplate(input_variables: ["query"], partial_variable: ["format_instructions": parser.get_format_instructions()], template: "Answer the user query.\n{format_instructions}\n{query}\n")
let chain = LLMChain(llm: llm, prompt: t, parser: parser, inputKey: "query")
Task(priority: .background) {
    let parsed = await chain.run(args: "The book title is 123 , content is 456 , num of unit is 7")
    switch parsed {
    case Parsed.object(let o): print("🚗object: \(o)")
    default: break
    }
}
EnumOutputParser
enum MyEnum: String, CaseIterable {
    case value1
    case value2
    case value3
}
for v in MyEnum.allCases {
    print(v.rawValue)
}
let llm = OpenAI()
let parser = EnumOutputParser<MyEnum>(enumType: MyEnum.self)
let i = parser.get_format_instructions()
print("ins: \(i)")
let t = PromptTemplate(input_variables: ["query"], partial_variable: ["format_instructions": parser.get_format_instructions()], template: "Answer the user query.\n{format_instructions}\n{query}\n")
let chain = LLMChain(llm: llm, prompt: t, parser: parser, inputKey: "query")
Task(priority: .background) {
    let result = await chain.run(args: "Value is 'value2'")
    switch result {
    case .enumType(let e):
        print("🦙enum: \(e)")
    default:
        print("parse failed: \(result)")
    }
}
Stream Chat - Requires the ChatOpenAI model
Task(priority: .background) {
    let eventLoopGroup = MultiThreadedEventLoopGroup(numberOfThreads: 1)
    let httpClient = HTTPClient(eventLoopGroupProvider: .shared(eventLoopGroup))
    defer {
        // It's important to shut down the httpClient after all requests are done, even if one failed. See: https://github.com/swift-server/async-http-client
        try? httpClient.syncShutdown()
    }
    let llm = ChatOpenAI(httpClient: httpClient, temperature: 0.8)
    let answer = await llm.generate(text: "Hey")
    print("🥰")
    for try await c in answer!.getGeneration()! {
        if let message = c {
            print(message)
        }
    }
}
Open an issue or PR to add your app.
- LLMs
- [x] OpenAI
- [x] Hugging Face
- [x] Dalle
- [x] ChatGLM
- [x] ChatOpenAI
- [x] Baidu
- [x] Llama 2
- [x] Gemini
- [x] LMStudio API
- [x] Ollama API
- [x] Local Model
- Vectorstore
- [x] Supabase
- [x] SimilaritySearchKit
- Store
- [x] BaseStore
- [x] InMemoryStore
- [x] FileStore
- Embedding
- [x] OpenAI
- [x] Ollama
- [ ] Distilbert
- Chain
- [x] Base
- [x] LLM
- [x] SimpleSequentialChain
- [x] SequentialChain
- [x] TransformChain
- Router
- [x] LLMRouterChain
- [x] MultiRouteChain
- QA
- [x] ConversationalRetrievalChain
- Tools
- [x] Dummy
- [x] InvalidTool
- [x] Serper
- [x] JavascriptREPLTool (via JSC)
- [x] GetLocation (via CoreLocation)
- [x] Weather
- [x] TTSTool
- Agent
- [x] ZeroShotAgent
- Memory
- [x] BaseMemory
- [x] BaseChatMemory
- [x] ConversationBufferWindowMemory
- [x] ReadOnlySharedMemory
- Text Splitter
- [x] CharacterTextSplitter
- [x] RecursiveCharacterTextSplitter (see the sketch after this list)
- Document Loader
- [x] TextLoader
- [x] YoutubeLoader
- [x] HtmlLoader
- [x] PDFLoader
- [x] BilibiliLoader https://nemo2011.github.io/bilibili-api/#/get-credential
- [x] ImageOCRLoader
- [x] AudioLoader
- [x] NotionLoader
- [x] RSSLoader
- OutputParser
- [x] MRKLOutputParser
- [x] ListOutputParser
- [x] SimpleJsonOutputParser
- [x] StrOutputParser
- [x] RouterOutputParser
- [x] ObjectOutputParser
- [x] EnumOutputParser
- [x] DateOutputParser
- Prompt
- [x] PromptTemplate
- [x] MultiPromptRouter
- Callback
- [x] StdOutCallbackHandler
- LLM Cache
- [x] InMemory
- [x] File
- Retriever
- [x] WikipediaRetriever
- [x] PubmedRetriever
- [x] ParentDocumentRetriever
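As a quick illustration of the checklist, here is a minimal sketch of the RecursiveCharacterTextSplitter noted above. It assumes the same split_text(text:) interface and initializer labels the CharacterTextSplitter uses in the QA bot example; treat those as assumptions.
let text = "Any long document you want to split into overlapping chunks."
// Assumed to mirror the CharacterTextSplitter(chunk_size:chunk_overlap:) API shown in the QA bot example.
let splitter = RecursiveCharacterTextSplitter(chunk_size: 500, chunk_overlap: 50)
for chunk in splitter.split_text(text: text) {
    print(chunk)
}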
Open an issue, and let's discuss!
Join Slack: https://join.slack.com/t/langchain-mobile/shared_invite/zt-26tzdzb2u-8RnP7hDQz~MWMg8EeIu0lQ
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
Similar Open Source Tools

DeRTa
DeRTa (Refuse Whenever You Feel Unsafe) is a tool designed to improve safety in Large Language Models (LLMs) by training them to refuse compliance at any response juncture. The tool incorporates methods such as MLE with Harmful Response Prefix and Reinforced Transition Optimization (RTO) to address refusal positional bias and strengthen the model's capability to transition from potential harm to safety refusal. DeRTa provides training data, model weights, and evaluation scripts for LLMs, enabling users to enhance safety in language generation tasks.

LLM-Blender
LLM-Blender is a framework for ensembling large language models (LLMs) to achieve superior performance. It consists of two modules: PairRanker and GenFuser. PairRanker uses pairwise comparisons to distinguish between candidate outputs, while GenFuser merges the top-ranked candidates to create an improved output. LLM-Blender has been shown to significantly surpass the best LLMs and baseline ensembling methods across various metrics on the MixInstruct benchmark dataset.

comet-llm
CometLLM is a tool to log and visualize your LLM prompts and chains. Use CometLLM to identify effective prompt strategies, streamline your troubleshooting, and ensure reproducible workflows!

llm-sandbox
LLM Sandbox is a lightweight and portable sandbox environment designed to securely execute large language model (LLM) generated code in a safe and isolated manner using Docker containers. It provides an easy-to-use interface for setting up, managing, and executing code in a controlled Docker environment, simplifying the process of running code generated by LLMs. The tool supports multiple programming languages, offers flexibility with predefined Docker images or custom Dockerfiles, and allows scalability with support for Kubernetes and remote Docker hosts.

halbot
halbot is a Telegram bot that uses ChatGPT, Gemini, Mistral, and other AI engines to provide a variety of services, including text generation, translation, summarization, and question answering. It is easy to use and extend, and it can be integrated into your own projects. halbot is open source and free to use.

FlashLearn
FlashLearn is a tool that provides a simple interface and orchestration for incorporating Agent LLMs into workflows and ETL pipelines. It allows data transformations, classifications, summarizations, rewriting, and custom multi-step tasks using LLMs. Each step and task has a compact JSON definition, making pipelines easy to understand and maintain. FlashLearn supports LiteLLM, Ollama, OpenAI, DeepSeek, and other OpenAI-compatible clients.

OpenAI
OpenAI is a community-maintained Swift implementation of the OpenAI public API. The repository provides functionality for text completions, chats, image generation, audio processing, edits, embeddings, models, moderations, utilities, and Combine extensions.

orch
orch is a library for building language model powered applications and agents for the Rust programming language. It can be used for tasks such as text generation, streaming text generation, structured data generation, and embedding generation. The library provides functionalities for executing various language model tasks and can be integrated into different applications and contexts. It offers flexibility for developers to create language model-powered features and applications in Rust.

mcp-go
MCP Go is a Go implementation of the Model Context Protocol (MCP), facilitating seamless integration between LLM applications and external data sources and tools. It handles complex protocol details and server management, allowing developers to focus on building tools. The tool is designed to be fast, simple, and complete, aiming to provide a high-level and easy-to-use interface for developing MCP servers. MCP Go is currently under active development, with core features working and advanced capabilities in progress.

continuous-eval
Open-Source Evaluation for LLM Applications. `continuous-eval` is an open-source package created for granular and holistic evaluation of GenAI application pipelines. It offers modularized evaluation, a comprehensive metric library covering various LLM use cases, the ability to leverage user feedback in evaluation, and synthetic dataset generation for testing pipelines. Users can define their own metrics by extending the Metric class. The tool allows running evaluation on a pipeline defined with modules and corresponding metrics. Additionally, it provides synthetic data generation capabilities to create user interaction data for evaluation or training purposes.

json-repair
JSON Repair is a toolkit designed to address JSON anomalies that can arise from Large Language Models (LLMs). It offers a comprehensive solution for repairing JSON strings, ensuring accuracy and reliability in your data processing. With its user-friendly interface and extensive capabilities, JSON Repair empowers developers to seamlessly integrate JSON repair into their workflows.

instructor
Instructor is a Python library that makes it a breeze to work with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses. Get ready to supercharge your LLM workflows!

StepWise
StepWise is a code-first, event-driven workflow framework for .NET designed to help users build complex workflows in a simple and efficient way. It allows users to define workflows using C# code, visualize and execute workflows from a browser, execute steps in parallel, and resolve dependencies automatically. StepWise also features an AI assistant called `Geeno` in its WebUI to help users run and analyze workflows with ease.

langchaingo
LangChain Go is a Go language implementation of LangChain, a framework for building applications with LLMs through composability. It provides a simple and easy-to-use API for interacting with LLMs, making it easy to add language-based features to your applications.
For similar tasks

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.

jupyter-ai
Jupyter AI connects generative AI with Jupyter notebooks. It provides a user-friendly and powerful way to explore generative AI models in notebooks and improve your productivity in JupyterLab and the Jupyter Notebook. Specifically, Jupyter AI offers: * An `%%ai` magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, Kaggle, VSCode, etc.). * A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. * Support for a wide range of generative model providers, including AI21, Anthropic, AWS, Cohere, Gemini, Hugging Face, NVIDIA, and OpenAI. * Local model support through GPT4All, enabling use of generative AI models on consumer grade machines with ease and privacy.

khoj
Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.

langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).

danswer
Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"

infinity
Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.
For similar jobs

Fay
Fay is an open-source digital human framework that offers different versions for various purposes. The '带货完整版' (full sales/livestream edition) is suitable for online and offline salespersons. The '助理完整版' (full assistant edition) serves as a human-machine interactive digital assistant that can also control devices upon command. The 'agent版' (agent edition) is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.

h2ogpt
h2oGPT is an Apache V2 open-source project that allows users to query and summarize documents or chat with local private GPT LLMs. It features a private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, Youtube, Audio, Code, Text, MarkDown, etc.), a persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.), and efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach). h2oGPT also offers parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model, HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses, a variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM. With AutoGPTQ, 4-bit/8-bit, LORA, etc.), GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models. Additionally, h2oGPT provides Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.), a UI or CLI with streaming of all models, the ability to upload and view documents through the UI (control multiple collaborative or personal collections), Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision, Image Generation Stable Diffusion (sdxl-turbo, sdxl) and PlaygroundAI (playv2), Voice STT using Whisper with streaming audio conversion, Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion, Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion, AI Assistant Voice Control Mode for hands-free control of h2oGPT chat, Bake-off UI mode against many models at the same time, Easy Download of model artifacts and control over models like LLaMa.cpp through the UI, Authentication in the UI by user/password via Native or Google OAuth, State Preservation in the UI by user/password, Linux, Docker, macOS, and Windows support, Easy Windows Installer for Windows 10 64-bit (CPU/CUDA), Easy macOS Installer for macOS (CPU/M1/M2), Inference Servers support (oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, OpenAI, Azure OpenAI, Anthropic), OpenAI-compliant, Server Proxy API (h2oGPT acts as drop-in-replacement to OpenAI server), Python client API (to talk to Gradio server), JSON Mode with any model via code block extraction. Also supports MistralAI JSON mode, Claude-3 via function calling with strict Schema, OpenAI via JSON mode, and vLLM via guided_json with strict Schema, Web-Search integration with Chat and Document Q/A, Agents for Search, Document Q/A, Python Code, CSV frames (Experimental, best with OpenAI currently), Evaluate performance using reward models, and Quality maintained with over 1000 unit and integration tests taking over 4 GPU-hours.

mistral.rs
Mistral.rs is a fast LLM inference platform written in Rust. It supports inference on a variety of devices, quantization, and easy application integration via an OpenAI-API-compatible HTTP server and Python bindings.

ollama
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. Ollama is designed to be easy to use and accessible to developers of all levels. It is open source and available for free on GitHub.

llama-cpp-agent
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM models, execute structured function calls, and get structured output (objects). It provides a simple yet robust interface and supports llama-cpp-python and OpenAI endpoints with GBNF grammar support (like the llama-cpp-python server) and the llama.cpp backend server. It works by generating a formal GGML-BNF grammar of the user-defined structures and functions, which is then used by llama.cpp to generate text valid to that grammar. In contrast to most GBNF grammar generators, it also supports nested objects, dictionaries, enums, and lists of them.

llama_ros
This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. By using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs.

MITSUHA
OneReality is a virtual waifu/assistant that you can speak to through your mic and it'll speak back to you! It has many features such as: * You can speak to her with a mic * It can speak back to you * Has short-term memory and long-term memory * Can open apps * Smarter than you * Fluent in English, Japanese, Korean, and Chinese * Can control your smart home like Alexa if you set up Tuya (more info in Prerequisites) It is built with Python, Llama-cpp-python, Whisper, SpeechRecognition, PocketSphinx, VITS-fast-fine-tuning, VITS-simple-api, HyperDB, Sentence Transformers, and Tuya Cloud IoT.