langrila
A useful tool for using API-based LLMs in an easy way
Stars: 62
Langrila is a library that provides an easy way to use API-based LLMs (Large Language Models), with an emphasis on a simple architecture for readability. It supports various AI models for chat and embedding tasks, as well as retrieval functionalities using Qdrant, Chroma, and Usearch. Langrila also includes modules for function calling, conversation memory management, and prompt templates. It enforces coding policies of simplicity, responsibility independence, and minimum module implementation. The library requires Python 3.10 to 3.12 and additional dependencies such as OpenAI, Gemini, Qdrant, Chroma, and Usearch for specific functionalities.
README:
NOTE: This package is being rebuilt, including significant changes to the interface.
Langrila is an open-source third-party Python package that allows you to develop type-safe multi-agent systems in an easy way. This package is just a personal project.
Widely used existing agent frameworks are all awesome, but I feel that:
- They are often highly abstracted, which makes their behavior unclear at times and limits extensibility.
- Code readability suffers from the many decorators and special operators involved in developing an agent with those frameworks.
- Handling an agent's state or context is a little annoying. Typically it requires special arguments in functions, and the state is updated invisibly by tools, which can cause confusion about the traceability of the state. Ideally, functions implemented in regular development should be usable as-is with an agent, and the state's scope should be explained by the dependencies between agents.
- The arguments, specifications, and available models of client APIs are frequently updated, and keeping up with them is a little cumbersome.
To address these issues, I breathed new life into langrila, which has the following features:
- Minimal wrapper classes for each client API:
- Wrapper classes are minimally designed, avoiding unnecessary unification of arguments and eliminating processing dependent on model names.
- No need for special argument to inject context into tools:
- There is no requirement to set instances of special classes as arguments when injecting context into tools.
- Support for sub-agents as tools:
- In addition to passing tools to agents, sub-agents can be passed directly to an agent. Sub-agents are dynamically converted into tools internally and controlled by the parent agent, enabling easy construction of multi-agent systems. This makes multi-agent code intuitive to write and read.
- Unified message-response model independent of client APIs:
- Langrila defines a standardized message-response model that allows multi-agent systems to be built across different clients.
- Serializable conversation history:
- The standardized message-response model can be serialized to JSON and easily stored not only in memory but also in formats like JSON, Pickle, Azure Cosmos DB, and AWS S3.
- Type-safe structured output:
- Inspired by PydanticAI.
- Multi-modal I/O:
- Whatever the client supports, such as image/video/PDF/audio/URI input, image/audio generation, text embedding, and so on.
- I will be rolling out support progressively.
- Others:
- Automatic retry when an error is raised.
- Customizable internal prompts.
- Usage gathering for all sub-agents.
- In many cases, all we have to do to support a new client is implement a single class.
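To illustrate two of the points above, minimal client wrappers and plain functions usable as tools, here is a minimal single-agent sketch. It mirrors the `Agent` and `OpenAIClient` usage from the orchestration example below; the `get_weather` tool and its prompt are made-up illustrations, not part of the library.
from langrila import Agent
from langrila.openai import OpenAIClient

def get_weather(city: str) -> str:
    """Return a dummy weather report for the given city."""
    return f"It is sunny in {city}."

agent = Agent(
    client=OpenAIClient(api_key_env_name="OPENAI_API_KEY"),
    model="gpt-4o-mini-2024-07-18",
    temperature=0.0,
    tools=[get_weather],  # a regular function; no special context argument is required
)

response = agent.generate_text(prompt="What's the weather in Tokyo?")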
If necessary, set environment variables to use the OpenAI API, Azure OpenAI Service, Gemini API, and Claude API. If you use VertexAI or Amazon Bedrock, check each platform's user guide and authenticate in advance.
- OpenAI
- Azure OpenAI
- Gemini on Google AI Studio
- Gemini on VertexAI
- Claude on Anthropic
- Claude on Amazon Bedrock
- Claude on VertexAI (not tested)
Significant breaking changes will be introduced as langrila becomes an agent framework. It's more like a rebuild than an update, so please be careful when upgrading from a previous version.
Coming soon.
python = ">=3.10,<3.13"
matplotlib = "^3.8.0"
plotly = "^5.17.0"
numpy = "^1.26.1"
pandas = "^2.1.1"
scipy = "^1.11.3"
scikit-learn = "^1.3.2"
pydantic = "^2.10.0"
griffe = "^1.5.1"
loguru = "^0.7.3"
Langrila has various extra installation options. See the following installation section and the extra dependencies section in pyproject.toml for more detailed installation options.
# For OpenAI
pip install langrila[openai]
# For Gemini
pip install langrila[gemini]
# For Claude
pip install langrila[claude]
# For multiple clients
pip install langrila[openai,gemini,claude]
# With dependencies to handle specific data. Here is an example using gemini
pip install langrila[gemini,audio,video,pdf]
# With dependencies for specific platform. Here is an example using gemini on VertexAI
pip install langrila[gemini,vertexai]
# With dependencies for specific vectorDB. Here is an example using Qdrant
pip install langrila[openai,qdrant]
# For OpenAI
poetry add langrila --extras openai
# For Gemini
poetry add langrila --extras gemini
# For Claude
poetry add langrila --extras claude
# For multiple clients
poetry add langrila --extras "openai gemini claude"
# With dependencies to handle specific data. Here is an example using gemini
poetry add langrila --extras "gemini audio video pdf"
# With dependencies for specific platform. Here is an example using gemini on VertexAI
poetry add langrila --extras "gemini vertexai"
# With dependencies for specific vectorDB. Here is an example using Qdrant
poetry add langrila --extras "openai qdrant"
git clone [email protected]:taikinman/langrila.git
cd langrila
pip install -e .[{extra packages}]
# Using poetry
poetry add --editable /path/to/langrila/ --extras "{extra packages}"
In langrila, we can build orchestrator-type multi-agent systems, rather than graph-based ones. The orchestrator routes the execution of tools to individual agents, aggregates the results, and outputs the final answer.
Here is a fragment of the example code with dummy tools.
from langrila import Agent, InMemoryConversationMemory
from langrila.anthropic import AnthropicClient
from langrila.google import GoogleClient
from langrila.openai import OpenAIClient
from enum import Enum
from pydantic import BaseModel, Field
#################################
# Client wrapper
#################################
# For OpenAI
openai_client = OpenAIClient(api_key_env_name="OPENAI_API_KEY")
# For Gemini on Google AI Studio
google_client = GoogleClient(api_key_env_name="GEMINI_API_KEY")
# For Claude on Anthropic
anthropic_client = AnthropicClient(api_key_env_name="ANTHROPIC_API_KEY")
#################################
# Tool definition
#################################
def power_disco_ball(power: bool) -> str:
    """
    Powers the spinning disco ball.

    Parameters
    ----------
    power : bool
        Whether to power the disco ball or not.

    Returns
    -------
    str
        A message indicating whether the disco ball is spinning or not.
    """
    return f"Disco ball is {'spinning!' if power else 'stopped.'}"
...
#################################
# Definition of response schema
#################################
class DiscoBallSchema(BaseModel):
power: bool = Field(..., description="Whether to power the disco ball.")
spinning: bool = Field(..., description="Whether the disco ball is spinning.")
class MusicGenre(str, Enum):
rock = "rock"
pop = "pop"
jazz = "jazz"
classical = "classical"
hip_hop = "hip-hop"
class MusicSchema(BaseModel):
genre: MusicGenre = Field(
...,
description="The genre of music to play.",
)
bpm: int = Field(
...,
description="The BPM of the music.",
ge=60,
le=180,
)
volume: float = Field(
...,
description="The volume level to set the music to.",
ge=0,
le=1,
)
class LightsSchema(BaseModel):
brightness: float = Field(
...,
description="The brightness level to set the lights to.",
ge=0,
le=1,
)
class ResponseSchema(BaseModel):
    disco_ball: DiscoBallSchema | None = Field(None, description="The disco ball settings.")
    music: MusicSchema | None = Field(None, description="The music settings.")
    lights: LightsSchema | None = Field(None, description="The lights settings.")
#################################
# Orchestration
#################################
lights_agent = Agent(
client=openai_client,
model="gpt-4o-mini-2024-07-18",
temperature=0.0,
tools=[dim_lights, brighten_lights, turn_light_on],
)
disco_ball_agent = Agent(
client=openai_client,
model="gpt-4o-mini-2024-07-18",
temperature=0.0,
tools=[power_disco_ball, stop_disco_ball],
max_tokens=500,
)
music_power_agent = Agent(
client=openai_client,
model="gpt-4o-mini-2024-07-18",
temperature=0.0,
tools=[start_music],
)
music_control_agent = Agent(
client=openai_client,
model="gpt-4o-mini-2024-07-18",
temperature=0.0,
tools=[change_music, adjust_volume, change_bpm],
)
# Orchestrator as a sub-agent
music_agent_orchestrator = Agent(
client=anthropic_client,
model="claude-3-5-sonnet-20240620",
temperature=0.0,
subagents=[music_power_agent, music_control_agent],
max_tokens=500,
)
# Master orchestrator
master_orchestrator = Agent(
client=google_client,
model="gemini-2.0-flash-exp",
temperature=0.0,
    subagents=[lights_agent, disco_ball_agent, music_agent_orchestrator],
response_schema_as_tool=ResponseSchema,
conversation_memory=InMemoryConversationMemory(),
)
#################################
# Invoke agent
#################################
prompt = "Turn this place into a party mood."
# synchronous generation
response = master_orchestrator.generate_text(prompt=prompt)
# asynchronous generation
response = await master_orchestrator.generate_text_async(prompt=prompt)
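Outside an async context (for example, in a plain script rather than a notebook), the asynchronous variant can be driven with asyncio; a minimal sketch, assuming generate_text_async is a coroutine as used above:
import asyncio

async def main() -> None:
    # Await the asynchronous generation and print the final answer.
    response = await master_orchestrator.generate_text_async(prompt=prompt)
    print(response.contents[0].text)

asyncio.run(main())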
#################################
# Result
#################################
ResponseSchema.model_validate_json(response.contents[0].text)
# >>> ResponseSchema(disco_ball=DiscoBallSchema(power=True, spinning=True), music=MusicSchema(genre=<MusicGenre.pop: 'pop'>, bpm=120, volume=0.7), lights=LightsSchema(brightness=1.0))
#################################
# Usage
#################################
list(response.usage.items())
# >>> [('music_power_agent',
# >>> Usage(model_name='gpt-4o-mini-2024-07-18', prompt_tokens=123, output_tokens=23, raw=None)),
# >>> ('music_agent_orchestrator',
# >>> Usage(model_name='claude-3-5-sonnet-20240620', prompt_tokens=2510, output_tokens=368, raw=None)),
# >>> ('music_control_agent',
# >>> Usage(model_name='gpt-4o-mini-2024-07-18', prompt_tokens=541, output_tokens=83, raw=None)),
# >>> ('lights_agent',
# >>> Usage(model_name='gpt-4o-mini-2024-07-18', prompt_tokens=345, output_tokens=60, raw=None)),
# >>> ('root',
# >>> Usage(model_name='gemini-2.0-flash-exp', prompt_tokens=3273, output_tokens=82, raw=None)),
# >>> ('disco_ball_agent',
# >>> Usage(model_name='gpt-4o-mini-2024-07-18', prompt_tokens=211, output_tokens=31, raw=None))]
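Since each entry pairs an agent name with a Usage object, total token consumption across all agents can be aggregated directly; a minimal sketch based on the fields shown above:
# Sum prompt and output tokens over every agent in the hierarchy.
total_prompt_tokens = sum(usage.prompt_tokens for _, usage in response.usage.items())
total_output_tokens = sum(usage.output_tokens for _, usage in response.usage.items())
# For the sample run above, this gives 7003 prompt tokens and 647 output tokens in total.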
- [ ] More error handling
- [ ] Preparing example notebooks
- [ ] Linting and refactor
- [ ] Supporting Huggingface
- [ ] Aim integration