
ezlocalai
ezlocalai is an easy to set up local artificial intelligence server with OpenAI Style Endpoints.
Stars: 67

ezlocalai is an artificial intelligence server that simplifies running multimodal AI models locally. It handles model downloading and server configuration based on hardware specs. It offers OpenAI Style endpoints for integration, voice cloning, text-to-speech, voice-to-text, and offline image generation. Users can modify environment variables for customization. Supports NVIDIA GPU and CPU setups. Provides demo UI and workflow visualization for easy usage.
README:
ezlocalai is an easy set up artificial intelligence server that allows you to easily run multimodal artificial intelligence from your computer. It is designed to be as easy as possible to get started with running local models. It automatically handles downloading the model of your choice and configuring the server based on your CPU, RAM, and GPU specifications. It also includes OpenAI Style endpoints for easy integration with other applications using ezlocalai as an OpenAI API proxy with any model. Additional functionality is built in for voice cloning text to speech and a voice to text for easy voice communication as well as image generation entirely offline after the initial setup.
- Git
- Docker Desktop (Windows or Mac)
- CUDA Toolkit (NVIDIA GPU only)
Additional Linux Prerequisites
- Docker
- Docker Compose
- NVIDIA Container Toolkit (NVIDIA GPU only)
git clone https://github.com/DevXT-LLC/ezlocalai
cd ezlocalai
Expand Environment Setup if you would like to modify the default environment variables, otherwise skip to Usage. All environment variables are optional and have useful defaults. Change the default model that starts with ezlocalai in your .env
file.
Environment Setup (Optional)
None of the values need modified in order to run the server. If you are using an NVIDIA GPU, I would recommend setting the GPU_LAYERS
and MAIN_GPU
environment variables. If you plan to expose the server to the internet, I would recommend setting the EZLOCALAI_API_KEY
environment variable for security. THREADS
is set to your CPU thread count minus 2 by default, if this causes significant performance issues, consider setting the THREADS
environment variable manually to a lower number.
Modify the .env
file to your desired settings. Assumptions will be made on all of these values if you choose to accept the defaults.
Replace the environment variables with your desired settings. Assumptions will be made on all of these values if you choose to accept the defaults.
-
EZLOCALAI_URL
- The URL to use for the server. Default ishttp://localhost:8091
. -
EZLOCALAI_API_KEY
- The API key to use for the server. If not set, the server will not require an API key when accepting requests. -
NGROK_TOKEN
- The ngrok token to use for the server. If not set, ngrok will not be used. Using ngrok will allow you to expose your ezlocalai server to the public with as simple as an API key. Get your free NGROK_TOKEN here. -
DEFAULT_MODEL
- The default model to use when no model is specified. Use the Hugging Face path. Default isTheBloke/phi-2-dpo-GGUF
. -
LLM_MAX_TOKENS
- The maximum number of tokens to use for the language model. If set to0
, it will automatically use the max tokens for the model. Default is0
. -
WHISPER_MODEL
- The model to use for speech-to-text. Default isbase.en
. -
AUTO_UPDATE
- Whether or not to automatically update ezlocalai. Default istrue
. -
THREADS
- The number of CPU threads ezlocalai is allowed to use. Default is 4. -
GPU_LAYERS
(Only applicable to NVIDIA GPU) - The number of layers to use on the GPU. Default is0
. YourGPU_LAYERS
will automatically determine a number of layers to use based on your GPU's memory if it is set to-1
and you have an NVIDIA GPU. If it is set to-2
, it will use the maximum number of layers requested by the model. -
MAIN_GPU
(Only applicable to NVIDIA GPU) - The GPU to use for the language model. Default is0
. -
IMG_ENABLED
- If set to true, models will choose to generate images when they want to based on the user input. This is only available on GPU. Default isfalse
. -
SD_MODEL
- The stable diffusion model to use. Default isstabilityai/sdxl-turbo
. -
VISION_MODEL
- The vision model to use. Default is None. Current options aredeepseek-ai/deepseek-vl-1.3b-chat
anddeepseek-ai/deepseek-vl-7b-chat
.
docker-compose -f docker-compose-cuda.yml down
docker-compose -f docker-compose-cuda.yml build
docker-compose -f docker-compose-cuda.yml up
docker-compose down
docker-compose build
docker-compose up
OpenAI Style endpoints available at http://<YOUR LOCAL IP ADDRESS>:8091/v1/
by default. Documentation can be accessed at that http://localhost:8091 when the server is running.
For examples on how to use the server to communicate with the models, see the Examples Jupyter Notebook once the server is running. We also have an example to use in Google Colab.
You can access the basic demo UI at http://localhost:8502, or your local IP address with port 8502.
graph TD
A[app.py] --> B[FASTAPI]
B --> C[Pipes]
C --> D[LLM]
C --> E[STT]
C --> F[CTTS]
C --> G[IMG]
D --> H[llama_cpp]
D --> I[tiktoken]
D --> J[torch]
E --> K[faster_whisper]
E --> L[pyaudio]
E --> M[webrtcvad]
E --> N[pydub]
F --> O[TTS]
F --> P[torchaudio]
G --> Q[diffusers]
Q --> J
A --> R[Uvicorn]
R --> S[ASGI Server]
A --> T[API Endpoint: /v1/completions]
T --> U[Pipes.get_response]
U --> V{completion_type}
V -->|completion| W[LLM.completion]
V -->|chat| X[LLM.chat]
X --> Y[LLM.generate]
W --> Y
Y --> Z[LLM.create_completion]
Z --> AA[Return response]
AA --> AB{stream}
AB -->|True| AC[StreamingResponse]
AB -->|False| AD[JSON response]
U --> AE[Audio transcription]
AE --> AF{audio_format}
AF -->|Exists| AG[Transcribe audio]
AG --> E
AF -->|None| AH[Skip transcription]
U --> AI[Audio generation]
AI --> AJ{voice}
AJ -->|Exists| AK[Generate audio]
AK --> F
AK --> AL{stream}
AL -->|True| AM[StreamingResponse]
AL -->|False| AN[JSON response with audio URL]
AJ -->|None| AO[Skip audio generation]
U --> AP[Image generation]
AP --> AQ{IMG enabled}
AQ -->|True| AR[Generate image]
AR --> G
AR --> AS[Append image URL to response]
AQ -->|False| AT[Skip image generation]
A --> AU[API Endpoint: /v1/chat/completions]
AU --> U
A --> AV[API Endpoint: /v1/embeddings]
AV --> AW[LLM.embedding]
AW --> AX[LLM.create_embedding]
AX --> AY[Return embedding]
A --> AZ[API Endpoint: /v1/audio/transcriptions]
AZ --> BA[STT.transcribe_audio]
BA --> BB[Return transcription]
A --> BC[API Endpoint: /v1/audio/generation]
BC --> BD[CTTS.generate]
BD --> BE[Return audio URL or base64 audio]
A --> BF[API Endpoint: /v1/models]
BF --> BG[LLM.models]
BG --> BH[Return available models]
A --> BI[CORS Middleware]
BJ[.env] --> BK[Environment Variables]
BK --> A
BL[setup.py] --> BM[ezlocalai package]
BM --> BN[LLM]
BM --> BO[STT]
BM --> BP[CTTS]
BM --> BQ[IMG]
A --> BR[API Key Verification]
BR --> BS[verify_api_key]
A --> BT[Static Files]
BT --> BU[API Endpoint: /outputs]
A --> BV[Ngrok]
BV --> BW[Public URL]
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for ezlocalai
Similar Open Source Tools

ezlocalai
ezlocalai is an artificial intelligence server that simplifies running multimodal AI models locally. It handles model downloading and server configuration based on hardware specs. It offers OpenAI Style endpoints for integration, voice cloning, text-to-speech, voice-to-text, and offline image generation. Users can modify environment variables for customization. Supports NVIDIA GPU and CPU setups. Provides demo UI and workflow visualization for easy usage.

comfy-cli
Comfy-cli is a command line tool designed to facilitate the installation and management of ComfyUI, an open-source machine learning framework. Users can easily set up ComfyUI, install packages, and manage custom nodes directly from the terminal. The tool offers features such as easy installation, seamless package management, custom node management, checkpoint downloads, cross-platform compatibility, and comprehensive documentation. Comfy-cli simplifies the process of working with ComfyUI, making it convenient for users to handle various tasks related to the framework.

gpt-cli
gpt-cli is a command-line interface tool for interacting with various chat language models like ChatGPT, Claude, and others. It supports model customization, usage tracking, keyboard shortcuts, multi-line input, markdown support, predefined messages, and multiple assistants. Users can easily switch between different assistants, define custom assistants, and configure model parameters and API keys in a YAML file for easy customization and management.

openai_trtllm
OpenAI-compatible API for TensorRT-LLM and NVIDIA Triton Inference Server, which allows you to integrate with langchain

vim-ollama
The 'vim-ollama' plugin for Vim adds Copilot-like code completion support using Ollama as a backend, enabling intelligent AI-based code completion and integrated chat support for code reviews. It does not rely on cloud services, preserving user privacy. The plugin communicates with Ollama via Python scripts for code completion and interactive chat, supporting Vim only. Users can configure LLM models for code completion tasks and interactive conversations, with detailed installation and usage instructions provided in the README.

EuroEval
EuroEval is a robust European language model benchmark tool, formerly known as ScandEval. It provides a platform to benchmark pretrained models on various tasks across different languages. Users can evaluate models, datasets, and metrics both online and offline. The tool supports benchmarking from the command line, script, and Docker. Additionally, users can reproduce datasets used in the project using provided scripts. EuroEval welcomes contributions and offers guidelines for general contributions and adding new datasets.

termax
Termax is an LLM agent in your terminal that converts natural language to commands. It is featured by: - Personalized Experience: Optimize the command generation with RAG. - Various LLMs Support: OpenAI GPT, Anthropic Claude, Google Gemini, Mistral AI, and more. - Shell Extensions: Plugin with popular shells like `zsh`, `bash` and `fish`. - Cross Platform: Able to run on Windows, macOS, and Linux.

languagemodels
Language Models is a Python package that provides building blocks to explore large language models with as little as 512MB of RAM. It simplifies the usage of large language models from Python, ensuring all inference is performed locally to keep data private. The package includes features such as text completions, chat capabilities, code completions, external text retrieval, semantic search, and more. It outperforms Hugging Face transformers for CPU inference and offers sensible default models with varying parameters based on memory constraints. The package is suitable for learners and educators exploring the intersection of large language models with modern software development.

fish-ai
fish-ai is a tool that adds AI functionality to Fish shell. It can be integrated with various AI providers like OpenAI, Azure OpenAI, Google, Hugging Face, Mistral, or a self-hosted LLM. Users can transform comments into commands, autocomplete commands, and suggest fixes. The tool allows customization through configuration files and supports switching between contexts. Data privacy is maintained by redacting sensitive information before submission to the AI models. Development features include debug logging, testing, and creating releases.

expo-stable-diffusion
The `expo-stable-diffusion` repository provides a tool for generating images using Stable Diffusion natively on iOS devices within Expo and React Native apps. Users can install and configure the module to create images based on prompts. The repository includes information on updating iOS deployment targets, enabling increased memory limits, and building iOS apps. Additionally, users can obtain Stable Diffusion models from various sources. The repository also addresses troubleshooting tips related to model load times and image generation durations. The developer seeks sponsorship to further enhance the project, including adding Android support.

HuggingFaceGuidedTourForMac
HuggingFaceGuidedTourForMac is a guided tour on how to install optimized pytorch and optionally Apple's new MLX, JAX, and TensorFlow on Apple Silicon Macs. The repository provides steps to install homebrew, pytorch with MPS support, MLX, JAX, TensorFlow, and Jupyter lab. It also includes instructions on running large language models using HuggingFace transformers. The repository aims to help users set up their Macs for deep learning experiments with optimized performance.

browser
Lightpanda Browser is an open-source headless browser designed for fast web automation, AI agents, LLM training, scraping, and testing. It features ultra-low memory footprint, exceptionally fast execution, and compatibility with Playwright and Puppeteer through CDP. Built for performance, Lightpanda offers Javascript execution, support for Web APIs, and is optimized for minimal memory usage. It is a modern solution for web scraping and automation tasks, providing a lightweight alternative to traditional browsers like Chrome.

abliteration
Abliteration is a tool that allows users to create abliterated models using transformers quickly and easily. It is not a tool for uncensorship, but rather for making models that will not explicitly refuse users. Users can clone the repository, install dependencies, and make abliterations using the provided commands. The tool supports adjusting parameters for stubborn models and offers various options for customization. Abliteration can be used for creating modified models for specific tasks or topics.

AirspeedVelocity.jl
AirspeedVelocity.jl is a tool designed to simplify benchmarking of Julia packages over their lifetime. It provides a CLI to generate benchmarks, compare commits/tags/branches, plot benchmarks, and run benchmark comparisons for every submitted PR as a GitHub action. The tool freezes the benchmark script at a specific revision to prevent old history from affecting benchmarks. Users can configure options using CLI flags and visualize benchmark results. AirspeedVelocity.jl can be used to benchmark any Julia package and offers features like generating tables and plots of benchmark results. It also supports custom benchmarks and can be integrated into GitHub actions for automated benchmarking of PRs.

deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

magic-cli
Magic CLI is a command line utility that leverages Large Language Models (LLMs) to enhance command line efficiency. It is inspired by projects like Amazon Q and GitHub Copilot for CLI. The tool allows users to suggest commands, search across command history, and generate commands for specific tasks using local or remote LLM providers. Magic CLI also provides configuration options for LLM selection and response generation. The project is still in early development, so users should expect breaking changes and bugs.
For similar tasks

ezlocalai
ezlocalai is an artificial intelligence server that simplifies running multimodal AI models locally. It handles model downloading and server configuration based on hardware specs. It offers OpenAI Style endpoints for integration, voice cloning, text-to-speech, voice-to-text, and offline image generation. Users can modify environment variables for customization. Supports NVIDIA GPU and CPU setups. Provides demo UI and workflow visualization for easy usage.

BotSharp-UI
BotSharp UI is a web app for managing agents and conversations. It allows users to build new AI assistants quickly using a Node-based Agent building experience. The project is written in SvelteKit v2 and utilizes BotSharp as the LLM services.

stable-diffusion-webui
Stable Diffusion WebUI Docker Image allows users to run Automatic1111 WebUI in a docker container locally or in the cloud. The images do not bundle models or third-party configurations, requiring users to use a provisioning script for container configuration. It supports NVIDIA CUDA, AMD ROCm, and CPU platforms, with additional environment variables for customization and pre-configured templates for Vast.ai and Runpod.io. The service is password protected by default, with options for version pinning, startup flags, and service management using supervisorctl.

ai-accelerator
The AI Accelerator project source code is designed to initialize an OpenShift cluster with a recommended set of operators and components for training, deploying, serving, and monitoring Machine Learning models. It provides core OpenShift features for Data Science environments and can be customized for specific scenarios. The project automates IT infrastructure using GitOps practices, including Git, code review, and CI/CD. ArgoCD Application objects are used to manage the installation of operators on the cluster.

AirGym
AirGym is an open source Python quadrotor simulator based on IsaacGym, providing a high-fidelity dynamics and Deep Reinforcement Learning (DRL) framework for quadrotor robot learning research. It offers a lightweight and customizable platform with strict alignment with PX4 logic, multiple control modes, and Sim-to-Real toolkits. Users can perform tasks such as Hovering, Balloon, Tracking, Avoid, and Planning, with the ability to create customized environments and tasks. The tool also supports training from scratch, visual encoding approaches, playing and testing of trained models, and customization of new tasks and assets.

CVPR2024-Papers-with-Code-Demo
This repository contains a collection of papers and code for the CVPR 2024 conference. The papers cover a wide range of topics in computer vision, including object detection, image segmentation, image generation, and video analysis. The code provides implementations of the algorithms described in the papers, making it easy for researchers and practitioners to reproduce the results and build upon the work of others. The repository is maintained by a team of researchers at the University of California, Berkeley.

ms-copilot-play
Microsoft Copilot Play is a Cloudflare Worker service that accelerates Microsoft Copilot functionalities in China. It allows high-speed access to Microsoft Copilot features like chatting, notebook, plugins, image generation, and sharing. The service filters out meaningless requests used for statistics, saving up to 80% of Cloudflare Worker requests. Users can deploy the service easily with Cloudflare Worker, ensuring fast and unlimited access with no additional operations. The service leverages the power of Microsoft Copilot, based on OpenAI GPT-4, and utilizes Bing search to answer questions.

generative-ai-dart
The Google Generative AI SDK for Dart enables developers to utilize cutting-edge Large Language Models (LLMs) for creating language applications. It provides access to the Gemini API for generating content using state-of-the-art models. Developers can integrate the SDK into their Dart or Flutter applications to leverage powerful AI capabilities. It is recommended to use the SDK for server-side API calls to ensure the security of API keys and protect against potential key exposure in mobile or web apps.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.