lollms
Lord of LLMS
Stars: 266
LoLLMs Server is a text generation server based on large language models. It provides a Flask-based API for generating text using various pre-trained language models. This server is designed to be easy to install and use, allowing developers to integrate powerful text generation capabilities into their applications.
README:
Lord of Large Language Models (LoLLMs) Server is a text generation server based on large language models. It provides a Flask-based API for generating text using various pre-trained language models. This server is designed to be easy to install and use, allowing developers to integrate powerful text generation capabilities into their applications.
- Fully integrated library with access to bindings, personalities and helper tools.
- Generate text using large language models.
- Supports multiple personalities for generating text with different styles and tones.
- Real-time text generation with WebSocket-based communication.
- RESTful API for listing personalities and adding new personalities.
- Easy integration with various applications and frameworks.
- Possibility to send files to personalities
- Possibility to run on multiple nodes and provide a generation service to many outputs at once.
- Data stays local even in the remote version. Only generations are sent to the host node. The logs, data and discussion history are kept in your local disucssion folder.
You can install LoLLMs using pip, the Python package manager. Open your terminal or command prompt and run the following command:
pip install --upgrade lollms
Or if you want to get the latest version from the git:
pip install --upgrade git+https://github.com/ParisNeo/lollms.git
If you want to use cuda. Either install it directly or use conda to install everything:
conda create --name lollms python=3.10
Activate the environment
conda activate lollms
Install cudatoolkit
conda install -c anaconda cudatoolkit
Install lollms
pip install --upgrade lollms
Now you are ready.
To simply configure your environment run the settings app:
lollms-settings
The tool is intuitive and will guide you through configuration process.
The first time you will be prompted to select a binding.
Once the binding is selected, you have to install at least a model. You have two options:
1- install from internet. Just give the link to a model on hugging face. For example. if you select the default llamacpp python bindings (7), you can install this model:
https://huggingface.co/TheBloke/airoboros-7b-gpt4-GGML/resolve/main/airoboros-7b-gpt4.ggmlv3.q4_1.bin
2- install from local drive. Just give the path to a model on your pc. The model will not be copied. We only create a reference to the model. This is useful if you use multiple clients so that you can mutualize your models with other tools.
Now you are ready to use the server.
Here is the smallest possible example that allows you to use the full potential of the tool with nearly no code
from lollms.console import Conversation
cv = Conversation(None)
cv.start_conversation()
Now you can reimplement the start_conversation method to do the things you want:
from lollms.console import Conversation
class MyConversation(Conversation):
def __init__(self, cfg=None):
super().__init__(cfg, show_welcome_message=False)
def start_conversation(self):
prompt = "Once apon a time"
def callback(text, type=None):
print(text, end="", flush=True)
return True
print(prompt, end="", flush=True)
output = self.safe_generate(prompt, callback=callback)
if __name__ == '__main__':
cv = MyConversation()
cv.start_conversation()
Or if you want here is a conversation tool written in few lines
from lollms.console import Conversation
class MyConversation(Conversation):
def __init__(self, cfg=None):
super().__init__(cfg, show_welcome_message=False)
def start_conversation(self):
full_discussion=""
while True:
prompt = input("You: ")
if prompt=="exit":
return
if prompt=="menu":
self.menu.main_menu()
full_discussion += self.personality.user_message_prefix+prompt+self.personality.link_text
full_discussion += self.personality.ai_message_prefix
def callback(text, type=None):
print(text, end="", flush=True)
return True
print(self.personality.name+": ",end="",flush=True)
output = self.safe_generate(full_discussion, callback=callback)
full_discussion += output.strip()+self.personality.link_text
print()
if __name__ == '__main__':
cv = MyConversation()
cv.start_conversation()
Here we use the safe_generate method that does all the cropping for you ,so you can chat forever and will never run out of context.
Once installed, you can start the LoLLMs Server using the lollms-server
command followed by the desired parameters.
lollms-server --host <host> --port <port> --config <config_file> --bindings_path <bindings_path> --personalities_path <personalities_path> --models_path <models_path> --binding_name <binding_name> --model_name <model_name> --personality_full_name <personality_full_name>
-
--host
: The hostname or IP address to bind the server (default: localhost). -
--port
: The port number to run the server (default: 9600). -
--config
: Path to the configuration file (default: None). -
--bindings_path
: The path to the Bindings folder (default: "./bindings_zoo"). -
--personalities_path
: The path to the personalities folder (default: "./personalities_zoo"). -
--models_path
: The path to the models folder (default: "./models"). -
--binding_name
: The default binding to be used (default: "llama_cpp_official"). -
--model_name
: The default model name (default: "Manticore-13B.ggmlv3.q4_0.bin"). -
--personality_full_name
: The full name of the default personality (default: "personality").
Start the server with default settings:
lollms-server
Start the server on a specific host and port:
lollms-server --host 0.0.0.0 --port 5000
-
connect
: Triggered when a client connects to the server. -
disconnect
: Triggered when a client disconnects from the server. -
list_personalities
: List all available personalities. -
add_personality
: Add a new personality to the server. -
generate_text
: Generate text based on the provided prompt and selected personality.
-
GET /personalities
: List all available personalities. -
POST /personalities
: Add a new personality to the server.
Sure! Here are examples of how to communicate with the LoLLMs Server using JavaScript and Python.
// Establish a WebSocket connection with the server
const socket = io.connect('http://localhost:9600');
// Event: When connected to the server
socket.on('connect', () => {
console.log('Connected to the server');
// Request the list of available personalities
socket.emit('list_personalities');
});
// Event: Receive the list of personalities from the server
socket.on('personalities_list', (data) => {
const personalities = data.personalities;
console.log('Available Personalities:', personalities);
// Select a personality and send a text generation request
const selectedPersonality = personalities[0];
const prompt = 'Once upon a time...';
socket.emit('generate_text', { personality: selectedPersonality, prompt: prompt });
});
// Event: Receive the generated text from the server
socket.on('text_generated', (data) => {
const generatedText = data.text;
console.log('Generated Text:', generatedText);
// Do something with the generated text
});
// Event: When disconnected from the server
socket.on('disconnect', () => {
console.log('Disconnected from the server');
});
import socketio
# Create a SocketIO client
sio = socketio.Client()
# Event: When connected to the server
@sio.on('connect')
def on_connect():
print('Connected to the server')
# Request the list of available personalities
sio.emit('list_personalities')
# Event: Receive the list of personalities from the server
@sio.on('personalities_list')
def on_personalities_list(data):
personalities = data['personalities']
print('Available Personalities:', personalities)
# Select a personality and send a text generation request
selected_personality = personalities[0]
prompt = 'Once upon a time...'
sio.emit('generate_text', {'personality': selected_personality, 'prompt': prompt})
# Event: Receive the generated text from the server
@sio.on('text_generated')
def on_text_generated(data):
generated_text = data['text']
print('Generated Text:', generated_text)
# Do something with the generated text
# Event: When disconnected from the server
@sio.on('disconnect')
def on_disconnect():
print('Disconnected from the server')
# Connect to the server
sio.connect('http://localhost:9600')
# Keep the client running
sio.wait()
Make sure to have the necessary dependencies installed for the JavaScript and Python examples. For JavaScript, you need the socket.io-client
package, and for Python, you need the python-socketio
package.
Contributions to the LoLLMs Server project are welcome and appreciated. If you would like to contribute, please follow the guidelines outlined in the CONTRIBUTING.md file.
LoLLMs Server is licensed under the Apache 2.0 License. See the LICENSE file for more information.
The source code for LoLLMs Server can be found on GitHub
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for lollms
Similar Open Source Tools
lollms
LoLLMs Server is a text generation server based on large language models. It provides a Flask-based API for generating text using various pre-trained language models. This server is designed to be easy to install and use, allowing developers to integrate powerful text generation capabilities into their applications.
hugging-chat-api
Unofficial HuggingChat Python API for creating chatbots, supporting features like image generation, web search, memorizing context, and changing LLMs. Users can log in, chat with the ChatBot, perform web searches, create new conversations, manage conversations, switch models, get conversation info, use assistants, and delete conversations. The API also includes a CLI mode with various commands for interacting with the tool. Users are advised not to use the application for high-stakes decisions or advice and to avoid high-frequency requests to preserve server resources.
IntelliNode
IntelliNode is a javascript module that integrates cutting-edge AI models like ChatGPT, LLaMA, WaveNet, Gemini, and Stable diffusion into projects. It offers functions for generating text, speech, and images, as well as semantic search, multi-model evaluation, and chatbot capabilities. The module provides a wrapper layer for low-level model access, a controller layer for unified input handling, and a function layer for abstract functionality tailored to various use cases.
langserve
LangServe helps developers deploy `LangChain` runnables and chains as a REST API. This library is integrated with FastAPI and uses pydantic for data validation. In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in LangChain.js.
aimeos-laravel
Aimeos Laravel is a professional, full-featured, and ultra-fast Laravel ecommerce package that can be easily integrated into existing Laravel applications. It offers a wide range of features including multi-vendor, multi-channel, and multi-warehouse support, fast performance, support for various product types, subscriptions with recurring payments, multiple payment gateways, full RTL support, flexible pricing options, admin backend, REST and GraphQL APIs, modular structure, SEO optimization, multi-language support, AI-based text translation, mobile optimization, and high-quality source code. The package is highly configurable and extensible, making it suitable for e-commerce SaaS solutions, marketplaces, and online shops with millions of vendors.
generative-ai
The 'Generative AI' repository provides a C# library for interacting with Google's Generative AI models, specifically the Gemini models. It allows users to access and integrate the Gemini API into .NET applications, supporting functionalities such as listing available models, generating content, creating tuned models, working with large files, starting chat sessions, and more. The repository also includes helper classes and enums for Gemini API aspects. Authentication methods include API key, OAuth, and various authentication modes for Google AI and Vertex AI. The package offers features for both Google AI Studio and Google Cloud Vertex AI, with detailed instructions on installation, usage, and troubleshooting.
hydraai
Generate React components on-the-fly at runtime using AI. Register your components, and let Hydra choose when to show them in your App. Hydra development is still early, and patterns for different types of components and apps are still being developed. Join the discord to chat with the developers. Expects to be used in a NextJS project. Components that have function props do not work.
py-llm-core
PyLLMCore is a light-weighted interface with Large Language Models with native support for llama.cpp, OpenAI API, and Azure deployments. It offers a Pythonic API that is simple to use, with structures provided by the standard library dataclasses module. The high-level API includes the assistants module for easy swapping between models. PyLLMCore supports various models including those compatible with llama.cpp, OpenAI, and Azure APIs. It covers use cases such as parsing, summarizing, question answering, hallucinations reduction, context size management, and tokenizing. The tool allows users to interact with language models for tasks like parsing text, summarizing content, answering questions, reducing hallucinations, managing context size, and tokenizing text.
julep
Julep is an advanced platform for creating stateful and functional AI apps powered by large language models. It offers features like statefulness by design, automatic function calling, production-ready deployment, cron-like asynchronous functions, 90+ built-in tools, and the ability to switch between different LLMs easily. Users can build AI applications without the need to write code for embedding, saving, and retrieving conversation history, and can connect to third-party applications using Composio. Julep simplifies the process of getting started with AI apps, whether they are conversational, functional, or agentic.
react-native-fast-tflite
A high-performance TensorFlow Lite library for React Native that utilizes JSI for power, zero-copy ArrayBuffers for efficiency, and low-level C/C++ TensorFlow Lite core API for direct memory access. It supports swapping out TensorFlow Models at runtime and GPU-accelerated delegates like CoreML/Metal/OpenGL. Easy VisionCamera integration allows for seamless usage. Users can load TensorFlow Lite models, interpret input and output data, and utilize GPU Delegates for faster computation. The library is suitable for real-time object detection, image classification, and other machine learning tasks in React Native applications.
shortest
Shortest is an AI-powered natural language end-to-end testing framework built on Playwright. It provides a seamless testing experience by allowing users to write tests in natural language and execute them using Anthropic Claude API. The framework also offers GitHub integration with 2FA support, making it suitable for testing web applications with complex authentication flows. Shortest simplifies the testing process by enabling users to run tests locally or in CI/CD pipelines, ensuring the reliability and efficiency of web applications.
aiavatarkit
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.
llmgraph
llmgraph is a tool that enables users to create knowledge graphs in GraphML, GEXF, and HTML formats by extracting world knowledge from large language models (LLMs) like ChatGPT. It supports various entity types and relationships, offers cache support for efficient graph growth, and provides insights into LLM costs. Users can customize the model used and interact with different LLM providers. The tool allows users to generate interactive graphs based on a specified entity type and Wikipedia link, making it a valuable resource for knowledge graph creation and exploration.
deepgram-js-sdk
Deepgram JavaScript SDK. Power your apps with world-class speech and Language AI models.
tgpt
tgpt is a cross-platform command-line interface (CLI) tool that allows users to interact with AI chatbots in the Terminal without needing API keys. It supports various AI providers such as KoboldAI, Phind, Llama2, Blackbox AI, and OpenAI. Users can generate text, code, and images using different flags and options. The tool can be installed on GNU/Linux, MacOS, FreeBSD, and Windows systems. It also supports proxy configurations and provides options for updating and uninstalling the tool.
HuggingFaceModelDownloader
The HuggingFace Model Downloader is a utility tool for downloading models and datasets from the HuggingFace website. It offers multithreaded downloading for LFS files and ensures the integrity of downloaded models with SHA256 checksum verification. The tool provides features such as nested file downloading, filter downloads for specific LFS model files, support for HuggingFace Access Token, and configuration file support. It can be used as a library or a single binary for easy model downloading and inference in projects.
For similar tasks
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
zep-python
Zep is an open-source platform for building and deploying large language model (LLM) applications. It provides a suite of tools and services that make it easy to integrate LLMs into your applications, including chat history memory, embedding, vector search, and data enrichment. Zep is designed to be scalable, reliable, and easy to use, making it a great choice for developers who want to build LLM-powered applications quickly and easily.
lollms
LoLLMs Server is a text generation server based on large language models. It provides a Flask-based API for generating text using various pre-trained language models. This server is designed to be easy to install and use, allowing developers to integrate powerful text generation capabilities into their applications.
LlamaIndexTS
LlamaIndex.TS is a data framework for your LLM application. Use your own data with large language models (LLMs, OpenAI ChatGPT and others) in Typescript and Javascript.
semantic-kernel
Semantic Kernel is an SDK that integrates Large Language Models (LLMs) like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. Semantic Kernel achieves this by allowing you to define plugins that can be chained together in just a few lines of code. What makes Semantic Kernel _special_ , however, is its ability to _automatically_ orchestrate plugins with AI. With Semantic Kernel planners, you can ask an LLM to generate a plan that achieves a user's unique goal. Afterwards, Semantic Kernel will execute the plan for the user.
botpress
Botpress is a platform for building next-generation chatbots and assistants powered by OpenAI. It provides a range of tools and integrations to help developers quickly and easily create and deploy chatbots for various use cases.
BotSharp
BotSharp is an open-source machine learning framework for building AI bot platforms. It provides a comprehensive set of tools and components for developing and deploying intelligent virtual assistants. BotSharp is designed to be modular and extensible, allowing developers to easily integrate it with their existing systems and applications. With BotSharp, you can quickly and easily create AI-powered chatbots, virtual assistants, and other conversational AI applications.
qdrant
Qdrant is a vector similarity search engine and vector database. It is written in Rust, which makes it fast and reliable even under high load. Qdrant can be used for a variety of applications, including: * Semantic search * Image search * Product recommendations * Chatbots * Anomaly detection Qdrant offers a variety of features, including: * Payload storage and filtering * Hybrid search with sparse vectors * Vector quantization and on-disk storage * Distributed deployment * Highlighted features such as query planning, payload indexes, SIMD hardware acceleration, async I/O, and write-ahead logging Qdrant is available as a fully managed cloud service or as an open-source software that can be deployed on-premises.
For similar jobs
ChatFAQ
ChatFAQ is an open-source comprehensive platform for creating a wide variety of chatbots: generic ones, business-trained, or even capable of redirecting requests to human operators. It includes a specialized NLP/NLG engine based on a RAG architecture and customized chat widgets, ensuring a tailored experience for users and avoiding vendor lock-in.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
anything-llm
AnythingLLM is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.
glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.