solo-server

Platform for Hardware-Aware Inference

Solo Server is a lightweight server designed for managing hardware-aware inference. It provides seamless setup through a simple CLI and HTTP servers, an open model registry for pulling models from platforms like Ollama and Hugging Face, cross-platform compatibility for effortless deployment of AI models on hardware, and a configurable framework that auto-detects hardware components (CPU, GPU, RAM) and sets optimal configurations.


Solo Server


Python 3.9+ License: MIT PyPI - Downloads PyPI - Version

Solo Server is a lightweight server to manage hardware-aware inference.

# Install the solo-server package using pip
pip install solo-server

# Run the solo server setup in simple mode
solo setup

Features

  • Seamless Setup: Manage your on-device AI with a simple CLI and HTTP servers
  • Open Model Registry: Pull models from registries like Ollama & Hugging Face
  • Cross-Platform Compatibility: Deploy AI models effortlessly on your hardware
  • Configurable Framework: Auto-detects hardware (CPU, GPU, RAM) and sets optimal configs


Installation

🔹 Prerequisites

🔹 Install with uv (Recommended)

Install uv by following the official docs: https://docs.astral.sh/uv/getting-started/installation/

# Install uv
# On Windows (PowerShell)
iwr https://astral.sh/uv/install.ps1 -useb | iex
# If you have admin access, consider: https://github.com/astral-sh/uv/issues/3116
powershell -ExecutionPolicy Bypass -c "pip install uv"

# On Unix/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a virtual environment
uv venv

# Activate the virtual environment
source .venv/bin/activate  # On Unix/macOS
# OR
.venv\Scripts\activate     # On Windows

# Install solo-server into the environment
uv pip install solo-server

This creates an isolated environment managed by uv, which installs packages faster and more reliably than plain pip.

Run the interactive setup to configure Solo Server:

solo setup

🔹 Setup Features

✔️ Detects CPU, GPU, and RAM for hardware-optimized execution
✔️ Auto-configures solo.conf with optimal settings
✔️ Recommends a compute backend (CUDA, HIP, SYCL, Vulkan, Metal, or CPU)


Example Output:

╭────────────────── System Information ──────────────────╮
│ Operating System: Windows                               │
│ CPU: AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD  │
│ CPU Cores: 8                                            │
│ Memory: 15.42 GB                                        │
│ GPU: NVIDIA                                             │
│ GPU Model: NVIDIA GeForce GTX 1660 Ti                   │
│ GPU Memory: 6144.0 MB                                   │
│ Compute Backend: CUDA                                   │
╰─────────────────────────────────────────────────────────╯
🔧 Starting Solo Server Setup...
📊 Available Server Options:
• Ollama
• vLLM
• Llama.cpp

✨ Ollama is recommended for your system
Choose server [ollama]:
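
As a rough illustration, the hardware probing that solo setup performs can be approximated in a few lines of Python. This is a hypothetical sketch, not Solo Server's actual implementation; it assumes psutil is installed (pip install psutil) and uses nvidia-smi as one example of GPU discovery:

# Hypothetical sketch of setup-style hardware probing (not the real implementation).
import platform
import shutil
import subprocess

import psutil  # assumed dependency: pip install psutil

def detect_hardware() -> dict:
    info = {
        "os": platform.system(),
        "cpu_model": platform.processor(),
        "cpu_cores": psutil.cpu_count(logical=False),
        "memory_gb": round(psutil.virtual_memory().total / 1024**3, 2),
        "compute_backend": "CPU",  # default when no supported GPU is found
    }
    # Probe NVIDIA GPUs via nvidia-smi, if it is on PATH.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        if result.returncode == 0 and result.stdout.strip():
            name, memory = result.stdout.strip().splitlines()[0].split(", ", 1)
            info.update(gpu_vendor="NVIDIA", gpu_model=name,
                        gpu_memory=memory, compute_backend="CUDA")
    return info

print(detect_hardware())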

Solo Server Block Diagram

Commands


Serve a Model

solo serve -s ollama -m llama3.2

Command Options:

╭─ Options ──────────────────────────────────────────────────────────────────────────╮
│ --server  -s      TEXT     Server type (ollama, vllm, llama.cpp) [default: ollama] │
│ --model   -m      TEXT     Model name or path [default: None]                      │
│ --port    -p      INTEGER  Port to run the server on [default: None]               │
│ --help                     Show this message and exit.                             │
╰────────────────────────────────────────────────────────────────────────────────────╯

REST API

Once a model is served, you can interact with it through the REST API endpoint created by Solo Server. For example, send a POST request to http://localhost:11434/api/chat (the default Ollama port) with a JSON payload containing the model name and the messages you want to send.

Generate a response

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt":"Why is the sky blue?"
}'

Chat with a model

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
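
The same endpoints can be called from Python. Below is a small illustrative client using the requests library (pip install requests); it assumes the Ollama backend on its default port and passes "stream": false so the server returns a single JSON object instead of streamed chunks:

# Minimal Python client for the generate and chat endpoints above.
import requests

BASE_URL = "http://localhost:11434"  # default Ollama port

# Single-turn generation
resp = requests.post(
    f"{BASE_URL}/api/generate",
    json={"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False},
)
resp.raise_for_status()
print(resp.json()["response"])

# Multi-turn chat
resp = requests.post(
    f"{BASE_URL}/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])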

Check Model Status

solo status

Example Output:

🔹 Running Models:

| Name   | Model  | Backend | Port |
|--------|--------|---------|------|
| llama3 | Llama3 | CUDA    | 8080 |
| gptj   | GPT-J  | CPU     | 8081 |

Stop a Model

solo stop 

Example Output:

🛑 Stopping Solo Server...
✅ Solo server stopped successfully.

โš™๏ธ Configuration (solo.json)

After setup, all settings are stored in:

~/.solo_server/solo.json

Example:

{
    "hugging_face": {
        "token": ""
    },
    "system_info": {
        "os": "Windows",
        "cpu_model": "AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD",
        "cpu_cores": 8,
        "memory_gb": 15.42,
        "gpu_vendor": "NVIDIA",
        "gpu_model": "NVIDIA GeForce GTX 1660 Ti",
        "gpu_memory": 6144.0,
        "compute_backend": "CUDA"
    },
    "starfish": {
        "api_key": ""
    },
    "hardware": {
        "use_gpu": true
    }
}
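
Because the file is plain JSON, other tools and scripts can read it directly. A minimal sketch, assuming the default path shown above:

# Read solo.json to inspect the detected hardware.
import json
from pathlib import Path

config = json.loads((Path.home() / ".solo_server" / "solo.json").read_text())
system = config["system_info"]
print(f"Backend: {system['compute_backend']}, GPU in use: {config['hardware']['use_gpu']}")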

๐Ÿ“ Highlight Apps

Refer to the example_apps directory for sample applications.

  1. ai-chat

🔹 To Contribute, Set Up in Dev Mode

# Clone the repository
git clone https://github.com/GetSoloTech/solo-server.git

# Navigate to the directory
cd solo-server

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Unix/macOS
# OR
.venv\Scripts\activate     # Windows

# Install in editable mode
pip install -e .

๐Ÿ“ Project Inspiration

This project wouldn't be possible without the help of other projects like:

  • uv
  • llama.cpp
  • ramalama
  • ollama
  • whisper.cpp
  • vllm
  • podman
  • huggingface
  • llamafile
  • cog

If you like using Solo, consider leaving us a ⭐ on GitHub.
