ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Stars: 2148

Visit

The Ramalama project simplifies working with AI by utilizing OCI containers. It automatically detects GPU support, pulls necessary software in a container, and runs AI models. Users can list, pull, run, and serve models easily. The tool aims to support various GPUs and platforms in the future, making AI setup hassle-free.

README:

RamaLama strives to make working with AI simple, straightforward, and familiar by using OCI containers.

Description

RamaLama is an open-source tool that simplifies the local use and serving of AI models for inference from any source through the familiar approach of containers. It allows engineers to use container-centric development patterns and benefits to extend to AI use cases.

RamaLama eliminates the need to configure the host system by instead pulling a container image specific to the GPUs discovered on the host system, and allowing you to work with various models and platforms.

Eliminates the complexity for users to configure the host system for AI.
Detects and pulls an accelerated container image specific to the GPUs on the host system, handling dependencies and hardware optimization.
RamaLama supports multiple AI model registries, including OCI Container Registries.
Models are treated similarly to how Podman and Docker treat container images.
Use common container commands to work with AI models.
Run AI models securely in rootless containers, isolating the model from the underlying host.
Keep data secure by defaulting to no network access and removing all temporary data on application exits.
Interact with models via REST API or as a chatbot.

Install

Install on Fedora

RamaLama is available in Fedora and later. To install it, run:

sudo dnf install python3-ramalama

Install via PyPI

RamaLama is available via PyPI at https://pypi.org/project/ramalama

pip install ramalama

Install script (Linux and macOS)

Install RamaLama by running:

curl -fsSL https://ramalama.ai/install.sh | bash

Accelerated images

Accelerator	Image
GGML_VK_VISIBLE_DEVICES (or CPU)	quay.io/ramalama/ramalama
HIP_VISIBLE_DEVICES	quay.io/ramalama/rocm
CUDA_VISIBLE_DEVICES	quay.io/ramalama/cuda
ASAHI_VISIBLE_DEVICES	quay.io/ramalama/asahi
INTEL_VISIBLE_DEVICES	quay.io/ramalama/intel-gpu
ASCEND_VISIBLE_DEVICES	quay.io/ramalama/cann
MUSA_VISIBLE_DEVICES	quay.io/ramalama/musa

GPU support inspection

On first run, RamaLama inspects your system for GPU support, falling back to CPU if none are present. RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all necessary software to run an AI Model for your system setup.

How does RamaLama select the right image?

After initialization, RamaLama runs AI Models within a container based on the OCI image. RamaLama pulls container images specific to the GPUs discovered on your system. These images are tied to the minor version of RamaLama.

For example, RamaLama version 1.2.3 on an NVIDIA system pulls quay.io/ramalama/cuda:1.2. To override the default image, use the --image option.

RamaLama then pulls AI Models from model registries, starting a chatbot or REST API service from a simple single command. Models are treated similarly to how Podman and Docker treat container images.

Hardware Support

Hardware	Enabled
CPU	✓
Apple Silicon GPU (Linux / Asahi)	✓
Apple Silicon GPU (macOS)	✓ llama.cpp or MLX
Apple Silicon GPU (podman-machine)	✓
Nvidia GPU (cuda)	✓ See note below
AMD GPU (rocm, vulkan)	✓
Ascend NPU (Linux)	✓
Intel ARC GPUs (Linux)	✓ See note below
Intel GPUs (vulkan / Linux)	✓
Moore Threads GPU (musa / Linux)	✓ See note below

Nvidia GPUs

On systems with NVIDIA GPUs, see ramalama-cuda documentation for the correct host system configuration.

Intel GPUs

The following Intel GPUs are auto-detected by RamaLama:

GPU ID	Description
`0xe20b`	Intel® Arc™ B580 Graphics
`0xe20c`	Intel® Arc™ B570 Graphics
`0x7d51`	Intel® Graphics - Arrow Lake-H
`0x7dd5`	Intel® Graphics - Meteor Lake
`0x7d55`	Intel® Arc™ Graphics - Meteor Lake

See the Intel hardware table for more information.

Moore Threads GPUs

On systems with Moore Threads GPUs, see ramalama-musa documentation for the correct host system configuration.

MLX Runtime (macOS only)

The MLX runtime provides optimized inference for Apple Silicon Macs. MLX requires:

macOS operating system
Apple Silicon hardware (M1, M2, M3, or later)
Usage with --nocontainer option (containers are not supported)
The mlx-lm Python package installed on the host system

To install and run Phi-4 on MLX, use either uv or pip:

uv pip install mlx-lm
# or pip:
pip install mlx-lm

ramalama --runtime=mlx serve hf://mlx-community/Unsloth-Phi-4-4bit

Default Container Engine

When both Podman and Docker are installed, RamaLama defaults to Podman. The RAMALAMA_CONTAINER_ENGINE=docker environment variable can override this behaviour. When neither are installed, RamaLama will attempt to run the model with software on the local system.

Security

Test and run your models more securely

Because RamaLama defaults to running AI models inside rootless containers using Podman or Docker, these containers isolate the AI models from information on the underlying host. With RamaLama containers, the AI model is mounted as a volume into the container in read-only mode.

This results in the process running the model (llama.cpp or vLLM) being isolated from the host. Additionally, since ramalama run uses the --network=none option, the container cannot reach the network and leak any information out of the system. Finally, containers are run with the --rm option, which means any content written during container execution is deleted when the application exits.

Here’s how RamaLama delivers a robust security footprint:

Container Isolation – AI models run within isolated containers, preventing direct access to the host system.
Read-Only Volume Mounts – The AI model is mounted in read-only mode, which means that processes inside the container cannot modify the host files.
No Network Access – ramalama run is executed with --network=none, meaning the model has no outbound connectivity for which information can be leaked.
Auto-Cleanup – Containers run with --rm, wiping out any temporary data once the session ends.
Drop All Linux Capabilities – No access to Linux capabilities to attack the underlying host.
No New Privileges – Linux Kernel feature that disables container processes from gaining additional privileges.

Transports

RamaLama supports multiple AI model registries types called transports.

Supported transports

Transports	Web Site
HuggingFace	`huggingface.co`
ModelScope	`modelscope.cn`
Ollama	`ollama.com`
OCI Container Registries	`opencontainers.org`
	Examples: `quay.io`, `Docker Hub`, `Pulp`, and `Artifactory`

Default Transport

RamaLama uses the Ollama registry transport by default

How to change transports.

Use the RAMALAMA_TRANSPORT environment variable to modify the default. export RAMALAMA_TRANSPORT=huggingface Changes RamaLama to use huggingface transport.

Individual model transports can be modified when specifying a model via the huggingface://, oci://, modelscope://, or ollama:// prefix.

Example:

ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf

Transport shortnames

To make it easier for users, RamaLama uses shortname files, which contain alias names for fully specified AI Models, allowing users to refer to models using shorter names.

More information on shortnames.

RamaLama reads shortnames.conf files if they exist. These files contain a list of name-value pairs that specify the model. The following table specifies the order in which RamaLama reads the files. Any duplicate names that exist override previously defined shortnames.

Shortnames type	Path
Development	./shortnames.conf
User (Config)	$HOME/.config/ramalama/shortnames.conf
User (Local Share)	$HOME/.local/share/ramalama/shortnames.conf
Administrators	/etc/ramalama/shortnames.conf
Distribution	/usr/share/ramalama/shortnames.conf
Local Distribution	/usr/local/share/ramalama/shortnames.conf

$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
  "tiny" = "ollama://tinyllama"
  "granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
  "merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
  ...

Commands

`ramalama-bench`

Benchmark specified AI Model.

Benchmark specified AI Model
```
 $ ramalama bench granite3-moe
```

`ramalama-containers`

List all RamaLama containers.

List all containers running AI Models

 $ ramalama containers

Returns for example:

 CONTAINER ID  IMAGE                             COMMAND               CREATED        STATUS                    PORTS                   NAMES
 85ad75ecf866  quay.io/ramalama/ramalama:latest  /usr/bin/ramalama...  5 hours ago    Up 5 hours                0.0.0.0:8080->8080/tcp  ramalama_s3Oh6oDfOP
 85ad75ecf866  quay.io/ramalama/ramalama:latest  /usr/bin/ramalama...  4 minutes ago  Exited (0) 4 minutes ago                          granite-server

List all containers in a particular format

 $ ramalama ps --noheading --format "{{ .Names }}"

Returns for example:

 ramalama_s3Oh6oDfOP
 granite-server

`ramalama-convert`

Convert AI Model from local storage to OCI Image.

Generate an oci model out of an Ollama model.

 $ ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest

Returns for example:

 Building quay.io/rhatdan/tiny:latest...
 STEP 1/2: FROM scratch
 STEP 2/2: COPY sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 /model
 --> Using cache 69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344
 COMMIT quay.io/rhatdan/tiny:latest
 --> 69db4a10191c
 Successfully tagged quay.io/rhatdan/tiny:latest
 69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344

Generate and run an OCI model with a quantized GGUF converted from Safetensors.

Generate OCI model

 $ ramalama --image quay.io/ramalama/ramalama-rag convert --gguf Q4_K_M hf://ibm-granite/granite-3.2-2b-instruct oci://quay.io/kugupta/granite-3.2-q4-k-m:latest

Returns for example:

 Converting /Users/kugupta/.local/share/ramalama/models/huggingface/ibm-granite/granite-3.2-2b-instruct to quay.io/kugupta/granite-3.2-q4-k-m:latest...
 Building quay.io/kugupta/granite-3.2-q4-k-m:latest...

Run the generated model

 $ ramalama run oci://quay.io/kugupta/granite-3.2-q4-k-m:latest

`ramalama-info`

Display RamaLama configuration information.

Info with no container engine.

 $ ramalama info

Returns for example:

 {
     "Accelerator": "cuda",
     "Engine": {
 	"Name": ""
     },
     "Image": "quay.io/ramalama/cuda:0.7",
     "Runtime": "llama.cpp",
     "Shortnames": {
 	"Names": {
 	    "cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
 	    "deepseek": "ollama://deepseek-r1",
 	    "dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
 	    "gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
 	    "gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf",
 	    "gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf",
 	    "gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf",
 	    "gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
 	    "granite": "ollama://granite3.1-dense",
 	    "granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
 	    "granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf",
 	    "granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf",
 	    "granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
 	    "granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf",
 	    "granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf",
 	    "granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "granite:2b": "ollama://granite3.1-dense:2b",
 	    "granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "granite:8b": "ollama://granite3.1-dense:8b",
 	    "hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
 	    "ibm/granite": "ollama://granite3.1-dense:8b",
 	    "ibm/granite:2b": "ollama://granite3.1-dense:2b",
 	    "ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "ibm/granite:8b": "ollama://granite3.1-dense:8b",
 	    "merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
 	    "mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
 	    "mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
 	    "mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
 	    "mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf",
 	    "mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
 	    "mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
 	    "mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
 	    "openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
 	    "openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
 	    "phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
 	    "smollm:135m": "ollama://smollm:135m",
 	    "tiny": "ollama://tinyllama"
 	},
 	"Files": [
 	    "/usr/share/ramalama/shortnames.conf",
 	    "/home/dwalsh/.config/ramalama/shortnames.conf",
 	]
     },
     "Store": "/home/dwalsh/.local/share/ramalama",
     "UseContainer": true,
     "Version": "0.7.5"
 }

Info with Podman engine.

 $ ramalama info

Returns for example:

 {
     "Accelerator": "cuda",
     "Engine": {
 	"Info": {
 	    "host": {
 		"arch": "amd64",
 		"buildahVersion": "1.39.4",
 		"cgroupControllers": [
 		    "cpu",
 		    "io",
 		    "memory",
 		    "pids"
 		],
 		"cgroupManager": "systemd",
 		"cgroupVersion": "v2",
 		"conmon": {
 		    "package": "conmon-2.1.13-1.fc42.x86_64",
 		    "path": "/usr/bin/conmon",
 		    "version": "conmon version 2.1.13, commit: "
 		},
 		"cpuUtilization": {
 		    "idlePercent": 97.36,
 		    "systemPercent": 0.64,
 		    "userPercent": 2
 		},
 		"cpus": 32,
 		"databaseBackend": "sqlite",
 		"distribution": {
 		    "distribution": "fedora",
 		    "variant": "workstation",
 		    "version": "42"
 		},
 		"eventLogger": "journald",
 		"freeLocks": 2043,
 		"hostname": "danslaptop",
 		"idMappings": {
 		    "gidmap": [
 			{
 			    "container_id": 0,
 			    "host_id": 3267,
 			    "size": 1
 			},
 			{
 			    "container_id": 1,
 			    "host_id": 524288,
 			    "size": 65536
 			}
 		    ],
 		    "uidmap": [
 			{
 			    "container_id": 0,
 			    "host_id": 3267,
 			    "size": 1
 			},
 			{
 			    "container_id": 1,
 			    "host_id": 524288,
 			    "size": 65536
 			}
 		    ]
 		},
 		"kernel": "6.14.2-300.fc42.x86_64",
 		"linkmode": "dynamic",
 		"logDriver": "journald",
 		"memFree": 65281908736,
 		"memTotal": 134690979840,
 		"networkBackend": "netavark",
 		"networkBackendInfo": {
 		    "backend": "netavark",
 		    "dns": {
 			"package": "aardvark-dns-1.14.0-1.fc42.x86_64",
 			"path": "/usr/libexec/podman/aardvark-dns",
 			"version": "aardvark-dns 1.14.0"
 		    },
 		    "package": "netavark-1.14.1-1.fc42.x86_64",
 		    "path": "/usr/libexec/podman/netavark",
 		    "version": "netavark 1.14.1"
 		},
 		"ociRuntime": {
 		    "name": "crun",
 		    "package": "crun-1.21-1.fc42.x86_64",
 		    "path": "/usr/bin/crun",
 		    "version": "crun version 1.21\ncommit: 10269840aa07fb7e6b7e1acff6198692d8ff5c88\nrundir: /run/user/3267/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
 		},
 		"os": "linux",
 		"pasta": {
 		    "executable": "/bin/pasta",
 		    "package": "passt-0^20250415.g2340bbf-1.fc42.x86_64",
 		    "version": ""
 		},
 		"remoteSocket": {
 		    "exists": true,
 		    "path": "/run/user/3267/podman/podman.sock"
 		},
 		"rootlessNetworkCmd": "pasta",
 		"security": {
 		    "apparmorEnabled": false,
 		    "capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
 		    "rootless": true,
 		    "seccompEnabled": true,
 		    "seccompProfilePath": "/usr/share/containers/seccomp.json",
 		    "selinuxEnabled": true
 		},
 		"serviceIsRemote": false,
 		"slirp4netns": {
 		    "executable": "/bin/slirp4netns",
 		    "package": "slirp4netns-1.3.1-2.fc42.x86_64",
 		    "version": "slirp4netns version 1.3.1\ncommit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236\nlibslirp: 4.8.0\nSLIRP_CONFIG_VERSION_MAX: 5\nlibseccomp: 2.5.5"
 		},
 		"swapFree": 8589930496,
 		"swapTotal": 8589930496,
 		"uptime": "116h 35m 40.00s (Approximately 4.83 days)",
 		"variant": ""
 	    },
 	    "plugins": {
 		"authorization": null,
 		"log": [
 		    "k8s-file",
 		    "none",
 		    "passthrough",
 		    "journald"
 		],
 		"network": [
 		    "bridge",
 		    "macvlan",
 		    "ipvlan"
 		],
 		"volume": [
 		    "local"
 		]
 	    },
 	    "registries": {
 		"search": [
 		    "registry.fedoraproject.org",
 		    "registry.access.redhat.com",
 		    "docker.io"
 		]
 	    },
 	    "store": {
 		"configFile": "/home/dwalsh/.config/containers/storage.conf",
 		"containerStore": {
 		    "number": 5,
 		    "paused": 0,
 		    "running": 0,
 		    "stopped": 5
 		},
 		"graphDriverName": "overlay",
 		"graphOptions": {},
 		"graphRoot": "/home/dwalsh/.local/share/containers/storage",
 		"graphRootAllocated": 2046687182848,
 		"graphRootUsed": 399990419456,
 		"graphStatus": {
 		    "Backing Filesystem": "btrfs",
 		    "Native Overlay Diff": "true",
 		    "Supports d_type": "true",
 		    "Supports shifting": "false",
 		    "Supports volatile": "true",
 		    "Using metacopy": "false"
 		},
 		"imageCopyTmpDir": "/var/tmp",
 		"imageStore": {
 		    "number": 297
 		},
 		"runRoot": "/run/user/3267/containers",
 		"transientStore": false,
 		"volumePath": "/home/dwalsh/.local/share/containers/storage/volumes"
 	    },
 	    "version": {
 		"APIVersion": "5.4.2",
 		"BuildOrigin": "Fedora Project",
 		"Built": 1743552000,
 		"BuiltTime": "Tue Apr  1 19:00:00 2025",
 		"GitCommit": "be85287fcf4590961614ee37be65eeb315e5d9ff",
 		"GoVersion": "go1.24.1",
 		"Os": "linux",
 		"OsArch": "linux/amd64",
 		"Version": "5.4.2"
 	    }
 	},
 	"Name": "podman"
     },
     "Image": "quay.io/ramalama/cuda:0.7",
     "Runtime": "llama.cpp",
     "Shortnames": {
 	"Names": {
 	    "cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
 	    "deepseek": "ollama://deepseek-r1",
 	    "dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
 	    "gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
 	    "gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf",
 	    "gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf",
 	    "gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf",
 	    "gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
 	    "granite": "ollama://granite3.1-dense",
 	    "granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
 	    "granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf",
 	    "granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf",
 	    "granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
 	    "granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf",
 	    "granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf",
 	    "granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "granite:2b": "ollama://granite3.1-dense:2b",
 	    "granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "granite:8b": "ollama://granite3.1-dense:8b",
 	    "hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
 	    "ibm/granite": "ollama://granite3.1-dense:8b",
 	    "ibm/granite:2b": "ollama://granite3.1-dense:2b",
 	    "ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
 	    "ibm/granite:8b": "ollama://granite3.1-dense:8b",
 	    "merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
 	    "mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
 	    "mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
 	    "mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
 	    "mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
 	    "mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf",
 	    "mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
 	    "mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
 	    "mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
 	    "openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
 	    "openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
 	    "phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
 	    "smollm:135m": "ollama://smollm:135m",
 	    "tiny": "ollama://tinyllama"
 	},
 	"Files": [
 	    "/usr/share/ramalama/shortnames.conf",
 	    "/home/dwalsh/.config/ramalama/shortnames.conf",
 	]
     },
     "Store": "/home/dwalsh/.local/share/ramalama",
     "UseContainer": true,
     "Version": "0.7.5"
 }

Using jq to print specific `ramalama info` content.

 $ ramalama info |  jq .Shortnames.Names.mixtao

Returns for example:

"huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf"

`ramalama-inspect`

Inspect the specified AI Model.

Inspect the smollm:135m model for basic information.

 $ ramalama inspect smollm:135m

Returns for example:

 smollm:135m
    Path: /var/lib/ramalama/models/ollama/smollm:135m
    Registry: ollama
    Format: GGUF
    Version: 3
    Endianness: little
    Metadata: 39 entries
    Tensors: 272 entries

Inspect the smollm:135m model for all information in json format.

 $ ramalama inspect smollm:135m --all --json

Returns for example:

 {
     "Name": "smollm:135m",
     "Path": "/home/mengel/.local/share/ramalama/models/ollama/smollm:135m",
     "Registry": "ollama",
     "Format": "GGUF",
     "Version": 3,
     "LittleEndian": true,
     "Metadata": {
 	"general.architecture": "llama",
 	"general.base_model.0.name": "SmolLM 135M",
 	"general.base_model.0.organization": "HuggingFaceTB",
 	"general.base_model.0.repo_url": "https://huggingface.co/HuggingFaceTB/SmolLM-135M",
 	...
     },
     "Tensors": [
 	{
 	    "dimensions": [
 		576,
 		49152
 	    ],
 	    "n_dimensions": 2,
 	    "name": "token_embd.weight",
 	    "offset": 0,
 	    "type": 8
 	},
 	...
     ]
 }

`ramalama-list`

List all downloaded AI Models.

You can `list` all models pulled into local storage.

 $ ramalama list

Returns for example:

 NAME                                                                    MODIFIED      SIZE
 ollama://smollm:135m                                                    16 hours ago  5.5M
 huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf     14 hours ago  460M
 ollama://moondream:latest                                               6 days ago    791M
 ollama://phi4:latest                                                    6 days ago    8.43 GB
 ollama://tinyllama:latest                                               1 week ago    608.16 MB
 ollama://granite3-moe:3b                                                1 week ago    1.92 GB
 ollama://granite3-moe:latest                                            3 months ago  1.92 GB
 ollama://llama3.1:8b                                                    2 months ago  4.34 GB
 ollama://llama3.1:latest                                                2 months ago  4.34 GB

`ramalama-login`

Log in to a remote registry.

 $ export RAMALAMA_TRANSPORT=quay.io/username
 $ ramalama login -u username

 $ export RAMALAMA_TRANSPORT=ollama
 $ ramalama login

Log in to huggingface registry
```
 $ export RAMALAMA_TRANSPORT=huggingface
 $ ramalama login --token=XYZ
```
Logging in to Hugging Face requires the huggingface-cli tool. For installation and usage instructions, see the documentation of the Hugging Face command line interface.

`ramalama-logout`

Log out of a remote registry.

Log out from quay.io/username oci repository
```
 $ ramalama logout quay.io/username
```

Log out from ollama repository
```
 $ ramalama logout ollama
```

Log out from huggingface
```
 $ ramalama logout huggingface
```

`ramalama-perplexity`

Calculate perplexity for the specified AI Model.

Calculate the perplexity of an AI Model.

Perplexity measures how well the model can predict the next token with lower values being better
```
 $ ramalama perplexity granite3-moe
```

`ramalama-pull`

Pull the AI Model from the Model registry to local storage.

Pull a model

You can pull a model using the pull command. By default, it pulls from the Ollama registry.
```
 $ ramalama pull granite3-moe
```

`ramalama-push`

Push the AI Model from local storage to a remote registry.

Push specified AI Model (OCI-only at present)

A model can from RamaLama model storage in Huggingface, Ollama, or OCI Model format. The model can also just be a model stored on disk
```
 $ ramalama push oci://quay.io/rhatdan/tiny:latest
```

`ramalama-rag`

Generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image.

[!NOTE] this command does not work without a container engine.

Generate RAG data from provided documents and convert into an OCI Image.

This command uses a specific container image containing the docling tool to convert the specified content into a RAG vector database. If the image does not exist locally, RamaLama will pull the image down and launch a container to process the data.

**Positional arguments:**

PATH Files/Directory containing PDF, DOCX, PPTX, XLSX, HTML, AsciiDoc & Markdown formatted files to be processed. Can be specified multiple times.

IMAGE OCI Image name to contain processed rag data

```
./bin/ramalama rag ./README.md https://github.com/containers/podman/blob/main/README.md quay.io/rhatdan/myrag
100% |███████████████████████████████████████████████████████|  114.00 KB/    0.00 B 922.89 KB/s   59m 59s
Building quay.io/ramalama/myrag...
adding vectordb...
c857ebc65c641084b34e39b740fdb6a2d9d2d97be320e6aa9439ed0ab8780fe0
```

`ramalama-rm`

Remove the AI Model from local storage.

Specify one or more AI Models to be removed from local storage.
```
 $ ramalama rm ollama://tinyllama
```

Remove all AI Models from local storage.
```
 $ ramalama rm --all
```

`ramalama-run`

Run the specified AI Model as a chatbot.

Run a chatbot on a model using the run command. By default, it pulls from the Ollama registry.

Note: RamaLama will inspect your machine for native GPU support and then will use a container engine like Podman to pull an OCI container image with the appropriate code and libraries to run the AI Model. This can take a long time to setup, but only on the first run.
```
 $ ramalama run instructlab/merlinite-7b-lab
```

After the initial container image has been downloaded, you can interact with different models using the container image.
```
 $ ramalama run granite3-moe
```
Returns for example:
```
 > Write a hello world application in python

 print("Hello World")
```

In a different terminal window see the running podman container.

 $ podman ps
 CONTAINER ID  IMAGE                             COMMAND               CREATED        STATUS        PORTS       NAMES
 91df4a39a360  quay.io/ramalama/ramalama:latest  /home/dwalsh/rama...  4 minutes ago  Up 4 minutes              gifted_volhard

`ramalama-serve`

Serve REST API on the specified AI Model.

Serve a model and connect via a browser.
```
 $ ramalama serve llama3
```
When the web UI is enabled, you can connect via your browser at: 127.0.0.1:< port > The default serving port will be 8080 if available, otherwise a free random port in the range 8081-8090. If you wish, you can specify a port to use with --port/-p.

Run two AI Models at the same time. Notice both are running within Podman Containers.

 $ ramalama serve -d -p 8080 --name mymodel ollama://smollm:135m
 09b0e0d26ed28a8418fb5cd0da641376a08c435063317e89cf8f5336baf35cfa

 $ ramalama serve -d -n example --port 8081 oci://quay.io/mmortari/gguf-py-example/v1/example.gguf
 3f64927f11a5da5ded7048b226fbe1362ee399021f5e8058c73949a677b6ac9c

 $ podman ps
 CONTAINER ID  IMAGE                             COMMAND               CREATED         STATUS         PORTS                   NAMES
 09b0e0d26ed2  quay.io/ramalama/ramalama:latest  /usr/bin/ramalama...  32 seconds ago  Up 32 seconds  0.0.0.0:8081->8081/tcp  ramalama_sTLNkijNNP
 3f64927f11a5  quay.io/ramalama/ramalama:latest  /usr/bin/ramalama...  17 seconds ago  Up 17 seconds  0.0.0.0:8082->8082/tcp  ramalama_YMPQvJxN97

To disable the web UI, use the `--webui` off flag.
```
 $ ramalama serve --webui off llama3
```

`ramalama-stop`

Stop the named container that is running the AI Model.

Stop a running model if it is running in a container.
```
 $ ramalama stop mymodel
```

Stop all running models running in containers.
```
 $ ramalama stop --all
```

`ramalama-version`

Display version of the AI Model.

Print the version of RamaLama.
```
 $ ramalama version
```
Returns for example:
```
 ramalama version 1.2.3
```

Appendix

Command	Description
ramalama(1)	primary RamaLama man page
ramalama-bench(1)	benchmark specified AI Model
ramalama-chat(1)	chat with specified OpenAI REST API
ramalama-containers(1)	list all RamaLama containers
ramalama-convert(1)	convert AI Model from local storage to OCI Image
ramalama-info(1)	display RamaLama configuration information
ramalama-inspect(1)	inspect the specified AI Model
ramalama-list(1)	list all downloaded AI Models
ramalama-login(1)	login to remote registry
ramalama-logout(1)	logout from remote registry
ramalama-perplexity(1)	calculate perplexity for specified AI Model
ramalama-pull(1)	pull AI Model from Model registry to local storage
ramalama-push(1)	push AI Model from local storage to remote registry
ramalama-rag(1)	generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image
ramalama-rm(1)	remove AI Model from local storage
ramalama-run(1)	run specified AI Model as a chatbot
ramalama-serve(1)	serve REST API on specified AI Model
ramalama-stop(1)	stop named container that is running AI Model
ramalama-version(1)	display version of RamaLama

Diagram

+---------------------------+
|                           |
| ramalama run granite3-moe |
|                           |
+-------+-------------------+
	|
	|
	|           +------------------+           +------------------+
	|           | Pull inferencing |           | Pull model layer |
	+-----------| runtime (cuda)   |---------->| granite3-moe     |
		    +------------------+           +------------------+
						   | Repo options:    |
						   +-+-------+------+-+
						     |       |      |
						     v       v      v
					     +---------+ +------+ +----------+
					     | Hugging | | OCI  | | Ollama   |
					     | Face    | |      | | Registry |
					     +-------+-+ +---+--+ +-+--------+
						     |       |      |
						     v       v      v
						   +------------------+
						   | Start with       |
						   | cuda runtime     |
						   | and              |
						   | granite3-moe     |
						   +------------------+

In development

Regarding this alpha, everything is under development, so expect breaking changes, luckily it's easy to reset everything and reinstall:

rm -rf /var/lib/ramalama # only required if running as root user
rm -rf $HOME/.local/share/ramalama

and install again.

Known Issues

On certain versions of Python on macOS, certificates may not installed correctly, potentially causing SSL errors (e.g., when accessing huggingface.co). To resolve this, run the Install Certificates command, typically as follows:

/Applications/Python 3.x/Install Certificates.command

Credit where credit is due

This project wouldn't be possible without the help of other projects like:

so if you like this tool, give some of these repos a ⭐, and hey, give us a ⭐ too while you are at it.

Community

For general questions and discussion, please use RamaLama's

Matrix

For discussions around issues/bugs and features, you can use the GitHub Issues and PRs tracking system.

Contributors

Open to contributors

For Tasks:

Click tags to check more tools for each tasks

run model pull model serve chatbot list models inspect machine

For Jobs:

data scientist machine learning engineer ai researcher software developer ai solutions architect

Alternative AI tools for ramalama

Similar Open Source Tools

ramalama

github

: 2.1k

ai-manus

AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment. It offers deployment with minimal dependencies, supports multiple tools like Terminal, Browser, File, Web Search, and messaging tools, allocates separate sandboxes for tasks, manages session history, supports stopping and interrupting conversations, file upload and download, and is multilingual. The system also provides user login and authentication. The project primarily relies on Docker for development and deployment, with model capability requirements and recommended Deepseek and GPT models.

github

: 976

koog

Koog is a Kotlin-based framework for building and running AI agents entirely in idiomatic Kotlin. It allows users to create agents that interact with tools, handle complex workflows, and communicate with users. Key features include pure Kotlin implementation, MCP integration, embedding capabilities, custom tool creation, ready-to-use components, intelligent history compression, powerful streaming API, persistent agent memory, comprehensive tracing, flexible graph workflows, modular feature system, scalable architecture, and multiplatform support.

github

: 3.2k

lemonai

LemonAI is a versatile machine learning library designed to simplify the process of building and deploying AI models. It provides a wide range of tools and algorithms for data preprocessing, model training, and evaluation. With LemonAI, users can easily experiment with different machine learning techniques and optimize their models for various tasks. The library is well-documented and beginner-friendly, making it suitable for both novice and experienced data scientists. LemonAI aims to streamline the development of AI applications and empower users to create innovative solutions using state-of-the-art machine learning methods.

github

: 994

open-ai

Open AI is a powerful tool for artificial intelligence research and development. It provides a wide range of machine learning models and algorithms, making it easier for developers to create innovative AI applications. With Open AI, users can explore cutting-edge technologies such as natural language processing, computer vision, and reinforcement learning. The platform offers a user-friendly interface and comprehensive documentation to support users in building and deploying AI solutions. Whether you are a beginner or an experienced AI practitioner, Open AI offers the tools and resources you need to accelerate your AI projects and stay ahead in the rapidly evolving field of artificial intelligence.

github

: 2.1k

traceroot

TraceRoot is a tool that helps engineers debug production issues 10× faster using AI-powered analysis of traces, logs, and code context. It accelerates the debugging process with AI-powered insights, integrates seamlessly into the development workflow, provides real-time trace and log analysis, code context understanding, and intelligent assistance. Features include ease of use, LLM flexibility, distributed services, AI debugging interface, and integration support. Users can get started with TraceRoot Cloud for a 7-day trial or self-host the tool. SDKs are available for Python and JavaScript/TypeScript.

github

: 336

GenerativeAIExamples

NVIDIA Generative AI Examples are state-of-the-art examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs. These examples showcase the capabilities of NVIDIA's Generative AI platform, which includes tools, frameworks, and models for building and deploying generative AI applications.

github

: 3.4k

atomic-agents

The Atomic Agents framework is a modular and extensible tool designed for creating powerful applications. It leverages Pydantic for data validation and serialization. The framework follows the principles of Atomic Design, providing small and single-purpose components that can be combined. It integrates with Instructor for AI agent architecture and supports various APIs like Cohere, Anthropic, and Gemini. The tool includes documentation, examples, and testing features to ensure smooth development and usage.

github

: 2.7k

ml-retreat

ML-Retreat is a comprehensive machine learning library designed to simplify and streamline the process of building and deploying machine learning models. It provides a wide range of tools and utilities for data preprocessing, model training, evaluation, and deployment. With ML-Retreat, users can easily experiment with different algorithms, hyperparameters, and feature engineering techniques to optimize their models. The library is built with a focus on scalability, performance, and ease of use, making it suitable for both beginners and experienced machine learning practitioners.

github

: 2.2k

tools

Strands Agents Tools is a community-driven project that provides a powerful set of tools for your agents to use. It bridges the gap between large language models and practical applications by offering ready-to-use tools for file operations, system execution, API interactions, mathematical operations, and more. The tools cover a wide range of functionalities including file operations, shell integration, memory storage, web infrastructure, HTTP client, Slack client, Python execution, mathematical tools, AWS integration, image and video processing, audio output, environment management, task scheduling, advanced reasoning, swarm intelligence, dynamic MCP client, parallel tool execution, browser automation, diagram creation, RSS feed management, and computer automation.

github

: 620

llama.ui

llama.ui is an open-source desktop application that provides a beautiful, user-friendly interface for interacting with large language models powered by llama.cpp. It is designed for simplicity and privacy, allowing users to chat with powerful quantized models on their local machine without the need for cloud services. The project offers multi-provider support, conversation management with indexedDB storage, rich UI components including markdown rendering and file attachments, advanced features like PWA support and customizable generation parameters, and is privacy-focused with all data stored locally in the browser.

github

: 139

crystal

Crystal is an Electron desktop application that allows users to run, inspect, and test multiple Claude Code instances simultaneously using git worktrees. It provides features such as parallel sessions, git worktree isolation, session persistence, git integration, change tracking, notifications, and the ability to run scripts. Crystal simplifies the workflow by creating isolated sessions, iterating with Claude Code, reviewing diff changes, and squashing commits for a clean history. It is a tool designed for collaborative AI notebook editing and testing.

github

: 1.9k

BentoVLLM

BentoVLLM is an example project demonstrating how to serve and deploy open-source Large Language Models using vLLM, a high-throughput and memory-efficient inference engine. It provides a basis for advanced code customization, such as custom models, inference logic, or vLLM options. The project allows for simple LLM hosting with OpenAI compatible endpoints without the need to write any code. Users can interact with the server using Swagger UI or other methods, and the service can be deployed to BentoCloud for better management and scalability. Additionally, the repository includes integration examples for different LLM models and tools.

github

: 150

arcade-ai

Arcade AI is a developer-focused tooling and API platform designed to enhance the capabilities of LLM applications and agents. It simplifies the process of connecting agentic applications with user data and services, allowing developers to concentrate on building their applications. The platform offers prebuilt toolkits for interacting with various services, supports multiple authentication providers, and provides access to different language models. Users can also create custom toolkits and evaluate their tools using Arcade AI. Contributions are welcome, and self-hosting is possible with the provided documentation.

github

: 654

lmnr

Laminar is an all-in-one open-source platform designed for engineering AI products. It allows users to trace, evaluate, label, and analyze LLM data efficiently. The platform offers features such as automatic tracing of common AI frameworks and SDKs, local and online evaluations, simple UI for data labeling, dataset management, and scalability with gRPC communication. Laminar is built with a modern open-source stack including RabbitMQ, Postgres, Clickhouse, and Qdrant for semantic similarity search. It provides fast and beautiful dashboards for traces, evaluations, and labels, making it a comprehensive tool for AI product development.

github

: 2.3k

azure-ai-docs

Azure AI Docs is a repository that provides detailed documentation and resources for developers looking to leverage Microsoft's AI services on the Azure platform. The repository covers a wide range of topics including machine learning, natural language processing, computer vision, and more. Developers can find tutorials, code samples, best practices, and guidelines to help them integrate AI capabilities into their applications seamlessly.

github

: 104

For similar tasks

ramalama

github

: 2.1k

client-js

The Mistral JavaScript client is a library that allows you to interact with the Mistral AI API. With this client, you can perform various tasks such as listing models, chatting with streaming, chatting without streaming, and generating embeddings. To use the client, you can install it in your project using npm and then set up the client with your API key. Once the client is set up, you can use it to perform the desired tasks. For example, you can use the client to chat with a model by providing a list of messages. The client will then return the response from the model. You can also use the client to generate embeddings for a given input. The embeddings can then be used for various downstream tasks such as clustering or classification.

github

: 173

OllamaSharp

OllamaSharp is a .NET binding for the Ollama API, providing an intuitive API client to interact with Ollama. It offers support for all Ollama API endpoints, real-time streaming, progress reporting, and an API console for remote management. Users can easily set up the client, list models, pull models with progress feedback, stream completions, and build interactive chats. The project includes a demo console for exploring and managing the Ollama host.

github

: 1.1k

client

Gemini API PHP Client is a library that allows you to interact with Google's generative AI models, such as Gemini Pro and Gemini Pro Vision. It provides functionalities for basic text generation, multimodal input, chat sessions, streaming responses, tokens counting, listing models, and advanced usages like safety settings and custom HTTP client usage. The library requires an API key to access Google's Gemini API and can be installed using Composer. It supports various features like generating content, starting chat sessions, embedding content, counting tokens, and listing available models.

github

: 97

jvm-openai

jvm-openai is a minimalistic unofficial OpenAI API client for the JVM, written in Java. It serves as a Java client for OpenAI API with a focus on simplicity and minimal dependencies. The tool provides support for various OpenAI APIs and endpoints, including Audio, Chat, Embeddings, Fine-tuning, Batch, Files, Uploads, Images, Models, Moderations, Assistants, Threads, Messages, Runs, Run Steps, Vector Stores, Vector Store Files, Vector Store File Batches, Invites, Users, Projects, Project Users, Project Service Accounts, Project API Keys, and Audit Logs. Users can easily integrate this tool into their Java projects to interact with OpenAI services efficiently.

github

: 70

ollama-r

The Ollama R library provides an easy way to integrate R with Ollama for running language models locally on your machine. It supports working with standard data structures for different LLMs, offers various output formats, and enables integration with other libraries/tools. The library uses the Ollama REST API and requires the Ollama app to be installed, with GPU support for accelerating LLM inference. It is inspired by Ollama Python and JavaScript libraries, making it familiar for users of those languages. The installation process involves downloading the Ollama app, installing the 'ollamar' package, and starting the local server. Example usage includes testing connection, downloading models, generating responses, and listing available models.

github

: 89

openai-scala-client

This is a no-nonsense async Scala client for OpenAI API supporting all the available endpoints and params including streaming, chat completion, vision, and voice routines. It provides a single service called OpenAIService that supports various calls such as Models, Completions, Chat Completions, Edits, Images, Embeddings, Batches, Audio, Files, Fine-tunes, Moderations, Assistants, Threads, Thread Messages, Runs, Run Steps, Vector Stores, Vector Store Files, and Vector Store File Batches. The library aims to be self-contained with minimal dependencies and supports API-compatible providers like Azure OpenAI, Azure AI, Anthropic, Google Vertex AI, Groq, Grok, Fireworks AI, OctoAI, TogetherAI, Cerebras, Mistral, Deepseek, Ollama, FastChat, and more.

github

: 232

ai-models

The `ai-models` command is a tool used to run AI-based weather forecasting models. It provides functionalities to install, run, and manage different AI models for weather forecasting. Users can easily install and run various models, customize model settings, download assets, and manage input data from different sources such as ECMWF, CDS, and GRIB files. The tool is designed to optimize performance by running on GPUs and provides options for better organization of assets and output files. It offers a range of command line options for users to interact with the models and customize their forecasting tasks.

github

: 367

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 980

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github

: 2.9k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

github

: 32.1k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github

: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675