mllm

Fast Multimodal LLM on Mobile Devices

Stars: 703

Visit

mllm is a fast and lightweight multimodal LLM inference engine for mobile and edge devices. It is a Plain C/C++ implementation without dependencies, optimized for multimodal LLMs like fuyu-8B, and supports ARM NEON and x86 AVX2. The engine offers 4-bit and 6-bit integer quantization, making it suitable for intelligent personal agents, text-based image searching/retrieval, screen VQA, and various mobile applications without compromising user privacy.

README:

mllm

fast and lightweight multimodal LLM inference engine for mobile and edge devices

| Arm CPU | X86 CPU | Qualcomm NPU(QNN) |

Plain C/C++ implementation without dependencies
Optimized for multimodal LLMs like fuyu-8B and LLaVA
Supported: ARM NEON, x86 AVX2, Qualcomm NPU (QNN), etc
Various quantization schemes
End-to-end Android app demo

Why on-device multimodal LLM? - It's a key building block for intelligent personal agent, text-based image searching/retrieval, screen VQA, and many more exciting mobile apps, without giving away your private data (chat history, screenshots, taken photos, etc).

Recent update

[2024 November 21] Support new model: Phi 3 Vision https://github.com/UbiquitousLearning/mllm/pull/186
[2024 August 30] Support new model: MiniCPM 2B https://github.com/UbiquitousLearning/mllm/pull/132
[2024 August 15] Support new model: Phi 3 mini https://github.com/UbiquitousLearning/mllm/pull/119
[2024 Aug 10] Supporting Qualcomm NPU: https://github.com/UbiquitousLearning/mllm/pull/112 | try it out | paper

Android Demo
Support models
Quick Start
Customization
Roadmap
Documentation
Contribution
Acknowledgments
License

Android Demo

Android Intent Invocation	Image Understanding

Chat CPU	Chat NPU

Support models

Language models

Model	CPU FP32	CPU INT4	Hexagon NPU INT8
LLaMA 2 7B	✔️	✔️
LLaMA 3 1B	✔️	✔️
LLaMA 3 3B	✔️	✔️
Alpaca 7B	✔️	✔️
TinyLLaMA 1.1B	✔️	✔️
LLaVA 7B	✔️	✔️
Gemma 2B	✔️	✔️
Gemma 2 2B	✔️	✔️
Qwen 1.5 0.5B	✔️	✔️
Qwen 1.5 1.8B	✔️	✔️	✔️
Qwen 2.5 1.5B	✔️	✔️
Mistral 7B	✔️	✔️
Yi 6B	✔️	✔️
StableLM 2 1.6B	✔️	✔️
OPT 1.3B	✔️	✔️
Phi 3 mini 3.8B	✔️	✔️
MiniCPM 2B	✔️	✔️
MiniCPM 3 4B	✔️	✔️
MiniCPM MoE 8x2B	✔️	✔️
SmolLM 1.7B	✔️	✔️
DCLM 1B	✔️	✔️
OpenELM 1.1B	✔️	✔️
PhoneLM 1.5B	✔️	✔️	✔️

Multimodal models

Model	CPU FP32	CPU INT4
Fuyu 8B	✔️	✔️
Vision Transformer	✔️	✔️
CLIP	✔️	✔️
ImageBind (3 modalities)	✔️	✔️
LLaVA 7B	✔️	✔️
Phi-3-Vision	✔️	✔️
Qwen2-VL 2B	✔️	✔️

Quick Start

Get the Code

git clone https://github.com/UbiquitousLearning/mllm
cd mllm

Check prerequisites

Building mllm requires following tools:

gcc(11.4+) / clang (11.0+)
CMake >= 3.18
Android NDK Toolchains >= 26

Note that building OpenMP libs on macOS may fail due to Apple LLVM compiler, so we disable OpenMP on macOS by default, you may experience slower performance on macOS. Build mllm is more recommended on Linux.

Run Qwen with Hexagon NPU accelerating using QNN

NOTE: The QNN backend is preliminary version which can do end-to-end inference. It is still under active development for better performance and more supported models.

We support running Qwen-1.5-1.8B-Chat using Qualcomm QNN to get Hexagon NPU acceleration on devices with Snapdragon 8 Gen3. The details of QNN environment set up and design is here. The prefilling stage is performered by QNN & CPU, and the inference stage is performed by CPU.

Build the target with QNN backend.

cd ../script
./build_qnn_android.sh

Download the model from here, or using the following instructions

mkdir ../models && cd ../models
# Download int8 model used by npu & q4k model used by cpu
wget https://huggingface.co/mllmTeam/qwen-1.5-1.8b-chat-mllm/resolve/main/qwen-1.5-1.8b-chat-int8.mllm?download=true  -O qwen-1.5-1.8b-chat-int8.mllm
wget https://huggingface.co/mllmTeam/qwen-1.5-1.8b-chat-mllm/resolve/main/qwen-1.5-1.8b-chat-q4k.mllm?download=true  -O qwen-1.5-1.8b-chat-q4k.mllm

Run on an android phone with at least 16GB of memory.

cd ../script
./run_qwen_npu.sh

There are two arguments in the executable. -s is for the sequence length of prefilling, the default value is 64 in the demo we provided. -c for type of QNN prefilling options, when it is set to 1, the input will be splited into many chunks of sequence 32 and be executed in a pipeline. When it is set to 0, the input will be executed in one chunk.

Result are as followed:

> ./main_qwen_npu -s 64 -c 1
[Q] <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Give me a short introduction to large language model.<|im_end|>
<|im_start|>assistant

[A] A short introduction to a large language model is a type of artificial intelligence language model that is designed to understand and generate human language text. These models are typically trained on large amounts of text data, such as books, articles, and other written materials, to learn the patterns and structures of human language. They use a combination of natural language processing (NLP)

Run with the CPU of Android

Build

export ANDROID_NDK=/path/to/your/ndk
cd scripts
./build_android.sh

Run Fuyu-8B

Download the model from here, or using the following instructions

mkdir ../models && cd ../models
# Download fuyu-8b-q4_k.mllm
wget https://huggingface.co/mllmTeam/fuyu-8b-mllm/resolve/main/fuyu-8b-q4_k.mllm?download=true  -O fuyu-8b-q4_k.mllm

Run on an android phone with at least 12GB of memory.

cd ../script
./run_fuyu.sh

Result are as followed:

> ./demo_fuyu
[Q] [../assets/bus.png]Generate a coco-style caption.

[A]  A blue bus driving down▁the▁road next▁to a sidewalk.
[Q] [../assets/two_cats.jpg]What's this?

[A]  Two cats laying on▁top▁of a pink blanket.

Run LLaMA-2-7B

Download model from here, or using the following instructions

mkdir ../models && cd ../models
# Dowload llama-2-7b-chat-q4_k.mllm
wget https://huggingface.co/mllmTeam/llama-2-7b-mllm/resolve/main/llama-2-7b-chat-q4_k.mllm?download=true  -O llama-2-7b-chat-q4_k.mllm

Run on an android phone with at least 12GB of memory.

cd ../script
./run_llama.sh

Result are as followed:

> ./demo_llama
[Q]  Hello, who are you?
[A]

I am a bot, I'm here to help answer your questions and provide information. I'm just an AI, I don't have personal experiences or feelings like humans do, but I'm here to help you with any questions you may have. Is there something specific you would like to know or talk about?
[Q]  What can you do?
[A]
I can help with a wide range of tasks, including:
  
* Answering questions on various topics, such as science, history, technology, and more.
* Providing information on different subjects, such as culture, entertainment, and lifestyle.
* Helping with language-related tasks, such as translating text or improving grammar and syntax.
* Generating text, such as articles, stories, or even entire books.
* Assisting with coding
[Q]  Please introduce Beijing University of Posts and Telecommunications.
[A]
Beijing University of Posts and Telecommunications (BUPT) is a public research university located in Beijing, China. It was established in 1955 and is one of the top universities in China, with a strong focus on engineering, technology, and telecommunications.
BUPT offers a wide range of undergraduate and graduate programs in fields such as computer science, electrical engineering, software engineering, and communication engineering. The university

Run ImageBind

Download model from here, or using the following instructions

mkdir ../models && cd ../models
# Download imagebind_huge-q4_k.mllm
wget https://huggingface.co/mllmTeam/imagebind_huge-mllm/resolve/main/imagebind_huge-q4_k.mllm?download=true -O imagebind_huge-q4_k.mllm

Run on an android phone with at least 4GB of memory.

cd ../script
./run_imagebind.sh

Result are as followed:

> ./demo_imagebind 
vision X text :
0.9985647 0.0013827 0.0000526 
0.0000365 0.9998636 0.0000999 
0.0000115 0.0083149 0.9916736 
vision X audio :
0.8054272 0.1228001 0.0717727 
0.0673458 0.8429284 0.0897258 
0.0021967 0.0015335 0.9962698

Run for Linux

Build

cd scripts
./build.sh

Run Fuyu-8B

cd ./bin
./demo_fuyu -m ../models/fuyu-8b-q4_k.mllm -v ../vocab/fuyu_vocab.mllm

Run LLaMA-2-7B

cd ./bin
./demo_llama -m ../models/llama-2-7b-chat-q4_k.mllm -v ../vocab/llama2_vocab.mllm

Run ImageBind

cd ./bin
./demo_imagebind -m ../models/imagebind_huge-q4_k.mllm -v ../vocab/clip_vocab.mllm

Customization

Convert models

You can download models from here, or you can convert a pytorch/safetensor model to mllm model by yourself.

cd tools/convertor
pip install -r ./requirements.txt

# for one file pytorch model
python converter.py --input_model=model.pth --output_model=model.mllm --type=torch

# for multi-file pytorch model
python converter.py --input_model=pytorch_model.bin.index.json --output_model=model.mllm --type=torch

# for one file safetensor model
python converter.py --input_model=model.bin --output_model=model.mllm --type=safetensor

# for multi-file safetensor model
python converter.py --input_model=model.safetensors.index.json --output_model=model.mllm --type=safetensor

Convert vocabulary

You can convert vocabulary to mllm vocabulary as followed.

cd tools/convertor
python vocab.py --input_file=tokenizer.json --output_file=vocab.mllm --type=Unigram

Quantize models

You can quantize mllm model to int4 model by yourself. mllm only support two quantize modes: Q4_0 and Q4_K.

cd bin
./quantize model.mllm model_q4_k.mllm Q4_K

Roadmap

More backends like QNN
More models like PandaGPT
More optimizations like LUT-GEMM
More..

Documentation

See the documentation here for more information

Contribution

Read the contribution before you contribute.

Acknowledgments

mllm reuses many low-level kernel implementation from ggml on ARM CPU. It also utilizes stb and wenet for pre-processing images and audios. mllm also has benefitted from following projects: llama.cpp and MNN.

License

Overall Project License

This project is licensed under the terms of the MIT License. Please see the LICENSE file in the root directory for the full text of the MIT License.

Apache 2.0 Licensed Components

Certain component(wenet) of this project is licensed under the Apache License 2.0. These component is clearly identified in their respective subdirectories along with a copy of the Apache License 2.0. For the full text of the Apache License 2.0, please refer to the LICENSE-APACHE file located in the relevant subdirectories.

Citation

@article{xu2025fast,
  title={Fast On-device LLM Inference with NPUs},
  author={Xu, Daliang and Zhang, Hao and Yang, Liming and Liu, Ruiqi and Huang, Gang and Xu, Mengwei and Liu, Xuanzhe},
  booktitle={International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  year={2025}
}
@misc{yi2023mllm,
  title = {mllm: fast and lightweight multimodal LLM inference engine for mobile and edge devices},
  author = {Rongjie Yi and Xiang Li and Zhenyan Lu and Hao Zhang and Daliang Xu and Liming Yang and Weikai Xie and Chenghua Wang and Xuanzhe Liu and Mengwei Xu},
  year = {2023},
  publisher = {mllm Team},
  url = {https://github.com/UbiquitousLearning/mllm}
}

For Tasks:

Click tags to check more tools for each tasks

chatting bot image understanding screen understanding text generation language translation

For Jobs:

software developer mobile app developer ai engineer edge computing engineer machine learning engineer

Alternative AI tools for mllm

Similar Open Source Tools

mllm

github

: 703

MooER

MooER (摩耳) is an LLM-based speech recognition and translation model developed by Moore Threads. It allows users to transcribe speech into text (ASR) and translate speech into other languages (AST) in an end-to-end manner. The model was trained using 5K hours of data and is now also available with an 80K hours version. MooER is the first LLM-based speech model trained and inferred using domestic GPUs. The repository includes pretrained models, inference code, and a Gradio demo for a better user experience.

github

: 124

portkey-python-sdk

The Portkey Python SDK is a control panel for AI apps that allows seamless integration of Portkey's advanced features with OpenAI methods. It provides features such as AI gateway for unified API signature, interoperability, automated fallbacks & retries, load balancing, semantic caching, virtual keys, request timeouts, observability with logging, requests tracing, custom metadata, feedback collection, and analytics. Users can make requests to OpenAI using Portkey SDK and also use async functionality. The SDK is compatible with OpenAI SDK methods and offers Portkey-specific methods like feedback and prompts. It supports various providers and encourages contributions through Github issues or direct contact via email or Discord.

github

: 54

llava-docker

This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

github

: 59

AiOS

AiOS is a tool for human pose and shape estimation, performing human localization and SMPL-X estimation in a progressive manner. It consists of body localization, body refinement, and whole-body refinement stages. Users can download datasets for evaluation, SMPL-X body models, and AiOS checkpoint. Installation involves creating a conda virtual environment, installing PyTorch, torchvision, Pytorch3D, MMCV, and other dependencies. Inference requires placing the video for inference and pretrained models in specific directories. Test results are provided for NMVE, NMJE, MVE, and MPJPE on datasets like BEDLAM and AGORA. Users can run scripts for AGORA validation, AGORA test leaderboard, and BEDLAM leaderboard. The tool acknowledges codes from MMHuman3D, ED-Pose, and SMPLer-X.

github

: 121

floneum

Floneum is a graph editor that makes it easy to develop your own AI workflows. It uses large language models (LLMs) to run AI models locally, without any external dependencies or even a GPU. This makes it easy to use LLMs with your own data, without worrying about privacy. Floneum also has a plugin system that allows you to improve the performance of LLMs and make them work better for your specific use case. Plugins can be used in any language that supports web assembly, and they can control the output of LLMs with a process similar to JSONformer or guidance.

github

: 1.8k

TokenPacker

TokenPacker is a novel visual projector that compresses visual tokens by 75%∼89% with high efficiency. It adopts a 'coarse-to-fine' scheme to generate condensed visual tokens, achieving comparable or better performance across diverse benchmarks. The tool includes TokenPacker for general use and TokenPacker-HD for high-resolution image understanding. It provides training scripts, checkpoints, and supports various compression ratios and patch numbers.

github

: 235

StableToolBench

StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features like Virtual API System, Solvable Queries, and Stable Evaluation System. The benchmark ensures consistency through a caching system and API simulators, filters queries based on solvability using LLMs, and evaluates model performance using GPT-4 with metrics like Solvable Pass Rate and Solvable Win Rate.

github

: 135

star-vector

StarVector is a multimodal vision-language model for Scalable Vector Graphics (SVG) generation. It can be used to perform image2SVG and text2SVG generation. StarVector works directly in the SVG code space, leveraging visual understanding to apply accurate SVG primitives. It achieves state-of-the-art performance in producing compact and semantically rich SVGs. The tool provides Hugging Face model checkpoints for image2SVG vectorization, with models like StarVector-8B and StarVector-1B. It also offers datasets like SVG-Stack, SVG-Fonts, SVG-Icons, SVG-Emoji, and SVG-Diagrams for evaluation. StarVector can be trained using Deepspeed or FSDP for tasks like Image2SVG and Text2SVG generation. The tool provides a demo with options for HuggingFace generation or VLLM backend for faster generation speed.

github

: 118

SimpleAICV_pytorch_training_examples

SimpleAICV_pytorch_training_examples is a repository that provides simple training and testing examples for various computer vision tasks such as image classification, object detection, semantic segmentation, instance segmentation, knowledge distillation, contrastive learning, masked image modeling, OCR text detection, OCR text recognition, human matting, salient object detection, interactive segmentation, image inpainting, and diffusion model tasks. The repository includes support for multiple datasets and networks, along with instructions on how to prepare datasets, train and test models, and use gradio demos. It also offers pretrained models and experiment records for download from huggingface or Baidu-Netdisk. The repository requires specific environments and package installations to run effectively.

github

: 429

TinyLLM

TinyLLM is a project that helps build a small locally hosted language model with a web interface using consumer-grade hardware. It supports multiple language models, builds a local OpenAI API web service, and serves a Chatbot web interface with customizable prompts. The project requires specific hardware and software configurations for optimal performance. Users can run a local language model using inference servers like vLLM, llama-cpp-python, and Ollama. The Chatbot feature allows users to interact with the language model through a web-based interface, supporting features like summarizing websites, displaying news headlines, stock prices, weather conditions, and using vector databases for queries.

github

: 224

RobustVLM

This repository contains code for the paper 'Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models'. It focuses on fine-tuning CLIP in an unsupervised manner to enhance its robustness against visual adversarial attacks. By replacing the vision encoder of large vision-language models with the fine-tuned CLIP models, it achieves state-of-the-art adversarial robustness on various vision-language tasks. The repository provides adversarially fine-tuned ViT-L/14 CLIP models and offers insights into zero-shot classification settings and clean accuracy improvements.

github

: 58

LLaVA-MORE

LLaVA-MORE is a new family of Multimodal Language Models (MLLMs) that integrates recent language models with diverse visual backbones. The repository provides a unified training protocol for fair comparisons across all architectures and releases training code and scripts for distributed training. It aims to enhance Multimodal LLM performance and offers various models for different tasks. Users can explore different visual backbones like SigLIP and methods for managing image resolutions (S2) to improve the connection between images and language. The repository is a starting point for expanding the study of Multimodal LLMs and enhancing new features in the field.

github

: 109

ramalama

The Ramalama project simplifies working with AI by utilizing OCI containers. It automatically detects GPU support, pulls necessary software in a container, and runs AI models. Users can list, pull, run, and serve models easily. The tool aims to support various GPUs and platforms in the future, making AI setup hassle-free.

github

: 1.5k

last_layer

last_layer is a security library designed to protect LLM applications from prompt injection attacks, jailbreaks, and exploits. It acts as a robust filtering layer to scrutinize prompts before they are processed by LLMs, ensuring that only safe and appropriate content is allowed through. The tool offers ultra-fast scanning with low latency, privacy-focused operation without tracking or network calls, compatibility with serverless platforms, advanced threat detection mechanisms, and regular updates to adapt to evolving security challenges. It significantly reduces the risk of prompt-based attacks and exploits but cannot guarantee complete protection against all possible threats.

github

: 79

EasyEdit

EasyEdit is a Python package for edit Large Language Models (LLM) like `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, `T5`(support models from **1B** to **65B**), the objective of which is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. It is designed to be easy to use and easy to extend.

github

: 2.2k

For similar tasks

mllm

github

: 703

llama-on-lambda

This project provides a proof of concept for deploying a scalable, serverless LLM Generative AI inference engine on AWS Lambda. It leverages the llama.cpp project to enable the usage of more accessible CPU and RAM configurations instead of limited and expensive GPU capabilities. By deploying a container with the llama.cpp converted models onto AWS Lambda, this project offers the advantages of scale, minimizing cost, and maximizing compute availability. The project includes AWS CDK code to create and deploy a Lambda function leveraging your model of choice, with a FastAPI frontend accessible from a Lambda URL. It is important to note that you will need ggml quantized versions of your model and model sizes under 6GB, as your inference RAM requirements cannot exceed 9GB or your Lambda function will fail.

github

: 150

llava-docker

github

: 59

llm_interview_note

This repository provides a comprehensive overview of large language models (LLMs), covering various aspects such as their history, types, underlying architecture, training techniques, and applications. It includes detailed explanations of key concepts like Transformer models, distributed training, fine-tuning, and reinforcement learning. The repository also discusses the evaluation and limitations of LLMs, including the phenomenon of hallucinations. Additionally, it provides a list of related courses and references for further exploration.

github

: 2.1k

chat-your-doc

Chat Your Doc is an experimental project exploring various applications based on LLM technology. It goes beyond being just a chatbot project, focusing on researching LLM applications using tools like LangChain and LlamaIndex. The project delves into UX, computer vision, and offers a range of examples in the 'Lab Apps' section. It includes links to different apps, descriptions, launch commands, and demos, aiming to showcase the versatility and potential of LLM applications.

github

: 67

FlowDown-App

FlowDown is a blazing fast and smooth client app for using AI/LLM. It is lightweight and efficient with markdown support, universal compatibility, blazing fast text rendering, automated chat titles, and privacy by design. There are two editions available: FlowDown and FlowDown Community, with various features like chat with AI, fast markdown, privacy by design, bring your own LLM, offline LLM w/ MLX, visual LLM, web search, attachments, and language localization. FlowDown Community is now open-source, empowering developers to build interactive and responsive AI client apps.

github

: 363

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

haystack-core-integrations

This repository contains integrations to extend the capabilities of Haystack version 2.0 and onwards. The code in this repo is maintained by deepset, see each integration's `README` file for details around installation, usage and support.

github

: 144

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k

mllm

README:

mllm

fast and lightweight multimodal LLM inference engine for mobile and edge devices

| Arm CPU | X86 CPU | Qualcomm NPU(QNN) |

Recent update

Contents

Android Demo

Support models

Language models

Multimodal models

Quick Start

Get the Code

Check prerequisites

Run Qwen with Hexagon NPU accelerating using QNN

Run with the CPU of Android

Build

Run Fuyu-8B

Run LLaMA-2-7B

Run ImageBind

Run for Linux

Build

Run Fuyu-8B

Run LLaMA-2-7B

Run ImageBind

Customization

Convert models

Convert vocabulary

Quantize models

Roadmap

Documentation

Contribution

Acknowledgments

License

Overall Project License

Apache 2.0 Licensed Components

Citation

For Tasks:

For Jobs:

Alternative AI tools for mllm

Similar Open Source Tools

mllm

MooER

portkey-python-sdk

llava-docker

AiOS

floneum

TokenPacker

StableToolBench

star-vector

SimpleAICV_pytorch_training_examples

TinyLLM

RobustVLM

LLaVA-MORE

ramalama

last_layer

EasyEdit

For similar tasks

mllm

llama-on-lambda

llava-docker

llm_interview_note

chat-your-doc

FlowDown-App

VisionCraft

haystack-core-integrations

For similar jobs

sweep

teams-ai

ai-guide

classifai

chatbot-ui

BricksLLM

uAgents

griptape