
Native-LLM-for-Android
Demonstration of running a native LLM on Android devices.
Stars: 179

This repository provides a demonstration of running a native Large Language Model (LLM) on Android devices. It supports various models such as Qwen2.5-Instruct, MiniCPM-DPO/SFT, Yuan2.0, Gemma2-it, StableLM2-Chat/Zephyr, and Phi3.5-mini-instruct. The demo models are optimized for extreme execution speed after being converted from HuggingFace or ModelScope. Users can download the demo models from the provided drive link, place them in the assets folder, and follow specific instructions for decompression and model export. The repository also includes information on quantization methods and performance benchmarks for different models on various devices.
README:
Demonstration of running a native Large Language Model (LLM) on Android devices. Currently supported models include:
- Qwen3: 0.6B, 1.7B, 4B...
- Qwen2.5-Instruct: 0.5B, 1.5B, 3B...
- Qwen2.5VL: 3B
- DeepSeek-R1-Distill-Qwen: 1.5B
- MiniCPM-DPO/SFT: 1B, 2.7B
- Gemma-3-it: 1B, 4B...
- Phi-4-mini-Instruct: 3.8B
- Llama-3.2-Instruct: 1B
- InternVL-Mono: 2B
- InternLM-3: 8B
- Seed-X: PRO-7B, Instruct-7B
- HunYuan: MT-7B
- 2025/09/07: Update HunYuan-MT.
- 2025/08/02: Update Seed-X.
- 2025/04/29: Update Qwen3.
- 2025/04/05: Update Qwen2.5 and InternVL-Mono (`q4f32` + `dynamic_axes`).
- 2025/02/22: Support loading in low-memory mode for `Qwen`, `QwenVL`, and `MiniCPM_2B_single`; set `low_memory_mode = true` in `MainActivity.java`.
- 2025/02/07: DeepSeek-R1-Distill-Qwen: 1.5B (please use the Qwen v2.5 `Qwen_Export.py`).
- Download Models:
  - Quick Try: Qwen3-1.7B-Android
- Setup Instructions:
  - Place the downloaded model files into the `assets` folder.
  - Decompress the `*.so` files stored in the `libs/arm64-v8a` folder (a decompression sketch follows below).
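The packaging of the release isn't shown here, so this is a minimal sketch assuming the shared libraries ship as ZIP archives inside `libs/arm64-v8a`; the path and archive format are assumptions to adjust against the actual download:

```python
# Minimal sketch: unpack compressed shared libraries before building.
# Assumes zip-compressed archives in libs/arm64-v8a; adjust the path and
# archive format to whatever the downloaded release actually contains.
import zipfile
from pathlib import Path

lib_dir = Path("libs/arm64-v8a")          # hypothetical project-relative path

for archive in lib_dir.glob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(lib_dir)            # places the *.so next to the archive
        print(f"{archive.name}: extracted {', '.join(zf.namelist())}")
```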
- Model Notes:
  - Demo models are converted from HuggingFace or ModelScope and optimized for extreme execution speed.
  - Inputs and outputs may differ slightly from the original models.
  - For Qwen2VL / Qwen2.5VL, adjust the key variables to match the model parameters:
    - `GLRender.java`: lines 37, 38, 39
    - `project.h`: lines 14, 15, 16, 35, 36, 41, 59, 60
- ONNX Export Considerations:
  - It is recommended to use dynamic axes and `q4f32` quantization (a dynamic-axes export sketch follows below).
  - The `tokenizer.cpp` and `tokenizer.hpp` files are sourced from the mnn-llm repository.
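The repository's own `*_Export.py` scripts define the real export graphs; the following is only a hedged sketch of what a dynamic-axes `torch.onnx.export` call looks like, using a toy stand-in model (all names here are placeholders, not the repo's actual inputs and outputs):

```python
# Minimal sketch of exporting with dynamic axes, using a toy model.
# The real Export_ONNX scripts define the actual model and I/O names.
import torch

class TinyLM(torch.nn.Module):  # stand-in for the real model
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(1000, 64)
        self.head = torch.nn.Linear(64, 1000)

    def forward(self, input_ids):
        return self.head(self.embed(input_ids))

model = TinyLM().eval()
dummy = torch.randint(0, 1000, (1, 16))  # (batch, sequence)

torch.onnx.export(
    model,
    (dummy,),
    "tiny_lm.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    # dynamic_axes lets one exported graph serve any batch/sequence size,
    # which is what the q4f32 + dynamic_axes demo models rely on.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
)
```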
  - Navigate to the `Export_ONNX` folder.
  - Follow the comments in the Python scripts to set the folder paths.
  - Execute the `***_Export.py` script to export the model.
  - Quantize or optimize the ONNX model manually (one possible approach is sketched below).
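For the manual quantization step, one possible route to q4f32-style weights (4-bit weight blocks with FP32 compute) is onnxruntime's MatMul 4-bit quantizer. This is a hedged sketch assuming a recent onnxruntime release, not necessarily the method used in `Do_Quantize`:

```python
# Hedged sketch: 4-bit weight quantization with onnxruntime's
# MatMul4BitsQuantizer (available in recent onnxruntime releases).
# The Do_Quantize folder documents the repo's actual methods.
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model = onnx.load("tiny_lm.onnx")                      # toy model exported above
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()                                    # rewrites MatMul weights to 4-bit
quantizer.model.save_model_to_file(
    "tiny_lm_q4f32.onnx",
    use_external_data_format=True,                     # keep large weights in a side file
)
```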
  - Use `onnxruntime.tools.convert_onnx_models_to_ort` to convert models to `*.ort` format; note that this process automatically adds `Cast` operators that change FP16 multiplication to FP32 (an invocation sketch follows below).
  - The quantization methods are detailed in the `Do_Quantize` folder.
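Since `onnxruntime.tools.convert_onnx_models_to_ort` is a runnable module, the conversion can be scripted; a minimal sketch, with file names continuing the toy example above:

```python
# Minimal sketch: convert *.onnx to *.ort with the converter bundled in
# onnxruntime, invoked as a module per the onnxruntime documentation.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "onnxruntime.tools.convert_onnx_models_to_ort",
     "tiny_lm_q4f32.onnx"],                # a .onnx file or a directory of them
    check=True,
)
# Inspect the produced .ort model afterwards: conversion may insert Cast
# operators that promote FP16 multiplications to FP32, as noted above.
```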
- Explore more projects: DakeQQ Projects
| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Android 13 | Nubia Z50 | 8_Gen2-CPU | Qwen-2-1.5B-Instruct q8f32 | 20 token/s |
| Android 15 | Vivo X200 Pro | MediaTek_9400-CPU | Qwen-3-1.7B-Instruct q4f32 dynamic | 37 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | Qwen-3-1.7B-Instruct q4f32 dynamic | 18.5 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | Qwen-2.5-1.5B-Instruct q4f32 dynamic | 20.5 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | Qwen-2-1.5B-Instruct q8f32 | 13 token/s |
| Harmony 3 | Honor 20S | Kirin_810-CPU | Qwen-2-1.5B-Instruct q8f32 | 7 token/s |

| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Android 13 | Nubia Z50 | 8_Gen2-CPU | QwenVL-2-2B q8f32 | 15 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | QwenVL-2-2B q8f32 | 9 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | QwenVL-2.5-3B q4f32 dynamic | 9 token/s |

| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Android 13 | Nubia Z50 | 8_Gen2-CPU | Distill-Qwen-1.5B q4f32 dynamic | 34.5 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | Distill-Qwen-1.5B q4f32 dynamic | 20.5 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | Distill-Qwen-1.5B q8f32 | 13 token/s |
| HyperOS 2 | Xiaomi-14T-Pro | MediaTek_9300+-CPU | Distill-Qwen-1.5B q8f32 | 22 token/s |

| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Android 15 | Nubia Z50 | 8_Gen2-CPU | MiniCPM4-0.5B q4f32 | 78 token/s |
| Android 13 | Nubia Z50 | 8_Gen2-CPU | MiniCPM-2.7B q8f32 | 9.5 token/s |
| Android 13 | Nubia Z50 | 8_Gen2-CPU | MiniCPM-1.3B q8f32 | 16.5 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | MiniCPM-2.7B q8f32 | 6 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | MiniCPM-1.3B q8f32 | 11 token/s |

| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Android 13 | Nubia Z50 | 8_Gen2-CPU | Gemma-1.1-it-2B q8f32 | 16 token/s |

| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Android 13 | Nubia Z50 | 8_Gen2-CPU | Phi-2-2B-Orange-V2 q8f32 | 9.5 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | Phi-2-2B-Orange-V2 q8f32 | 5.8 token/s |

| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Android 13 | Nubia Z50 | 8_Gen2-CPU | Llama-3.2-1B-Instruct q8f32 | 25 token/s |
| Harmony 4 | P40 | Kirin_990_5G-CPU | Llama-3.2-1B-Instruct q8f32 | 16 token/s |

| OS | Device | Backend | Model | Inference (1024 Context) |
|---|---|---|---|---|
| Harmony 4 | P40 | Kirin_990_5G-CPU | Mono-2B-S1-3 q4f32 dynamic | 10.5 token/s |
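The figures above are decode throughput at a 1024-token context. As a hedged illustration of how such a number can be taken, here is a sketch against the toy model from the export example; note it recomputes the full context each step, whereas the demo app's KV-cached decode is what the tables actually measure, and the I/O names are the toy example's, not the demo models':

```python
# Hedged sketch: measuring decode throughput (token/s) for an ONNX model.
# Uses the toy model from the export sketch; the real demo models use a
# KV cache, so their per-token cost does not grow with sequence length.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("tiny_lm.onnx", providers=["CPUExecutionProvider"])
ids = np.random.randint(0, 1000, size=(1, 1024), dtype=np.int64)  # 1024-token context

steps = 64
start = time.perf_counter()
for _ in range(steps):                                   # greedy decode loop
    logits = session.run(["logits"], {"input_ids": ids})[0]
    next_id = logits[:, -1:].argmax(axis=-1).astype(np.int64)
    ids = np.concatenate([ids, next_id], axis=1)
elapsed = time.perf_counter() - start
print(f"{steps / elapsed:.1f} token/s")
```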
Similar Open Source Tools


Element-Plus-X
Element-Plus-X is an out-of-the-box enterprise-level AI component library based on Vue 3 + Element-Plus. It features built-in scenario components such as chatbots and voice interactions, seamless integration with zero configuration based on Element-Plus design system, and support for on-demand loading with Tree Shaking optimization.

LLM-TPU
LLM-TPU project aims to deploy various open-source generative AI models on the BM1684X chip, with a focus on LLM. Models are converted to bmodel using TPU-MLIR compiler and deployed to PCIe or SoC environments using C++ code. The project has deployed various open-source models such as Baichuan2-7B, ChatGLM3-6B, CodeFuse-7B, DeepSeek-6.7B, Falcon-40B, Phi-3-mini-4k, Qwen-7B, Qwen-14B, Qwen-72B, Qwen1.5-0.5B, Qwen1.5-1.8B, Llama2-7B, Llama2-13B, LWM-Text-Chat, Mistral-7B-Instruct, Stable Diffusion, Stable Diffusion XL, WizardCoder-15B, Yi-6B-chat, Yi-34B-chat. Detailed model deployment information can be found in the 'models' subdirectory of the project. For demonstrations, users can follow the 'Quick Start' section. For inquiries about the chip, users can contact SOPHGO via the official website.

rulm
This repository contains language models for the Russian language, as well as their implementation and comparison. The models are trained on a dataset of ChatGPT-generated instructions and chats in Russian. They can be used for a variety of tasks, including question answering, text generation, and translation.

agentica
Agentica is a human-centric framework for building large language model agents. It provides functionalities for planning, memory management, tool usage, and supports features like reflection, planning and execution, RAG, multi-agent, multi-role, and workflow. The tool allows users to quickly code and orchestrate agents, customize prompts, and make API calls to various services. It supports API calls to OpenAI, Azure, Deepseek, Moonshot, Claude, Ollama, and Together. Agentica aims to simplify the process of building AI agents by providing a user-friendly interface and a range of functionalities for agent development.

video-subtitle-remover
Video-subtitle-remover (VSR) is a software based on AI technology that removes hard subtitles from videos. It achieves the following functions: - Lossless resolution: Remove hard subtitles from videos, generate files with subtitles removed - Fill the region of removed subtitles using a powerful AI algorithm model (non-adjacent pixel filling and mosaic removal) - Support custom subtitle positions, only remove subtitles in defined positions (input position) - Support automatic removal of all text in the entire video (no input position required) - Support batch removal of watermark text from multiple images.

VoiceBench
VoiceBench is a repository containing code and data for benchmarking LLM-Based Voice Assistants. It includes a leaderboard with rankings of various voice assistant models based on different evaluation metrics. The repository provides setup instructions, datasets, evaluation procedures, and a curated list of awesome voice assistants. Users can submit new voice assistant results through the issue tracker for updates on the ranking list.

XiaoXinAir14IML_2019_hackintosh
XiaoXinAir14IML_2019_hackintosh is a repository dedicated to enabling macOS installation on Lenovo XiaoXin Air-14 IML 2019 laptops. The repository provides detailed information on the hardware specifications, supported systems, BIOS versions, related models, installation methods, updates, patches, and recommended settings. It also includes tools and guides for BIOS modifications, enabling high-resolution display settings, Bluetooth synchronization between macOS and Windows 10, voltage adjustments for efficiency, and experimental support for YogaSMC. The repository offers solutions for various issues like sleep support, sound card emulation, and battery information. It acknowledges the contributions of developers and tools like OpenCore, itlwm, VoodooI2C, and ALCPlugFix.

Speech-AI-Forge
Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.

build_MiniLLM_from_scratch
This repository aims to build a low-parameter LLM model through pretraining, fine-tuning, model rewarding, and reinforcement learning stages to create a chat model capable of simple conversation tasks. It features using the bert4torch training framework, seamless integration with transformers package for inference, optimized file reading during training to reduce memory usage, providing complete training logs for reproducibility, and the ability to customize robot attributes. The chat model supports multi-turn conversations. The trained model currently only supports basic chat functionality due to limitations in corpus size, model scale, SFT corpus size, and quality.

angular-node-java-ai
This repository contains a project that integrates Angular frontend, Node.js backend, Java services, and AI capabilities. The project aims to demonstrate a full-stack application with modern technologies and AI features. It showcases how to build a scalable and efficient system using Angular for the frontend, Node.js for the backend, Java for services, and AI for advanced functionalities.

BlueLM
BlueLM is a large-scale pre-trained language model developed by vivo AI Global Research Institute, featuring 7B base and chat models. It is trained on high-quality data at a scale of 2.6 trillion tokens, supporting both Chinese and English languages. BlueLM-7B-Chat excels in C-Eval and CMMLU evaluations, providing strong competition among open-source models of similar size. The models support 32K long texts for better context understanding while maintaining base capabilities. BlueLM welcomes developers for academic research and commercial applications.

Qwen-TensorRT-LLM
Qwen-TensorRT-LLM is a project developed for the NVIDIA TensorRT Hackathon 2023, focusing on accelerating inference for the Qwen-7B-Chat model using TRT-LLM. The project offers various functionalities such as FP16/BF16 support, INT8 and INT4 quantization options, Tensor Parallel for multi-GPU parallelism, web demo setup with gradio, Triton API deployment for maximum throughput/concurrency, fastapi integration for openai requests, CLI interaction, and langchain support. It supports models like qwen2, qwen, and qwen-vl for both base and chat models. The project also provides tutorials on Bilibili and blogs for adapting Qwen models in NVIDIA TensorRT-LLM, along with hardware requirements and quick start guides for different model types and quantization methods.

west
WeST is a Speech Recognition/Transcript tool developed in 300 lines of code, inspired by SLAM-ASR and LLaMA 3.1. The model includes a Language Model (LLM), a Speech Encoder, and a trainable Projector. It requires training data in jsonl format with 'wav' and 'txt' entries. WeST can be used for training and decoding speech recognition models.

ipex-llm
The `ipex-llm` repository is an LLM acceleration library designed for Intel GPU, NPU, and CPU. It provides seamless integration with various models and tools like llama.cpp, Ollama, HuggingFace transformers, LangChain, LlamaIndex, vLLM, Text-Generation-WebUI, DeepSpeed-AutoTP, FastChat, Axolotl, and more. The library offers optimizations for over 70 models, XPU acceleration, and support for low-bit (FP8/FP6/FP4/INT4) operations. Users can run different models on Intel GPUs, NPU, and CPUs with support for various features like finetuning, inference, serving, and benchmarking.

ZhiLight
ZhiLight is a highly optimized large language model (LLM) inference engine developed by Zhihu and ModelBest Inc. It accelerates the inference of models like Llama and its variants, especially on PCIe-based GPUs. ZhiLight offers significant performance advantages compared to mainstream open-source inference engines. It supports various features such as custom defined tensor and unified global memory management, optimized fused kernels, support for dynamic batch, flash attention prefill, prefix cache, and different quantization techniques like INT8, SmoothQuant, FP8, AWQ, and GPTQ. ZhiLight is compatible with OpenAI interface and provides high performance on mainstream NVIDIA GPUs with different model sizes and precisions.
For similar tasks

lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron. It is released to the community in the spirit of building in the open. Note that it is still early, so don't expect 100% stability. In case of problems or questions, feel free to open an issue!

Firefly
Firefly is an open-source large model training project that supports pre-training, fine-tuning, and DPO of mainstream large models. It includes models like Llama3, Gemma, Qwen1.5, MiniCPM, Llama, InternLM, Baichuan, ChatGLM, Yi, Deepseek, Qwen, Orion, Ziya, Xverse, Mistral, Mixtral-8x7B, Zephyr, Vicuna, Bloom, etc. The project supports full-parameter training, LoRA, QLoRA efficient training, and various tasks such as pre-training, SFT, and DPO. Suitable for users with limited training resources, QLoRA is recommended for fine-tuning instructions. The project has achieved good results on the Open LLM Leaderboard with QLoRA training process validation. The latest version has significant updates and adaptations for different chat model templates.

Awesome-Text2SQL
Awesome Text2SQL is a curated repository containing tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis, and more. It provides guidelines on converting natural language questions into structured SQL queries, with a focus on NL2SQL. The repository includes information on various models, datasets, evaluation metrics, fine-tuning methods, libraries, and practice projects related to Text2SQL. It serves as a comprehensive resource for individuals interested in working with Text2SQL and related technologies.

create-million-parameter-llm-from-scratch
The 'create-million-parameter-llm-from-scratch' repository provides a detailed guide on creating a Large Language Model (LLM) with 2.3 million parameters from scratch. The blog replicates the LLaMA approach, incorporating concepts like RMSNorm for pre-normalization, SwiGLU activation function, and Rotary Embeddings. The model is trained on a basic dataset to demonstrate the ease of creating a million-parameter LLM without the need for a high-end GPU.

StableToolBench
StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features such as a Virtual API System with caching and API simulators, a new set of solvable queries determined by LLMs, and a Stable Evaluation System using GPT-4. The Virtual API Server can be set up either by building from source or using a prebuilt Docker image. Users can test the server using provided scripts and evaluate models with Solvable Pass Rate and Solvable Win Rate metrics. The tool also includes model experiments results comparing different models' performance.

BetaML.jl
The Beta Machine Learning Toolkit is a package containing various algorithms and utilities for implementing machine learning workflows in multiple languages, including Julia, Python, and R. It offers a range of supervised and unsupervised models, data transformers, and assessment tools. The models are implemented entirely in Julia and are not wrappers for third-party models. Users can easily contribute new models or request implementations. The focus is on user-friendliness rather than computational efficiency, making it suitable for educational and research purposes.

AI-TOD
AI-TOD is a dataset for tiny object detection in aerial images, containing 700,621 object instances across 28,036 images. Objects in AI-TOD are smaller with a mean size of 12.8 pixels compared to other aerial image datasets. To use AI-TOD, download xView training set and AI-TOD_wo_xview, then generate the complete dataset using the provided synthesis tool. The dataset is publicly available for academic and research purposes under CC BY-NC-SA 4.0 license.

UMOE-Scaling-Unified-Multimodal-LLMs
Uni-MoE is a MoE-based unified multimodal model that can handle diverse modalities including audio, speech, image, text, and video. The project focuses on scaling Unified Multimodal LLMs with a Mixture of Experts framework. It offers enhanced functionality for training across multiple nodes and GPUs, as well as parallel processing at both the expert and modality levels. The model architecture involves three training stages: building connectors for multimodal understanding, developing modality-specific experts, and incorporating multiple trained experts into LLMs using the LoRA technique on mixed multimodal data. The tool provides instructions for installation, weights organization, inference, training, and evaluation on various datasets.
For similar jobs

react-native-vision-camera
VisionCamera is a powerful, high-performance Camera library for React Native. It features Photo and Video capture, QR/Barcode scanner, Customizable devices and multi-cameras ("fish-eye" zoom), Customizable resolutions and aspect-ratios (4k/8k images), Customizable FPS (30..240 FPS), Frame Processors (JS worklets to run facial recognition, AI object detection, realtime video chats, ...), Smooth zooming (Reanimated), Fast pause and resume, HDR & Night modes, Custom C++/GPU accelerated video pipeline (OpenGL).

iris_android
This repository contains an offline Android chat application based on llama.cpp example. Users can install, download models, and run the app completely offline and privately. To use the app, users need to go to the releases page, download and install the app. Building the app requires downloading Android Studio, cloning the repository, and importing it into Android Studio. The app can be run offline by following specific steps such as enabling developer options, wireless debugging, and downloading the stable LM model. The project is maintained by Nerve Sparks and contributions are welcome through creating feature branches and pull requests.

aiolauncher_scripts
AIO Launcher Scripts is a collection of Lua scripts that can be used with AIO Launcher to enhance its functionality. These scripts can be used to create widget scripts, search scripts, and side menu scripts. They provide various functions such as displaying text, buttons, progress bars, charts, and interacting with app widgets. The scripts can be used to customize the appearance and behavior of the launcher, add new features, and interact with external services.

gemini-android
Gemini Android is a repository showcasing Google's Generative AI on Android using Stream Chat SDK for Compose. It demonstrates the Gemini API for Android, implements UI elements with Jetpack Compose, utilizes Android architecture components like Hilt and AppStartup, performs background tasks with Kotlin Coroutines, and integrates chat systems with Stream Chat Compose SDK for real-time event handling. The project also provides technical content, instructions on building the project, tech stack details, architecture overview, modularization strategies, and a contribution guideline. It follows Google's official architecture guidance and offers a real-world example of app architecture implementation.

blinkid-android
The BlinkID Android SDK is a comprehensive solution for implementing secure document scanning and extraction. It offers powerful capabilities for extracting data from a wide range of identification documents. The SDK provides features for integrating document scanning into Android apps, including camera requirements, SDK resource pre-bundling, customizing the UX, changing default strings and localization, troubleshooting integration difficulties, and using the SDK through various methods. It also offers options for completely custom UX with low-level API integration. The SDK size is optimized for different processor architectures, and API documentation is available for reference. For any questions or support, users can contact the Microblink team at help.microblink.com.

react-native-airship
React Native Airship is a module designed to integrate Airship's iOS and Android SDKs into React Native applications. It provides developers with the necessary tools to incorporate Airship's push notification services seamlessly. The module offers a simple and efficient way to leverage Airship's features within React Native projects, enhancing user engagement and retention through targeted notifications.

gpt_mobile
GPT Mobile is a chat assistant for Android that allows users to chat with multiple models at once. It supports various platforms such as OpenAI GPT, Anthropic Claude, and Google Gemini. Users can customize temperature, top p (Nucleus sampling), and system prompt. The app features local chat history, Material You style UI, dark mode support, and per app language setting for Android 13+. It is built using 100% Kotlin, Jetpack Compose, and follows a modern app architecture for Android developers.
