gemini-android

✨ Gemini Android demonstrates Google's Generative AI on Android with Stream Chat SDK for Compose.

Stars: 303

Visit

Gemini Android is a repository showcasing Google's Generative AI on Android using Stream Chat SDK for Compose. It demonstrates the Gemini API for Android, implements UI elements with Jetpack Compose, utilizes Android architecture components like Hilt and AppStartup, performs background tasks with Kotlin Coroutines, and integrates chat systems with Stream Chat Compose SDK for real-time event handling. The project also provides technical content, instructions on building the project, tech stack details, architecture overview, modularization strategies, and a contribution guideline. It follows Google's official architecture guidance and offers a real-world example of app architecture implementation.

README:

Gemini Android demonstrates Google's Generative AI on Android with Stream Chat SDK for Compose.

The purpose of this repository is to demonstrate below:

Demonstrates Gemini API for Android.
Implementing entire UI elements with Jetpack Compose.
Implementation of Android architecture components with Jetpack libraries such as Hilt and AppStartup.
Performing background tasks with Kotlin Coroutines.
Integrating chat systems with Stream Chat Compose SDK for real-time event handling.

📷 Previews

✍️ Technical Content

If you're interested in the overall architecture, each layer, Generative AI, Gemini SDK, and implementation details of this project, check out the following blog post: Build an AI Chat Android App With Google’s Generative AI

🛥 Stream Chat & Video SDK

Gemini Android is built with Stream Chat SDK for Compose to implement messaging systems. If you’re interested in building powerful real-time video/audio calling, audio room, and livestreaming, check out the Stream Video SDK for Compose!

Stream Chat

Stream Video

💻 How to build the project?

To build this project properly, you should follow the instructions below:

Go to the Stream login page.
If you have your GitHub account, click the SIGN UP WITH GITHUB button and you can sign up within a couple of seconds.

If you don't have a GitHub account, fill in the inputs and click the START FREE TRIAL button.
Go to the Dashboard and click the Create App button like the below.

Fill in the blanks like the below and click the Create App button.

You will see the Key like the image below and then copy it.

Create a new file named secrets.properties on the root directory of this Android project, and add the key to the secrets.properties file like the below:

STREAM_API_KEY=..

Go to your Dashboard again and click your App.
In the Overview menu, you can find the Authentication category by scrolling to the middle of the page.
Switch on the Disable Auth Checks option and click the Submit button like the image below.

Click the Explorer tab on the left side menu.
Click users -> Create New User button sequentially and add fill in the user like the below:

User Name: gemini
User ID: gemini

Go to Google AI Studio, login with your Google account and select the Get API key on the menu left like the image below:

Create your API key for using generative AI SDKs, and you'll get one like the image below:

Add the key to the secrets.properties file like the below:

GEMINI_API_KEY=..

Build and run the project.

🛠 Tech Stack & Open Source Libraries

Minimum SDK level 21.
100% Jetpack Compose based + Coroutines + Flow for asynchronous.
Compose Chat SDK for Messaging: The Jetpack Compose Chat Messaging SDK is built on a low-level chat client and provides modular, customizable Compose UI components that you can easily drop into your app.
Jetpack
- Compose: Android’s modern toolkit for building native UI.
- ViewModel: UI related data holder and lifecycle aware.
- App Startup: Provides a straightforward, performant way to initialize components at application startup.
- Navigation: For navigating screens and Hilt Navigation Compose for injecting dependencies.
- Room: Constructs Database by providing an abstraction layer over SQLite to allow fluent database access.
- Datastore: Store data asynchronously, consistently, and transactionally, overcoming some of the drawbacks of SharedPreferences.
- Hilt: Dependency Injection.
Landscapist Glide, animation, placeholder: Jetpack Compose image loading library that fetches and displays network images with Glide, Coil, and Fresco.
Retrofit2 & OkHttp3: Construct the REST APIs and paging network data.
Sandwich: Construct a lightweight and modern response interface to handle network payload for Android.
Moshi: A modern JSON library for Kotlin and Java.
ksp: Kotlin Symbol Processing API.
Balloon: Modernized and sophisticated tooltips, fully customizable with an arrow and animations for Android.
StreamLog: A lightweight and extensible logger library for Kotlin and Android.
Baseline Profiles: To improve app performance by including a list of classes and methods specifications in your APK that can be used by Android Runtime.

🏛️ Architecture

Gemini Android follows the Google's official architecture guidance.

Gemini Android was built with Guide to app architecture, so it would be a great sample to show how the architecture works in real-world projects.

The overall architecture is composed of two layers; UI Layer and the data layer. Each layer has dedicated components and they each have different responsibilities. The arrow means the component has a dependency on the target component following its direction.

Architecture Overview

Each layer has different responsibilities below. Basically, they follow unidirectional event/data flow.

UI Layer

The UI Layer consists of UI elements like buttons, menus, tabs that could interact with users and ViewModel that holds app states and restores data when configuration changes.

Data Layer

The data Layer consists of repositories, which include business logic, such as querying data from the local database and requesting remote data from the network. It is implemented as an offline-first source of business logic and follows the single source of truth principle.

For more information about the overall architecture, check out Build a Real-Time WhatsApp Clone With Jetpack Compose.

Modularization

Gemini Android has implemented the following modularization strategies:

Reusability: By effectively modularizing reusable code, it not only facilitates code sharing but also restricts code access across different modules.
Parallel Building: Modules are capable of being built in parallel, leading to reduced overall build time.
Decentralized Focusing: Individual development teams are allocated specific modules, allowing them to concentrate on their designated areas.

💯 MAD Score

🤝 Contribution

Most of the features are not completed except the chat feature, so anyone can contribute and improve this project following the Contributing Guideline.

Find this repository useful? 💙

Support it by joining stargazers for this repository. ⭐
Also, follow me on GitHub for my next creations! 🤩

License

Designed and developed by 2024 skydoves (Jaewoong Eum)

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

For Tasks:

Click tags to check more tools for each tasks

build chat app integrate generative ai implement real-time messaging develop android ui handle background tasks

For Jobs:

android developer mobile app developer software engineer ui/ux designer chat system developer

Alternative AI tools for gemini-android

Similar Open Source Tools

gemini-android

github

: 303

ai-chat-android

AI Chat Android demonstrates Google's Generative AI on Android with Firebase Realtime Database. It showcases Gemini API integration, Jetpack Compose UI elements, Android architecture components with Hilt, Kotlin Coroutines for background tasks, and Firebase Realtime Database integration for real-time events. The project follows Google's official architecture guidance with a modularized structure for reusability, parallel building, and decentralized focusing.

github

: 88

llm-answer-engine

This repository contains the code and instructions needed to build a sophisticated answer engine that leverages the capabilities of Groq, Mistral AI's Mixtral, Langchain.JS, Brave Search, Serper API, and OpenAI. Designed to efficiently return sources, answers, images, videos, and follow-up questions based on user queries, this project is an ideal starting point for developers interested in natural language processing and search technologies.

github

: 4.5k

TaskingAI

TaskingAI brings Firebase's simplicity to **AI-native app development**. The platform enables the creation of GPTs-like multi-tenant applications using a wide range of LLMs from various providers. It features distinct, modular functions such as Inference, Retrieval, Assistant, and Tool, seamlessly integrated to enhance the development process. TaskingAI’s cohesive design ensures an efficient, intelligent, and user-friendly experience in AI application development.

github

: 6.1k

comfyui_LLM_Polymath

github

: 54

swark

Swark is a VS Code extension that automatically generates architecture diagrams from code using large language models (LLMs). It is directly integrated with GitHub Copilot, requires no authentication or API key, and supports all languages. Swark helps users learn new codebases, review AI-generated code, improve documentation, understand legacy code, spot design flaws, and gain test coverage insights. It saves output in a 'swark-output' folder with diagram and log files. Source code is only shared with GitHub Copilot for privacy. The extension settings allow customization for file reading, file extensions, exclusion patterns, and language model selection. Swark is open source under the GNU Affero General Public License v3.0.

github

: 274

kollektiv

Kollektiv is a Retrieval-Augmented Generation (RAG) system designed to enable users to chat with their favorite documentation easily. It aims to provide LLMs with access to the most up-to-date knowledge, reducing inaccuracies and improving productivity. The system utilizes intelligent web crawling, advanced document processing, vector search, multi-query expansion, smart re-ranking, AI-powered responses, and dynamic system prompts. The technical stack includes Python/FastAPI for backend, Supabase, ChromaDB, and Redis for storage, OpenAI and Anthropic Claude 3.5 Sonnet for AI/ML, and Chainlit for UI. Kollektiv is licensed under a modified version of the Apache License 2.0, allowing free use for non-commercial purposes.

github

: 74

Director

Director is a framework to build video agents that can reason through complex video tasks like search, editing, compilation, generation, etc. It enables users to summarize videos, search for specific moments, create clips instantly, integrate GenAI projects and APIs, add overlays, generate thumbnails, and more. Built on VideoDB's 'video-as-data' infrastructure, Director is perfect for developers, creators, and teams looking to simplify media workflows and unlock new possibilities.

github

: 791

obsidian-smart-composer

Smart Composer is an Obsidian plugin that enhances note-taking and content creation by integrating AI capabilities. It allows users to efficiently write by referencing their vault content, providing contextual chat with precise context selection, multimedia context support for website links and images, document edit suggestions, and vault search for relevant notes. The plugin also offers features like custom model selection, local model support, custom system prompts, and prompt templates. Users can set up the plugin by installing it through the Obsidian community plugins, enabling it, and configuring API keys for supported providers like OpenAI, Anthropic, and Gemini. Smart Composer aims to streamline the writing process by leveraging AI technology within the Obsidian platform.

github

: 1.1k

ROSGPT_Vision

ROSGPT_Vision is a new robotic framework designed to command robots using only two prompts: a Visual Prompt for visual semantic features and an LLM Prompt to regulate robotic reactions. It is based on the Prompting Robotic Modalities (PRM) design pattern and is used to develop CarMate, a robotic application for monitoring driver distractions and providing real-time vocal notifications. The framework leverages state-of-the-art language models to facilitate advanced reasoning about image data and offers a unified platform for robots to perceive, interpret, and interact with visual data through natural language. LangChain is used for easy customization of prompts, and the implementation includes the CarMate application for driver monitoring and assistance.

github

: 74

Simplifine

Simplifine is an open-source library designed for easy LLM finetuning, enabling users to perform tasks such as supervised fine tuning, question-answer finetuning, contrastive loss for embedding tasks, multi-label classification finetuning, and more. It provides features like WandB logging, in-built evaluation tools, automated finetuning parameters, and state-of-the-art optimization techniques. The library offers bug fixes, new features, and documentation updates in its latest version. Users can install Simplifine via pip or directly from GitHub. The project welcomes contributors and provides comprehensive documentation and support for users.

github

: 65

coding-aider

Coding-Aider is a plugin for IntelliJ IDEA that seamlessly integrates Aider's AI-powered coding assistance into the IDE. It boosts productivity by offering rapid access for precision code generation and refactoring, with complete control over the context utilized by the LLM. The plugin provides various features such as AI-powered coding assistance, intuitive access through keyboard shortcuts, persistent file management, dual execution modes, Git integration, real-time progress tracking, multi-file support, web crawling, clipboard image support, and various specialized actions. It also supports structured mode and plans for managing complex features, working directory support, summarized output, and the ability to specify additional arguments for Aider commands. Coding-Aider addresses limitations in existing IntelliJ plugins by offering optimized token usage, a feature-rich terminal interface, a wide range of commands, and robust recovery mechanisms with seamless Git integration.

github

: 66

clearml-server

ClearML Server is a backend service infrastructure for ClearML, facilitating collaboration and experiment management. It includes a web app, RESTful API, and file server for storing images and models. Users can deploy ClearML Server using Docker, AWS EC2 AMI, or Kubernetes. The system design supports single IP or sub-domain configurations with specific open ports. ClearML-Agent Services container allows launching long-lasting jobs and various use cases like auto-scaler service, controllers, optimizer, and applications. Advanced functionality includes web login authentication and non-responsive experiments watchdog. Upgrading ClearML Server involves stopping containers, backing up data, downloading the latest docker-compose.yml file, configuring ClearML-Agent Services, and spinning up docker containers. Community support is available through ClearML FAQ, Stack Overflow, GitHub issues, and email contact.

github

: 364

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675

voice-pro

Voice-Pro is an integrated solution for subtitles, translation, and TTS. It offers features like multilingual subtitles, live translation, vocal remover, and supports OpenAI Whisper and Open-Source Translator. The tool provides a Studio tab for various functions, Whisper Caption tab for subtitle creation, Translate tab for translation, TTS tab for text-to-speech, Live Translation tab for real-time voice recognition, and Batch tab for processing multiple files. Users can download YouTube videos, improve voice recognition accuracy, create automatic subtitles, and produce multilingual videos with ease. The tool is easy to install with one-click and offers a Web-UI for user convenience.

github

: 233

motia

Motia is an AI agent framework designed for software engineers to create, test, and deploy production-ready AI agents quickly. It provides a code-first approach, allowing developers to write agent logic in familiar languages and visualize execution in real-time. With Motia, developers can focus on business logic rather than infrastructure, offering zero infrastructure headaches, multi-language support, composable steps, built-in observability, instant APIs, and full control over AI logic. Ideal for building sophisticated agents and intelligent automations, Motia's event-driven architecture and modular steps enable the creation of GenAI-powered workflows, decision-making systems, and data processing pipelines.

github

: 1.5k

For similar tasks

gemini-android

github

: 303

ai-chat-android

github

: 88

aiortc

aiortc is a Python library for Web Real-Time Communication (WebRTC) and Object Real-Time Communication (ORTC). It provides a simple and readable implementation for programmers to understand and tinker with WebRTC internals. The library allows for exchanging audio, video, and data channels, supports SDP generation/parsing, ICE, DTLS, SRTP, SCTP, and various audio/video codecs. It also enables creating innovative products by leveraging Python ecosystem modules, such as computer vision algorithms with OpenCV. Extensive testing ensures high code quality.

github

: 4.6k

For similar jobs

react-native-vision-camera

VisionCamera is a powerful, high-performance Camera library for React Native. It features Photo and Video capture, QR/Barcode scanner, Customizable devices and multi-cameras ("fish-eye" zoom), Customizable resolutions and aspect-ratios (4k/8k images), Customizable FPS (30..240 FPS), Frame Processors (JS worklets to run facial recognition, AI object detection, realtime video chats, ...), Smooth zooming (Reanimated), Fast pause and resume, HDR & Night modes, Custom C++/GPU accelerated video pipeline (OpenGL).

github

: 8.2k

iris_android

This repository contains an offline Android chat application based on llama.cpp example. Users can install, download models, and run the app completely offline and privately. To use the app, users need to go to the releases page, download and install the app. Building the app requires downloading Android Studio, cloning the repository, and importing it into Android Studio. The app can be run offline by following specific steps such as enabling developer options, wireless debugging, and downloading the stable LM model. The project is maintained by Nerve Sparks and contributions are welcome through creating feature branches and pull requests.

github

: 152

aiolauncher_scripts

AIO Launcher Scripts is a collection of Lua scripts that can be used with AIO Launcher to enhance its functionality. These scripts can be used to create widget scripts, search scripts, and side menu scripts. They provide various functions such as displaying text, buttons, progress bars, charts, and interacting with app widgets. The scripts can be used to customize the appearance and behavior of the launcher, add new features, and interact with external services.

github

: 81

gemini-android

github

: 303

react-native-airship

React Native Airship is a module designed to integrate Airship's iOS and Android SDKs into React Native applications. It provides developers with the necessary tools to incorporate Airship's push notification services seamlessly. The module offers a simple and efficient way to leverage Airship's features within React Native projects, enhancing user engagement and retention through targeted notifications.

github

: 86

gpt_mobile

GPT Mobile is a chat assistant for Android that allows users to chat with multiple models at once. It supports various platforms such as OpenAI GPT, Anthropic Claude, and Google Gemini. Users can customize temperature, top p (Nucleus sampling), and system prompt. The app features local chat history, Material You style UI, dark mode support, and per app language setting for Android 13+. It is built using 100% Kotlin, Jetpack Compose, and follows a modern app architecture for Android developers.

github

: 717

Native-LLM-for-Android

This repository provides a demonstration of running a native Large Language Model (LLM) on Android devices. It supports various models such as Qwen2.5-Instruct, MiniCPM-DPO/SFT, Yuan2.0, Gemma2-it, StableLM2-Chat/Zephyr, and Phi3.5-mini-instruct. The demo models are optimized for extreme execution speed after being converted from HuggingFace or ModelScope. Users can download the demo models from the provided drive link, place them in the assets folder, and follow specific instructions for decompression and model export. The repository also includes information on quantization methods and performance benchmarks for different models on various devices.

github

: 128

AIDE-Plus

AIDE-Plus is a comprehensive tool for Android app development, offering support for various Java syntax versions, Gradle and Maven build systems, ProGuard, AndroidX, CMake builds, APK/AAB generation, code coloring customization, data binding, and APK signing. It also provides features like AAPT2, D8, runtimeOnly, compileOnly, libgdxNatives, manifest merging, Shizuku installation support, and syntax auto-completion. The tool aims to streamline the development process and enhance the user experience by addressing common issues and providing advanced functionalities.

github

: 136