
J.A.R.V.I.S.2.0
Open-source assistant using small models (2B to 5B parameters), with agentic and tool-calling capabilities, RAG integration with efficient memory, and Android support via ADB.
Stars: 123

README:
Welcome to the Jarvis AI Assistant project! This AI-powered assistant can perform various tasks such as providing weather reports, summarizing news, sending emails, CAG, and more, all through voice commands. Below, you'll find detailed instructions on how to set up, use, and interact with this assistant.
- Voice Activation: Say "Hey Jarvis" to activate listening mode.
- Speech Recognition: Recognizes and processes user commands via speech input.
- AI Responses: Provides responses using AI-generated text-to-speech output.
- Task Execution: Handles multiple tasks, including:
  - Sending emails
  - Summarizing weather reports
  - Reading news headlines
  - Image generation
  - Database functions
  - Phone call automation using ADB
  - AI-based task execution
  - Automate websites & applications
  - Retrieval-Augmented Generation (RAG) for knowledge-based interactions
- Timeout Handling: Automatically deactivates listening mode after 5 minutes of inactivity.
- Automatic Input Processing: If no "stop" command is detected within 60 seconds, input is finalized and sent to the AI model for processing.
- Multiple Function Calls: Call multiple functions simultaneously, even if their inputs and outputs are unrelated.
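The two timeouts in the feature list (5 minutes of inactivity, 60 seconds without a "stop" command) can be sketched as a small state tracker. This is a minimal illustration, not the project's actual code: the `ListeningSession` class, its method names, and the injectable clock are all hypothetical.

```python
import time

LISTEN_TIMEOUT_S = 5 * 60   # deactivate listening mode after 5 minutes of inactivity
FINALIZE_TIMEOUT_S = 60     # finalize input if no "stop" arrives within 60 seconds


class ListeningSession:
    """Tracks both timeouts (hypothetical helper, not taken from the repo)."""

    def __init__(self, now=time.monotonic):
        self._now = now                 # injectable clock, useful for testing
        self.last_activity = now()      # time of the most recent speech
        self.input_started = None       # time the current command began

    def on_speech(self):
        """Record that the user said something."""
        t = self._now()
        self.last_activity = t
        if self.input_started is None:
            self.input_started = t

    def should_deactivate(self):
        """True once 5 minutes have passed without any speech."""
        return self._now() - self.last_activity >= LISTEN_TIMEOUT_S

    def should_finalize(self):
        """True once a command has been open for 60 seconds."""
        return (self.input_started is not None
                and self._now() - self.input_started >= FINALIZE_TIMEOUT_S)
```

A main loop would call `on_speech()` for each recognized phrase and poll the two `should_*` methods between recognitions.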
Before running the project, ensure you have the following installed:
- Python 3.9 or later
- Required libraries (listed in requirements.txt)
- Create a .env file in the root directory of the project.
- Add your API keys and other configuration variables to the .env file:
Weather_api=your_weather_api_key
News_api=your_news_api_key
Sender_email=your_email
Receiver_email=subject_email
Password_email=email_password
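To check that the .env file is complete before starting the assistant, a small loader like the one below can help. It is only a sketch using the standard library: the key names come from the list above, while `load_env_file` and `missing_keys` are hypothetical helpers (the project itself may read the file differently, for example through its `get_env.py` module).

```python
REQUIRED_KEYS = (
    "Weather_api",
    "News_api",
    "Sender_email",
    "Receiver_email",
    "Password_email",
)


def load_env_file(path=".env"):
    """Parse KEY=value lines into a dict, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(load_env_file())` at startup gives a list to report to the user instead of failing later inside a weather or email call.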
2. Install system requirements
bash ./intialize.sh
- Setup API Keys & Passwords:
  - WEATHER API - Get weather data.
  - NEWS API - Fetch latest news headlines.
  - GMAIL PASSWORD - Generate an app password for sending emails.
- OLLAMA - Download models from Ollama (manual setup).
Install models from Ollama:
ollama run gemma3:4b
ollama run granite3.1-dense:2b
ollama pull nomic-embed-text
- [portaudio] - Download portaudio to work with sound.
- GEMINI AI - API access for function execution.
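Model
| Property | Value |
|------------------|---------|
| architecture | gemma3 |
| parameters | 4.3B |
| context length | 8192 |
| embedding length | 2560 |
| quantization | Q4_K_M |

Parameters
| Parameter | Value |
|-------------|-----------------|
| stop | "<end_of_turn>" |
| temperature | 0.1 |

License
Gemma Terms of Use
Last modified: February 21, 2024

Model
| Property | Value |
|------------------|---------|
| architecture | granite |
| parameters | 2.5B |
| context length | 131072 |
| embedding length | 2048 |
| quantization | Q4_K_M |

System
Knowledge Cutoff Date: April 2024.
You are Granite, developed by IBM.

License
Apache License
Version 2.0, January 2004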
| Model | Inputs | Outputs | Optimized for |
|-------|--------|---------|---------------|
| gemini-2.0-flash | Audio, images, videos, and text | Text, images (experimental), and audio (coming soon) | Next generation features, speed, thinking, realtime streaming, and multimodal generation |
| gemini-2.0-flash-lite | Audio, images, videos, and text | Text | A Gemini 2.0 Flash model optimized for cost efficiency and low latency |
| gemini-2.0-pro-exp-02-05 | Audio, images, videos, and text | Text | Our most powerful Gemini 2.0 model |
| gemini-1.5-flash | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks |
├── DATA
│   ├── KNOWLEDGEBASE
│   │   └── disaster_data_converted.md
│   ├── RAWKNOWLEDGEBASE
│   │   └── disaster_data.pdf
│   ├── email_schema.py
│   ├── msg.py
│   ├── phone_details.py
│   └── tools.py
├── device_ips.txt
├── main.py
├── readme.md
├── requirements.txt
└── src
    ├── BRAIN
    │   ├── RAG.py
    │   ├── func_call.py
    │   ├── gemini_llm.py
    │   ├── lm_ai.py
    │   └── text_to_info.py
    ├── CONVERSATION
    │   ├── speech_to_text.py
    │   ├── t_s.py
    │   ├── test_speech.py
    │   └── text_to_speech.py
    ├── FUNCTION
    │   ├── Email_send.py
    │   ├── adb_connect.bat
    │   ├── adb_connect.sh
    │   ├── app_op.py
    │   ├── get_env.py
    │   ├── greet_time.py
    │   ├── incog.py
    │   ├── internet_search.py
    │   ├── link_op.py
    │   ├── news.py
    │   ├── phone_call.py
    │   ├── random_respon.py
    │   ├── run_function.py
    │   ├── weather.py
    │   └── youtube_downloader.py
    ├── KEYBOARD
    │   ├── key_lst.py
    │   └── key_prs_lst.py
    └── VISION
        └── eye.py
11 directories, 40 files
git clone https://github.com/ganeshnikhil/J.A.R.V.I.S.2.0.git
cd J.A.R.V.I.S.2.0
pip install -r requirements.txt
python main.py
Initial Interaction:
[= =] Say 'hey jarvis' to activate, and 'stop' to deactivate. Say 'exit' to quit.
Transitioned to Gemini AI-powered function calling, allowing multiple function calls simultaneously for better efficiency! If Gemini AI fails to generate function calls, the system automatically falls back to an Ollama-based model for reliable execution.
AI Model Used: Gemini AI
- Higher accuracy
- Structured data processing
- Reliable AI-driven interactions
Command Parsing
response = gemini_generate_function_call(command)
response_dic = parse_tool_call(response)
Dynamic Function Execution
if response_dic:
    func_name = response_dic["name"]
    response = execute_function_call(response_dic)
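One way to picture how several parsed calls could be dispatched is a name-to-function registry, sketched below. Everything here is illustrative: the `TOOLS` table, the two placeholder tools, and the `"args"` key are assumptions (the snippet above only shows a `"name"` key), and the repo's `execute_function_call` may work differently.

```python
def get_weather(city):
    """Placeholder tool; a real one would call the weather API."""
    return f"weather for {city}"


def get_news(topic):
    """Placeholder tool; a real one would call the news API."""
    return f"news about {topic}"


# Hypothetical registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather, "get_news": get_news}


def execute_calls(calls):
    """Run each {"name": ..., "args": {...}} call independently.

    One failing or unknown call does not stop the others; its error is
    recorded in the result list instead.
    """
    results = []
    for call in calls:
        name = call.get("name")
        func = TOOLS.get(name)
        if func is None:
            results.append({"name": name, "error": "unknown function"})
            continue
        try:
            results.append({"name": name, "result": func(**call.get("args", {}))})
        except Exception as exc:
            results.append({"name": name, "error": str(exc)})
    return results
```

Because each call is handled on its own, unrelated requests such as a weather lookup and a news lookup can be answered from a single model response.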
Error Handling & Fallback to Ollama
try:
    response = execute_function_call(response_dic)
except Exception as e:
    print(f"Error in Gemini AI function execution: {e}")
    print("Falling back to Ollama-based function execution...")
    response = ollama_generate_function_call(command)
Retry Mechanism
import time

def send_to_ai_with_retry(prompt, retries=3, delay=2):
    # Try Gemini up to `retries` times, pausing `delay` seconds after each failure.
    for _ in range(retries):
        try:
            return send_to_gemini(prompt)
        except Exception:
            time.sleep(delay)
    # Every attempt failed: fall back to the local Ollama model.
    print("Gemini AI is not responding. Switching to Ollama...")
    return send_to_ollama(prompt)
Retrieval-Augmented Generation (RAG) dynamically loads relevant markdown-based knowledge files based on the queried topic, reducing hallucinations and improving response accuracy.
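The idea of choosing knowledge files by topic can be illustrated with a very simple keyword-overlap ranking. This is not the project's retrieval code (which pulls an embedding model, nomic-embed-text, and so likely uses vector search); `pick_knowledge_files` and `tokenize` are hypothetical names, and plain word overlap stands in for real similarity scoring.

```python
import re
from pathlib import Path


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_knowledge_files(query, kb_dir, top_k=1):
    """Return up to top_k markdown files sharing the most words with the query.

    Files with no overlapping words are never returned, so an off-topic
    query yields an empty list rather than unrelated context.
    """
    query_words = tokenize(query)
    scored = []
    for path in sorted(Path(kb_dir).glob("*.md")):
        words = tokenize(path.stem) | tokenize(path.read_text(encoding="utf-8"))
        score = len(query_words & words)
        if score > 0:
            scored.append((score, path))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in scored[:top_k]]
```

The selected files' text would then be added to the prompt, so the model answers from the knowledge base instead of guessing.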
Integrated Android Debug Bridge (ADB) to enable voice-controlled phone automation!
- Make phone calls
- Open apps & toggle settings
- Access phone data & remote operations
Windows
winget install --id=Google.AndroidSDKPlatformTools -e
Linux
sudo apt install adb
Mac
brew install android-platform-tools
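Once ADB is installed, a phone call can be started from Python by launching the Android dialer intent through `adb shell am start`. The sketch below only shows the general shape: `build_call_command` and `place_call` are hypothetical helpers, not the contents of the repo's `phone_call.py`, and the device must already be connected and authorized for ADB.

```python
import re
import subprocess


def build_call_command(number, device=None):
    """Build the adb argument list that starts a call to `number`.

    `device` is an optional adb serial or ip:port, passed with -s.
    """
    digits = re.sub(r"[\s\-()]", "", number)   # drop spaces, dashes, parentheses
    if not re.fullmatch(r"\+?\d{3,15}", digits):
        raise ValueError(f"not a dialable number: {number!r}")
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]
    cmd += ["shell", "am", "start",
            "-a", "android.intent.action.CALL",
            "-d", f"tel:{digits}"]
    return cmd


def place_call(number, device=None):
    """Run the command; raises CalledProcessError if adb reports failure."""
    return subprocess.run(build_call_command(number, device),
                          check=True, capture_output=True, text=True)
```

Passing the arguments as a list rather than a shell string keeps a spoken or transcribed number from being interpreted by the local shell, and the digit check rejects anything that is not a phone number.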
- Deeper mobile integration
- Advanced AI-driven automation
- Improved NLP-based command execution
- Multi-modal interactions (text + voice + image)
Stay tuned for future updates!
## Gemini Model Comparison
The following table provides a comparison of various Gemini models with respect to their rate limits:
| Model | RPM | TPM | RPD |
|------------------------------------- |-----:|----------:| -----:|
| **Gemini 2.0 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 2.0 Flash-Lite Preview** | 30 | 1,000,000 | 1,500 |
| **Gemini 2.0 Pro Experimental 02-05** | 2 | 1,000,000 | 50 |
| **Gemini 2.0 Flash Thinking Experimental** | 10 | 4,000,000 | 1,500 |
| **Gemini 1.5 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Flash-8B** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Pro** | 2 | 32,000 | 50 |
| **Imagen 3** | -- | -- | -- |
- RPM: Requests per minute
- TPM: Tokens per minute
- RPD: Requests per day
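To stay under a requests-per-minute limit such as the 15 RPM listed for Gemini 2.0 Flash, a client can keep a sliding window of recent request times. The `RpmLimiter` class below is a hypothetical sketch of that idea and is not part of the project; it tracks RPM only, not TPM or RPD.

```python
import time
from collections import deque


class RpmLimiter:
    """Sliding-window limiter for requests per minute (hypothetical helper)."""

    def __init__(self, rpm, now=time.monotonic):
        self.rpm = rpm
        self._now = now          # injectable clock, useful for testing
        self._sent = deque()     # timestamps of requests in the last 60 s

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        t = self._now()
        # Forget requests that have left the 60-second window.
        while self._sent and t - self._sent[0] >= 60:
            self._sent.popleft()
        if len(self._sent) < self.rpm:
            return 0.0
        return 60 - (t - self._sent[0])

    def record(self):
        """Note that a request was just sent."""
        self._sent.append(self._now())
```

A caller would sleep for `wait_time()` seconds, send the request, then call `record()`; if the wait is too long, that is a natural point to switch to the local Ollama model instead.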
The project focuses on small local models and free API-tier models, aiming for accurate agentic behaviour that also runs on low-spec systems.
Alternative AI tools for J.A.R.V.I.S.2.0
Similar Open Source Tools

ComfyUI-Ollama-Describer
ComfyUI-Ollama-Describer is an extension for ComfyUI that enables the use of LLM models provided by Ollama, such as Gemma, Llava (multimodal), Llama2, Llama3, or Mistral. It requires the Ollama library for interacting with large-scale language models, supporting GPUs using CUDA and AMD GPUs on Windows, Linux, and Mac. The extension allows users to run Ollama through Docker and utilize NVIDIA GPUs for faster processing. It provides nodes for image description, text description, image captioning, and text transformation, with various customizable parameters for model selection, API communication, response generation, and model memory management.

RTXZY-MD
RTXZY-MD is a bot tool that supports file hosting, QR code, pairing code, and RestApi features. Users must fill in the Apikey for the bot to function properly. It is not recommended to install the bot on platforms lacking ffmpeg, imagemagick, webp, or express.js support. The tool allows for 95% implementation of website api and supports free and premium ApiKeys. Users can join group bots and get support from Sociabuzz. The tool can be run on Heroku with specific buildpacks and is suitable for Windows/VPS/RDP users who need Git, NodeJS, FFmpeg, and ImageMagick installations.

LabelQuick
LabelQuick_V2.0 is a fast image annotation tool designed and developed by the AI Horizon team. This version has been optimized and improved based on the previous version. It provides an intuitive interface and powerful annotation and segmentation functions to efficiently complete dataset annotation work. The tool supports video object tracking annotation, quick annotation by clicking, and various video operations. It introduces the SAM2 model for accurate and efficient object detection in video frames, reducing manual intervention and improving annotation quality. The tool is designed for Windows systems and requires a minimum of 6GB of memory.

agentneo
AgentNeo is a Python package that provides functionalities for project, trace, dataset, experiment management. It allows users to authenticate, create projects, trace agents and LangGraph graphs, manage datasets, and run experiments with metrics. The tool aims to streamline AI project management and analysis by offering a comprehensive set of features.

summarize
The 'summarize' tool is designed to transcribe and summarize videos from various sources using AI models. It helps users efficiently summarize lengthy videos, take notes, and extract key insights by providing timestamps, original transcripts, and support for auto-generated captions. Users can utilize different AI models via Groq, OpenAI, or custom local models to generate grammatically correct video transcripts and extract wisdom from video content. The tool simplifies the process of summarizing video content, making it easier to remember and reference important information.

DeepBattler
DeepBattler is a tool designed for Hearthstone Battlegrounds players, providing real-time strategic advice and insights to improve gameplay experience. It integrates with the Hearthstone Deck Tracker plugin and offers voice-assisted guidance. The tool is powered by a large language model (LLM) and can match the strength of top players on EU servers. Users can set up the tool by adding dependencies, configuring the plugin path, and launching the LLM agent. DeepBattler is licensed for personal, educational, and non-commercial use, with guidelines on non-commercial distribution and acknowledgment of external contributions.

AgentNeo
AgentNeo is an advanced, open-source Agentic AI Application Observability, Monitoring, and Evaluation Framework designed to provide deep insights into AI agents, Large Language Model (LLM) calls, and tool interactions. It offers robust logging, visualization, and evaluation capabilities to help debug and optimize AI applications with ease. With features like tracing LLM calls, monitoring agents and tools, tracking interactions, detailed metrics collection, flexible data storage, simple instrumentation, interactive dashboard, project management, execution graph visualization, and evaluation tools, AgentNeo empowers users to build efficient, cost-effective, and high-quality AI-driven solutions.

ComfyUI_Yvann-Nodes
ComfyUI_Yvann-Nodes is a pack of custom nodes that enable audio reactivity within ComfyUI, allowing users to create AI-driven animations that sync with music. Users can generate audio reactive AI videos, control AI generation styles, content, and composition with any audio input. The tool is simple to use by dropping workflows in ComfyUI and specifying audio and visual inputs. It is flexible and works with existing ComfyUI AI tech and nodes like IPAdapter, AnimateDiff, and ControlNet. Users can pick workflows for Images → Video or Video → Video, download the corresponding .json file, drop it into ComfyUI, install missing custom nodes, set inputs, and generate audio-reactive animations.

llm_note
LLM notes repository contains detailed analysis on transformer models, language model compression, inference and deployment, high-performance computing, and system optimization methods. It includes discussions on various algorithms, frameworks, and performance analysis related to large language models and high-performance computing. The repository serves as a comprehensive resource for understanding and optimizing language models and computing systems.

lawglance
LawGlance is an AI-powered legal assistant that aims to bridge the gap between people and legal access. It is a free, open-source initiative designed to provide quick and accurate legal support tailored to individual needs. The project covers various laws, with plans for international expansion in the future. LawGlance utilizes AI-powered Retrieval-Augmented Generation (RAG) to deliver legal guidance accessible to both laypersons and professionals. The tool is developed with support from mentors and experts at Data Science Academy and Curvelogics.

hugging-llm
HuggingLLM is a project that aims to introduce ChatGPT to a wider audience, particularly those interested in using the technology to create new products or applications. The project focuses on providing practical guidance on how to use ChatGPT-related APIs to create new features and applications. It also includes detailed background information and system design introductions for relevant tasks, as well as example code and implementation processes. The project is designed for individuals with some programming experience who are interested in using ChatGPT for practical applications, and it encourages users to experiment and create their own applications and demos.

sanic-web
Sanic-Web is a lightweight, end-to-end, and easily customizable large model application project built on technologies such as Dify, Ollama & Vllm, Sanic, and Text2SQL. It provides a one-stop solution for developing large model applications, supporting graphical data-driven Q&A using ECharts, handling table-based Q&A with CSV files, and integrating with third-party RAG systems for general knowledge Q&A. As a lightweight framework, Sanic-Web enables rapid iteration and extension to facilitate the quick implementation of large model projects.

LLM-Navigation
LLM-Navigation is a repository dedicated to documenting learning records related to large models, including basic knowledge, prompt engineering, building effective agents, model expansion capabilities, security measures against prompt injection, and applications in various fields such as AI agent control, browser automation, financial analysis, 3D modeling, and tool navigation using MCP servers. The repository aims to organize and collect information for personal learning and self-improvement through AI exploration.