J.A.R.V.I.S.2.0
An open-source hybrid assistant that uses small models (2B to 5B parameters) and Gemini, with image and agentic tool capabilities and RAG integration with efficient memory. Android is supported through ADB.
Welcome to the **Jarvis AI Assistant** project! This AI-powered assistant can perform various tasks such as **providing weather reports, summarizing news, sending emails**, **CAG**, and more, all through **voice commands**. Below, you'll find detailed instructions on how to set up, use, and interact with this assistant. The project is written primarily in Python and was first published in 2025. Key topics include: ai, api, gemini, gemma, granite-20b-multilingual.
J.A.R.V.I.S. 2.0: Judgment Augmented Reasoning for Virtual Intelligent Systems
Features
- Voice Activation: Activates listening mode.
- Speech Recognition: Recognizes and processes user commands via speech input.
- AI Responses: Provides responses using AI-generated text-to-speech output.
- Task Execution: Handles multiple tasks, including:
  - Sending emails
  - Summarizing weather reports
  - Data analysis using CSV
  - Personalized chat
  - Reading news headlines
  - Image generation
  - Database functions
  - Phone call automation using ADB
  - AI-based task execution
  - Automating websites and applications
  - Image processing using Gemini
    - Image Source: Upload, URL, Camera
    - Select Action: Basic Detection, Object Detection, Segmentation, Resize
  - Retrieval-Augmented Generation (RAG) for knowledge-based interactions on various topics
- Timeout Handling: Automatically deactivates listening mode after 5 minutes of inactivity.
- Automatic Input Processing: If no "stop" command is detected within 60 seconds, input is finalized and sent to the AI model for processing.
- Multiple Function Calls: Call multiple functions simultaneously, even if their inputs and outputs are unrelated.
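
The two timing rules in the feature list (deactivate after 5 minutes of inactivity, finalize input after 60 seconds without a "stop" command) could be modelled roughly as below. This is only a sketch: `ListeningSession` and its methods are hypothetical names, not the project's code, and it assumes the 60-second window starts when the current utterance begins.

```python
from dataclasses import dataclass
from typing import Optional

INACTIVITY_TIMEOUT_S = 5 * 60  # listening mode switches off after 5 minutes of inactivity
AUTO_FINALIZE_S = 60           # input is finalized if no "stop" arrives within 60 seconds


@dataclass
class ListeningSession:
    """Tracks the two timers from the feature list (hypothetical helper)."""
    last_activity: float                   # time of the most recent speech, in seconds
    input_started: Optional[float] = None  # when the current utterance began, if any

    def on_speech(self, now: float) -> None:
        """Record activity and open the input window if none is open yet."""
        self.last_activity = now
        if self.input_started is None:
            self.input_started = now

    def should_deactivate(self, now: float) -> bool:
        """True once 5 minutes have passed since the last activity."""
        return now - self.last_activity >= INACTIVITY_TIMEOUT_S

    def should_finalize(self, now: float, heard_stop: bool) -> bool:
        """True if "stop" was heard, or 60 seconds passed since input began."""
        if self.input_started is None:
            return False
        return heard_stop or now - self.input_started >= AUTO_FINALIZE_S
```

In a real loop the `now` values would come from something like `time.monotonic()`; passing them in explicitly keeps the timing logic easy to test.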
Prerequisites
Before running the project, ensure you have the following installed:
- Python 3.9 or later
- Required libraries (listed in requirements.txt)
Configuration
1. Create a `.env` file in the root directory of the project.
2. Add your API keys and other configuration variables to the `.env` file:

```dotenv
author_name="ganeshnikhil124@gmail.com"
weather_link="https://rapidapi.com/weatherapi/api/weatherapi-com"
news_link="https://newsapi.org"
name="ganeshnikhil"
Rag_model="granite3.1-dense:2b"
Chat_model="granite3.1-dense:2b"
Function_call_model="gemma3:4b"
Text_to_info_model="gemma3:4b"
Image_to_text="llava:7b"
Embedding_model="nomic-embed-text"
genai_key=""
Sender_email="ganeshnikhil124@gmail.com"
Receiver_email=""
Password_email=""
Weather_api=""
News_api=""
Country="in"
DEVICE_IP=""
CSV_PATH="./DATA/business-employment-data-dec-2024-quarter.csv"
UI="on"
Yt_path="./DATA/youtube_video/"
```

3. Install system requirements:

```bash
bash ./intialize.sh
```

4. Set up API keys and passwords:
   - WEATHER API: Get weather data.
   - NEWS API: Fetch the latest news headlines.
   - GMAIL PASSWORD: Generate an app password for sending emails.
   - OLLAMA: Download models from Ollama (manual setup). Install the models:

     ```bash
     ollama run gemma3:4b
     ollama run granite3.1-dense:2b
     ollama pull nomic-embed-text
     ```
   - PORTAUDIO: Download portaudio to work with sound.
   - GEMINI AI: API access for function execution.
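
As a rough illustration of how a few of the `.env` values listed in this section might be read from Python: the sketch below assumes the variables are already present in the process environment (for example, loaded by a dotenv library at startup). `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the example values from the `.env` file.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """A small subset of the configuration keys from the .env example."""
    chat_model: str
    function_call_model: str
    embedding_model: str
    genai_key: str
    ui_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-style mapping (hypothetical helper)."""
    return Settings(
        chat_model=env.get("Chat_model", "granite3.1-dense:2b"),
        function_call_model=env.get("Function_call_model", "gemma3:4b"),
        embedding_model=env.get("Embedding_model", "nomic-embed-text"),
        genai_key=env.get("genai_key", ""),
        # The example file uses UI="on"; anything else is treated as off here.
        ui_enabled=env.get("UI", "on").strip().lower() == "on",
    )
```

Passing a plain dictionary instead of `os.environ` makes it possible to try different configurations without touching the real environment.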
Model Details
Gemma is used for intelligent routing, images, and simple question answering.
Model
architecture gemma3
parameters 4.3B
context length 8192
embedding length 2560
quantization Q4_K_M
Parameters
stop "<end_of_turn>"
temperature 0.1
License
Gemma Terms of Use
Last modified: February 21, 2024
Granite dense has a large context window and is used for RAG and chat.
Model
architecture granite
parameters 2.5B
context length 131072
embedding length 2048
quantization Q4_K_M
System
Knowledge Cutoff Date: April 2024.
You are Granite, developed by IBM.
License
Apache License
Version 2.0, January 2004
The Gemini free tier serves as a fallback mechanism (only for tool calling).

| Model | Inputs | Outputs | Optimized for |
|-------|--------|---------|---------------|
| gemini-2.0-flash | Audio, images, videos, and text | Text, images (experimental), and audio (coming soon) | Next generation features, speed, thinking, realtime streaming, and multimodal generation |
| gemini-2.0-flash-lite | Audio, images, videos, and text | Text | A Gemini 2.0 Flash model optimized for cost efficiency and low latency |
| gemini-2.0-pro-exp-02-05 | Audio, images, videos, and text | Text | Our most powerful Gemini 2.0 model |
| gemini-1.5-flash | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks |

Installation
1. Clone the Repository

```bash
git clone https://github.com/ganeshnikhil/J.A.R.V.I.S.2.0.git
cd J.A.R.V.I.S.2.0
```

2. Install Dependencies

```bash
pip install -r requirements.txt
```

Running the Application
Start the Program

```bash
streamlit run ui.py
```
Function Calling Methods
Primary: Gemini AI-Based Function Execution
The project has transitioned to Gemini AI-powered function calling, which allows multiple function calls at once for better efficiency. If Gemini AI fails to generate function calls, the system automatically falls back to an Ollama-based model for reliable execution.
AI Model Used: Gemini AI
- Higher accuracy
- Structured data processing
- Reliable AI-driven interactions
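
The primary-then-fallback behaviour described in this section can be sketched without any model at all. In the example below, a "planner" is any callable that turns a request into a list of function calls; in the project these would presumably wrap Gemini and an Ollama model, but here they are stand-ins. All names (`plan_with_fallback`, `execute_calls`, the call format) are assumptions for illustration, and the calls run one after another rather than truly simultaneously.

```python
from typing import Any, Callable, Dict, List

# A planner maps a user request to calls like {"name": "send_email", "args": {...}}.
Planner = Callable[[str], List[Dict[str, Any]]]


def plan_with_fallback(request: str, primary: Planner, fallback: Planner) -> List[Dict[str, Any]]:
    """Use the primary planner; switch to the fallback if it raises or returns no calls."""
    try:
        calls = primary(request)
    except Exception:
        calls = []
    return calls if calls else fallback(request)


def execute_calls(calls: List[Dict[str, Any]], tools: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Run each planned call against a registry of tools, in order."""
    results: List[Any] = []
    for call in calls:
        tool = tools.get(call["name"])
        if tool is None:
            # Report unknown tools instead of crashing the whole batch.
            results.append(f"unknown tool: {call['name']}")
            continue
        results.append(tool(**call.get("args", {})))
    return results
```

Keeping planning separate from execution means the same tool registry serves both models, whichever one ends up producing the calls.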
RAG-Based Knowledge System
Retrieval-Augmented Generation (RAG) dynamically loads relevant markdown-based knowledge files based on the queried topic, reducing hallucinations and improving response accuracy.
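
To make the idea of topic-based loading concrete, here is a minimal sketch that picks markdown knowledge files by plain keyword overlap and builds a prompt from them. It is not the project's retrieval code: the configuration lists an embedding model, so the real system is likely embedding-based, and keyword overlap is swapped in here only to keep the example dependency-free. The function names and the folder layout (one `.md` file per topic) are assumptions.

```python
import re
from pathlib import Path
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def load_topics(folder: str) -> Dict[str, str]:
    """Read every markdown file in a folder as {topic name: file text}."""
    return {p.stem: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.md")}


def rank_topics(query: str, topics: Dict[str, str]) -> List[str]:
    """Order topics by shared words with the query, dropping topics with no overlap."""
    q = tokenize(query)
    scored = [(len(q & tokenize(name + " " + text)), name) for name, text in topics.items()]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]


def build_prompt(query: str, topics: Dict[str, str], top_k: int = 1) -> str:
    """Put the best-matching topic text in front of the question."""
    chosen = rank_topics(query, topics)[:top_k]
    context = "\n\n".join(topics[name] for name in chosen)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Restricting the prompt to the selected files is what limits the model to known material, which is the mechanism behind the reduced hallucinations mentioned above.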
ADB Integration for Phone Automation
Integrated Android Debug Bridge (ADB) enables voice-controlled phone automation:
- Make phone calls
- Open apps and toggle settings
- Access phone data and perform remote operations
Setting Up ADB
Windows

```powershell
winget install --id=Google.AndroidSDKPlatformTools -e
```

Linux

```bash
sudo apt install adb
```

Mac

```bash
brew install android-platform-tools
```
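
Once ADB is installed, a phone call can be started from Python by building an `adb shell am start` command with the standard `android.intent.action.CALL` intent. The helper names below are hypothetical and this is a sketch rather than the project's implementation; it assumes a device is already connected and authorized, and that the optional `device` value is a serial or `ip:port` such as the `DEVICE_IP` setting.

```python
import shutil
import subprocess
from typing import List, Optional


def adb_call_command(number: str, device: Optional[str] = None) -> List[str]:
    """Build the adb argument list that starts a call to `number`."""
    # Keep only digits and "+" so spaces, dashes, and brackets are dropped.
    digits = "".join(ch for ch in number if ch.isdigit() or ch == "+")
    if not digits:
        raise ValueError("phone number contains no digits")
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]  # target one specific device
    cmd += ["shell", "am", "start", "-a", "android.intent.action.CALL", "-d", f"tel:{digits}"]
    return cmd


def run_adb(cmd: List[str]) -> str:
    """Run a prepared adb command and return its standard output."""
    if shutil.which("adb") is None:
        raise RuntimeError("adb was not found on PATH")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Building the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems when the number comes from a voice transcript.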
Future Enhancements
- Deeper mobile integration
- Advanced AI-driven automation
- Improved NLP-based command execution
- Multi-modal interactions (text + voice + image)

Stay tuned for future updates!
## Gemini Model Comparison
The following table compares the rate limits of various Gemini models:

| Model | RPM | TPM | RPD |
|--------------------------------------------|----:|----------:|------:|
| **Gemini 2.0 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 2.0 Flash-Lite Preview** | 30 | 1,000,000 | 1,500 |
| **Gemini 2.0 Pro Experimental 02-05** | 2 | 1,000,000 | 50 |
| **Gemini 2.0 Flash Thinking Experimental** | 10 | 4,000,000 | 1,500 |
| **Gemini 1.5 Flash** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Flash-8B** | 15 | 1,000,000 | 1,500 |
| **Gemini 1.5 Pro** | 2 | 32,000 | 50 |
| **Imagen 3** | -- | -- | -- |
Explanation:
- RPM: Requests per minute
- TPM: Tokens per minute
- RPD: Requests per day
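
Staying under a requests-per-minute limit like those listed here is something a client can enforce locally. The sliding-window limiter below is one common way to do that; it is a generic sketch with hypothetical names, not code from the project, and it covers only the RPM limit (TPM and RPD would need separate counters).

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests within any `window_s`-second window."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._stamps: Deque[float] = deque()  # times of recent accepted requests

    def try_acquire(self) -> bool:
        """Return True and record the request if it fits in the window."""
        now = self.clock()
        # Forget requests that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

For example, `SlidingWindowLimiter(15)` would match a 15 RPM limit; when `try_acquire()` returns False, the caller can wait or route the request to a local model instead.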
The project focuses mainly on using small models and free API models to achieve accurate agentic behaviour, so that it can also run on low-spec systems.

