gemini-2-live-api-demo

A vanilla JS web interface for the Gemini 2.0 Flash Experimental Multimodal Live API. It accepts text, audio, camera, and screen input, and supports audio responses and function calling from the LLM.

Stars: 186

A lightweight vanilla JavaScript client for the Gemini 2.0 Flash Multimodal Live API that provides real-time interaction through text, audio, video, and screen sharing. Features include real-time text chat, audio input/output with visualization, motion-detected video streaming, and screen sharing. After connecting to the API, users can send text messages, toggle the microphone for audio input, enable the webcam for video streaming, share their screen, and monitor real-time feedback in the logs panel. Custom tools can be added to extend functionality.

README:

Gemini 2.0 Flash Multimodal Live API Client

A lightweight vanilla JavaScript implementation of the Gemini 2.0 Flash Multimodal Live API client. This project provides real-time interaction with Gemini's API through text, audio, video, and screen sharing capabilities.

This is a simplified version of Google's original React implementation, created in response to this issue.

Live Demo on GitHub Pages

Key Features

  • Real-time text chat with Gemini API
  • Audio input/output with visualization
  • Motion-detected video streaming
  • Screen sharing capabilities
  • Function calling support
  • Built with vanilla JavaScript (no dependencies)
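
As a rough illustration of what "motion-detected video streaming" can mean, the sketch below compares two consecutive RGBA frames and reports motion only when enough pixels changed brightness. This is not taken from the project's source: the function names, the thresholds, and the frame-differencing approach itself are assumptions chosen for illustration.

```javascript
// Hypothetical helper: fraction of pixels whose average brightness changed
// by more than `pixelThreshold` between two RGBA frames of equal size
// (for example, the `data` arrays of two canvas ImageData objects).
function changedPixelRatio(prev, curr, pixelThreshold = 30) {
  if (prev.length !== curr.length) {
    throw new Error("Frames must have the same dimensions");
  }
  const pixelCount = prev.length / 4; // 4 bytes per pixel: R, G, B, A
  if (pixelCount === 0) return 0;

  let changed = 0;
  for (let i = 0; i < prev.length; i += 4) {
    const prevLuma = (prev[i] + prev[i + 1] + prev[i + 2]) / 3;
    const currLuma = (curr[i] + curr[i + 1] + curr[i + 2]) / 3;
    if (Math.abs(currLuma - prevLuma) > pixelThreshold) changed++;
  }
  return changed / pixelCount;
}

// Hypothetical gate: only treat a frame as "moving" (and worth sending)
// when more than `ratioThreshold` of its pixels changed.
function hasMotion(prev, curr, ratioThreshold = 0.02) {
  return changedPixelRatio(prev, curr) > ratioThreshold;
}
```

In a browser, the two arrays would typically come from drawing the webcam video onto a canvas and calling getImageData on successive ticks; skipping frames where hasMotion returns false reduces the number of frames sent.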

Prerequisites

  • Modern web browser with WebRTC, WebSocket, and Web Audio API support
  • Google AI Studio API key
  • Python 3 or Node.js with npx (to run a local development server)
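
The browser requirements above can be checked at runtime before connecting. The following is a minimal sketch, not part of the project: the function name is hypothetical, and it assumes that "WebRTC" here refers to camera, microphone, and screen capture through navigator.mediaDevices.

```javascript
// Hypothetical helper: returns the names of required features that are
// missing from `env` (pass `window` in a browser).
function missingBrowserFeatures(env) {
  const mediaDevices = env.navigator && env.navigator.mediaDevices;
  const checks = {
    "WebSocket": typeof env.WebSocket === "function",
    // Older Safari versions expose the prefixed webkitAudioContext.
    "Web Audio API":
      typeof (env.AudioContext || env.webkitAudioContext) === "function",
    "Media capture (getUserMedia)":
      Boolean(mediaDevices) && typeof mediaDevices.getUserMedia === "function",
    "Screen capture (getDisplayMedia)":
      Boolean(mediaDevices) && typeof mediaDevices.getDisplayMedia === "function",
  };
  return Object.keys(checks).filter((name) => !checks[name]);
}
```

Calling missingBrowserFeatures(window) on page load and showing the returned names gives users a clear message instead of a silent failure. Note that browsers expose navigator.mediaDevices only in secure contexts (HTTPS or localhost).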

Quick Start

  1. Clone the repository

  2. Set up your API key:

    cp js/config/config.example.js js/config/config.js
    # Edit js/config/config.js with your API key
  3. Start the development server:

    python -m http.server 8000

    or

    npx http-server -p 8000
  4. Access the application at http://localhost:8000

Project Structure

├── js/
│ ├── audio/ # Audio processing and management
│ ├── config/ # Configuration files
│ ├── core/ # Core functionality (WebSocket, worklets)
│ ├── tools/ # Function calling implementations
│ ├── utils/ # Utility functions
│ ├── video/ # Video and screen sharing
│ └── main.js # Application entry point
├── css/ # Styling
└── index.html # Main HTML file

Usage Guide

  1. Click "Connect" to establish the API connection
  2. Use the interface to:
    • Send text messages
    • Toggle microphone for audio input
    • Enable webcam for video streaming
    • Share your screen
  3. Monitor the logs panel for real-time feedback
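
For the microphone step, an audio visualization usually needs a single loudness value per buffer of samples. The sketch below shows one common way to compute it (root mean square) and map it to a bar height. It is an illustration only: both function names are hypothetical, and the project's own visualizer may work differently.

```javascript
// Hypothetical helper: root-mean-square level of a buffer of audio samples
// in the range [-1, 1], such as a Float32Array from the Web Audio API.
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical helper: clamp a level to [0, 1] and scale it to a pixel
// height for a simple level-meter bar.
function levelToBarHeight(level, maxHeightPx = 100) {
  const clamped = Math.min(1, Math.max(0, level));
  return Math.round(clamped * maxHeightPx);
}
```

In a browser, the samples could come from an AnalyserNode via getFloatTimeDomainData, with the bar redrawn on each animation frame.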

Development

Adding Custom Tools

Custom tools can be added to extend functionality. See js/tools/README.md for implementation details.
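
To give a feel for what a custom tool might involve, here is a minimal sketch of a tool registry. It is a generic pattern, not the interface this project uses (js/tools/README.md is the authoritative description): the ToolRegistry class, its method names, and the shape of a tool object are all assumptions made for illustration.

```javascript
// Hypothetical registry that maps tool names to handlers.
class ToolRegistry {
  constructor() {
    this.tools = new Map();
  }

  // Assumed tool shape: { name, description, parameters, execute(args) }.
  register(tool) {
    if (!tool || !tool.name || typeof tool.execute !== "function") {
      throw new Error("A tool needs a name and an execute(args) function");
    }
    this.tools.set(tool.name, tool);
  }

  // Metadata that could be advertised to the model when a session starts.
  declarations() {
    return [...this.tools.values()].map(({ name, description, parameters }) => ({
      name,
      description,
      parameters,
    }));
  }

  // Run a tool requested by the model and wrap the result, so a failing
  // tool yields an error object instead of an unhandled exception.
  async call(name, args) {
    const tool = this.tools.get(name);
    if (!tool) return { error: `Unknown tool: ${name}` };
    try {
      return { output: await tool.execute(args) };
    } catch (err) {
      return { error: String((err && err.message) || err) };
    }
  }
}
```

A tool would then be registered with something like registry.register({ name: "add", description: "Add two numbers", parameters: {}, execute: ({ a, b }) => a + b }), and registry.call("add", { a: 1, b: 2 }) would resolve to { output: 3 }.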

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

This project is licensed under the MIT License.
