june

Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit

Stars: 640

Visit

june-va is a local voice chatbot that combines Ollama for language model capabilities, Hugging Face Transformers for speech recognition, and the Coqui TTS Toolkit for text-to-speech synthesis. It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers. The tool supports various interaction modes including text input/output, voice input/text output, text input/audio output, and voice input/audio output. Users can customize the tool's behavior with a JSON configuration file and utilize voice conversion features for voice cloning. The application can be further customized using a configuration file with attributes for language model, speech-to-text model, and text-to-speech model configurations.

README:

june

Local Voice Chatbot: Ollama + HF Transformers + Coqui TTS Toolkit

OVERVIEW
INSTALLATION
USAGE
CUSTOMIZATION
FAQ

OVERVIEW

june is a local voice chatbot that combines the power of Ollama (for language model capabilities), Hugging Face Transformers (for speech recognition), and the Coqui TTS Toolkit (for text-to-speech synthesis). It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers.

Interaction Modes

Text Input/Output: Provide text inputs to the assistant and receive text responses.
Voice Input/Text Output: Use your microphone to give voice inputs, and receive text responses from the assistant.
Text Input/Audio Output: Provide text inputs and receive both text and synthesised audio responses from the assistant.
Voice Input/Audio Output (Default): Use your microphone for voice inputs, and receive responses in both text and synthesised audio form.

INSTALLATION

Pre-requisites

Ollama
Python 3.10 or greater (with pip)
Python development package (e.g. apt install python3-dev for Debian) — only for GNU/Linux
PortAudio development package (e.g. apt install portaudio19-dev for Debian) — only for GNU/Linux
PortAudio (e.g. brew install portaudio using Homebrew) — only for macOS
Microsoft Visual C++ 14.0 or greater — only for Windows

From Source

Method 1: Direct Installation

To install june directly from the GitHub repository:

pip install git+https://github.com/mezbaul-h/june.git@master

Method 2: Clone and Install

Alternatively, you can clone the repository and install it locally:

git clone https://github.com/mezbaul-h/june.git
cd june
pip install .

USAGE

Pull the language model (default is llama3.1:8b-instruct-q4_0) with Ollama first, if you haven't already:

ollama pull llama3.1:8b-instruct-q4_0

Next, run the program (with default configuration):

june-va

This will use llama3.1:8b-instruct-q4_0 for LLM capabilities, openai/whisper-small.en for speech recognition, and tts_models/en/ljspeech/glow-tts for audio synthesis.

You can also customize behaviour of the program with a json configuration file:

june-va --config path/to/config.json

[!NOTE] The configuration file is optional. To learn more about the structure of the config file, see the Customization section.

CUSTOMIZATION

The application can be customised using a configuration file. The config file must be a JSON file. The default configuration is as follows:

{
    "llm": {
        "disable_chat_history": false,
        "model": "llama3.1:8b-instruct-q4_0"
    },
    "stt": {
        "device": "torch device identifier (`cuda` if available; otherwise `cpu`",
        "generation_args": {
            "batch_size": 8
        },
        "model": "openai/whisper-small.en"
    },
    "tts": {
        "device": "torch device identifier (`cuda` if available; otherwise `cpu`",
        "model": "tts_models/en/ljspeech/glow-tts"
    }
}

When you use a configuration file, it overrides the default configuration but does not overwrite it. So you can partially modify the configuration if you desire. For instance, if you do not wish to use speech recognition and only want to provide prompts through text, you can disable that by using a config file with the following configuration:

{
  "stt": null
}

Similarly, you can disable the audio synthesiser, or both, to only use the virtual assistant in text mode.

If you only want to modify the device on which you want to load a particular type of model, without changing the other default attributes of the model, you could use:

{
  "tts": {
    "device": "cpu"
  }
}

Configuration Attributes

`llm` - Language Model Configuration

llm.device: Torch device identifier (e.g., cpu, cuda, mps) on which the pipeline will be allocated.
llm.disable_chat_history: Boolean indicating whether to disable or enable chat history. Enabling chat history will make interactions more dynamic, as the model will have access to previous contexts, but it will consume more processing power. Disabling it will result in less interactive conversations but will use fewer processing resources.
llm.model: Name of the text-generation model tag on Ollama. Ensure this is a valid model tag that exists on your machine.
llm.system_prompt: Give a system prompt to the model. If the underlying model does not support a system prompt, an error will be raised.

`stt` - Speech-to-Text Model Configuration

tts.device: Torch device identifier (e.g., cpu, cuda, mps) on which the pipeline will be allocated.
stt.generation_args: Object containing generation arguments accepted by Hugging Face's speech recognition pipeline.
stt.model: Name of the speech recognition model on Hugging Face. Ensure this is a valid model ID that exists on Hugging Face.

`tts` - Text-to-Speech Model Configuration

tts.device: Torch device identifier (e.g., cpu, cuda, mps) on which the pipeline will be allocated.
tts.generation_args: Object containing generation arguments accepted by Coqui's TTS API.
tts.model: Name of the text-to-speech model supported by the Coqui's TTS Toolkit. Ensure this is a valid model ID.

Frequently Asked Questions

Q: How does the voice input work?

After seeing the [system]> Listening for sound... message, you can speak directly into the microphone. Unlike typical voice assistants, there's no wake command required. Simply start speaking, and the tool will automatically detect and process your voice input. Once you finish speaking, maintain silence for 3 seconds to allow the assistant to process your voice input.

Q: Can I clone a voice?

Many of the models (e.g., tts_models/multilingual/multi-dataset/xtts_v2) supported by Coqui's TTS Toolkit support voice cloning. You can use your own speaker profile with a small audio clip (approximately 1 minute for most models). Once you have the clip, you can instruct the assistant to use it with a custom configuration like the following:

{
  "tts": {
    "model": "tts_models/multilingual/multi-dataset/xtts_v2",
    "generation_args": {
      "language": "en",
      "speaker_wav": "/path/to/your/target/voice.wav"
    }
  }
}

Q: Can I use a remote Ollama instance with june?

Yes, you can easily integrate a remotely hosted Ollama instance with june instead of using a local instance. Here's how to do it:

Set the OLLAMA_HOST environment variable to the appropriate URL of your remote Ollama instance.
Run the program as usual.

Example:

To use a remote Ollama instance, you would use a command like this:

OLLAMA_HOST=http://localhost:11434 june-va

For Tasks:

Click tags to check more tools for each tasks

interact with voice commands generate text responses synthesize audio responses customize behavior with json utilize voice conversion

For Jobs:

voice assistant developer natural language processing engineer speech recognition specialist machine learning engineer software developer

Alternative AI tools for june

Similar Open Source Tools

june

github

: 640

bedrock-claude-chat

This repository is a sample chatbot using the Anthropic company's LLM Claude, one of the foundational models provided by Amazon Bedrock for generative AI. It allows users to have basic conversations with the chatbot, personalize it with their own instructions and external knowledge, and analyze usage for each user/bot on the administrator dashboard. The chatbot supports various languages, including English, Japanese, Korean, Chinese, French, German, and Spanish. Deployment is straightforward and can be done via the command line or by using AWS CDK. The architecture is built on AWS managed services, eliminating the need for infrastructure management and ensuring scalability, reliability, and security.

github

: 1.1k

hayhooks

Hayhooks is a tool that simplifies the deployment and serving of Haystack pipelines as REST APIs. It allows users to wrap their pipelines with custom logic and expose them via HTTP endpoints, including OpenAI-compatible chat completion endpoints. With Hayhooks, users can easily convert their Haystack pipelines into API services with minimal boilerplate code.

github

: 51

cursor-tools

cursor-tools is a CLI tool designed to enhance AI agents with advanced skills, such as web search, repository context, documentation generation, GitHub integration, Xcode tools, and browser automation. It provides features like Perplexity for web search, Gemini 2.0 for codebase context, and Stagehand for browser operations. The tool requires API keys for Perplexity AI and Google Gemini, and supports global installation for system-wide access. It offers various commands for different tasks and integrates with Cursor Composer for AI agent usage.

github

: 3.5k

llm-vscode

llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.

github

: 1.1k

lexido

Lexido is an innovative assistant for the Linux command line, designed to boost your productivity and efficiency. Powered by Gemini Pro 1.0 and utilizing the free API, Lexido offers smart suggestions for commands based on your prompts and importantly your current environment. Whether you're installing software, managing files, or configuring system settings, Lexido streamlines the process, making it faster and more intuitive.

github

: 201

langserve

LangServe helps developers deploy `LangChain` runnables and chains as a REST API. This library is integrated with FastAPI and uses pydantic for data validation. In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in LangChain.js.

github

: 1.9k

OpenAI-sublime-text

The OpenAI Completion plugin for Sublime Text provides first-class code assistant support within the editor. It utilizes LLM models to manipulate code, engage in chat mode, and perform various tasks. The plugin supports OpenAI, llama.cpp, and ollama models, allowing users to customize their AI assistant experience. It offers separated chat histories and assistant settings for different projects, enabling context-specific interactions. Additionally, the plugin supports Markdown syntax with code language syntax highlighting, server-side streaming for faster response times, and proxy support for secure connections. Users can configure the plugin's settings to set their OpenAI API key, adjust assistant modes, and manage chat history. Overall, the OpenAI Completion plugin enhances the Sublime Text editor with powerful AI capabilities, streamlining coding workflows and fostering collaboration with AI assistants.

github

: 267

aiconfig

AIConfig is a framework that makes it easy to build generative AI applications for production. It manages generative AI prompts, models and model parameters as JSON-serializable configs that can be version controlled, evaluated, monitored and opened in a local editor for rapid prototyping. It allows you to store and iterate on generative AI behavior separately from your application code, offering a streamlined AI development workflow.

github

: 833

kwaak

Kwaak is a tool that allows users to run a team of autonomous AI agents locally from their own machine. It enables users to write code, improve test coverage, update documentation, and enhance code quality while focusing on building innovative projects. Kwaak is designed to run multiple agents in parallel, interact with codebases, answer questions about code, find examples, write and execute code, create pull requests, and more. It is free and open-source, allowing users to bring their own API keys or models via Ollama. Kwaak is part of the bosun.ai project, aiming to be a platform for autonomous code improvement.

github

: 190

magic-cli

Magic CLI is a command line utility that leverages Large Language Models (LLMs) to enhance command line efficiency. It is inspired by projects like Amazon Q and GitHub Copilot for CLI. The tool allows users to suggest commands, search across command history, and generate commands for specific tasks using local or remote LLM providers. Magic CLI also provides configuration options for LLM selection and response generation. The project is still in early development, so users should expect breaking changes and bugs.

github

: 497

shellChatGPT

ShellChatGPT is a shell wrapper for OpenAI's ChatGPT, DALL-E, Whisper, and TTS, featuring integration with LocalAI, Ollama, Gemini, Mistral, Groq, and GitHub Models. It provides text and chat completions, vision, reasoning, and audio models, voice-in and voice-out chatting mode, text editor interface, markdown rendering support, session management, instruction prompt manager, integration with various service providers, command line completion, file picker dialogs, color scheme personalization, stdin and text file input support, and compatibility with Linux, FreeBSD, MacOS, and Termux for a responsive experience.

github

: 71

wcgw

wcgw is a shell and coding agent designed for Claude and Chatgpt. It provides full shell access with no restrictions, desktop control on Claude for screen capture and control, interactive command handling, large file editing, and REPL support. Users can use wcgw to create, execute, and iterate on tasks, such as solving problems with Python, finding code instances, setting up projects, creating web apps, editing large files, and running server commands. Additionally, wcgw supports computer use on Docker containers for desktop control. The tool can be extended with a VS Code extension for pasting context on Claude app and integrates with Chatgpt for custom GPT interactions.

github

: 341

Deep-Live-Cam

Deep-Live-Cam is a software tool designed to assist artists in tasks such as animating custom characters or using characters as models for clothing. The tool includes built-in checks to prevent unethical applications, such as working on inappropriate media. Users are expected to use the tool responsibly and adhere to local laws, especially when using real faces for deepfake content. The tool supports both CPU and GPU acceleration for faster processing and provides a user-friendly GUI for swapping faces in images or videos.

github

: 45.2k

slack-bot

The Slack Bot is a tool designed to enhance the workflow of development teams by integrating with Jenkins, GitHub, GitLab, and Jira. It allows for custom commands, macros, crons, and project-specific commands to be implemented easily. Users can interact with the bot through Slack messages, execute commands, and monitor job progress. The bot supports features like starting and monitoring Jenkins jobs, tracking pull requests, querying Jira information, creating buttons for interactions, generating images with DALL-E, playing quiz games, checking weather, defining custom commands, and more. Configuration is managed via YAML files, allowing users to set up credentials for external services, define custom commands, schedule cron jobs, and configure VCS systems like Bitbucket for automated branch lookup in Jenkins triggers.

github

: 188

mycoder

An open-source mono-repository containing the MyCoder agent and CLI. It leverages Anthropic's Claude API for intelligent decision making, has a modular architecture with various tool categories, supports parallel execution with sub-agents, can modify code by writing itself, features a smart logging system for clear output, and is human-compatible using README.md, project files, and shell commands to build its own context.

github

: 342

For similar tasks

june

github

: 640

serverless-rag-demo

The serverless-rag-demo repository showcases a solution for building a Retrieval Augmented Generation (RAG) system using Amazon Opensearch Serverless Vector DB, Amazon Bedrock, Llama2 LLM, and Falcon LLM. The solution leverages generative AI powered by large language models to generate domain-specific text outputs by incorporating external data sources. Users can augment prompts with relevant context from documents within a knowledge library, enabling the creation of AI applications without managing vector database infrastructure. The repository provides detailed instructions on deploying the RAG-based solution, including prerequisites, architecture, and step-by-step deployment process using AWS Cloudshell.

github

: 131

llm

The 'llm' package for Emacs provides an interface for interacting with Large Language Models (LLMs). It abstracts functionality to a higher level, concealing API variations and ensuring compatibility with various LLMs. Users can set up providers like OpenAI, Gemini, Vertex, Claude, Ollama, GPT4All, and a fake client for testing. The package allows for chat interactions, embeddings, token counting, and function calling. It also offers advanced prompt creation and logging capabilities. Users can handle conversations, create prompts with placeholders, and contribute by creating providers.

github

: 281

parakeet

Parakeet is a Go library for creating GenAI apps with Ollama. It enables the creation of generative AI applications that can generate text-based content. The library provides tools for simple completion, completion with context, chat completion, and more. It also supports function calling with tools and Wasm plugins. Parakeet allows users to interact with language models and create AI-powered applications easily.

github

: 96

sparkle

Sparkle is a tool that streamlines the process of building AI-driven features in applications using Large Language Models (LLMs). It guides users through creating and managing agents, defining tools, and interacting with LLM providers like OpenAI. Sparkle allows customization of LLM provider settings, model configurations, and provides a seamless integration with Sparkle Server for exposing agents via an OpenAI-compatible chat API endpoint.

github

: 56

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 855

agentcloud

AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

github

: 583

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

Azure-Analytics-and-AI-Engagement

The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

github

: 136

executorch

ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices. Key value propositions of ExecuTorch are: * **Portability:** Compatibility with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers. * **Productivity:** Enabling developers to use the same toolchains and SDK from PyTorch model authoring and conversion, to debugging and deployment to a wide variety of platforms. * **Performance:** Providing end users with a seamless and high-performance experience due to a lightweight runtime and utilizing full hardware capabilities such as CPUs, NPUs, and DSPs.

github

: 2.7k

autogen

AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

github

: 42.5k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

github

: 30.6k

june

README:

june

Local Voice Chatbot: Ollama + HF Transformers + Coqui TTS Toolkit

OVERVIEW

Interaction Modes

INSTALLATION

Pre-requisites

From Source

Method 1: Direct Installation

Method 2: Clone and Install

USAGE

CUSTOMIZATION

Configuration Attributes

llm - Language Model Configuration

stt - Speech-to-Text Model Configuration

tts - Text-to-Speech Model Configuration

Frequently Asked Questions

Q: How does the voice input work?

Q: Can I clone a voice?

Q: Can I use a remote Ollama instance with june?

Example:

For Tasks:

For Jobs:

Alternative AI tools for june

Similar Open Source Tools

june

bedrock-claude-chat

hayhooks

cursor-tools

llm-vscode

lexido

langserve

OpenAI-sublime-text

aiconfig

kwaak

magic-cli

shellChatGPT

wcgw

Deep-Live-Cam

slack-bot

mycoder

For similar tasks

june

serverless-rag-demo

llm

parakeet

sparkle

For similar jobs

weave

agentcloud

VisionCraft

kaito

Azure-Analytics-and-AI-Engagement

executorch

autogen

tabby

`llm` - Language Model Configuration

`stt` - Speech-to-Text Model Configuration

`tts` - Text-to-Speech Model Configuration