Callytics
Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers. By processing both the audio and text of each call, it provides insights such as sentiment analysis, topic detection, conflict detection, profanity detection, and summarization. These techniques help businesses optimize customer interactions, identify areas for improvement, and enhance overall service quality.

When an audio file is placed in the `.data/input` directory, the entire pipeline starts automatically, and the resulting data is inserted into the database.

Note: This is version v1.1.0; many new features will be added, models will be fine-tuned or trained from scratch, and various optimizations will be applied. For more information, see the Upcoming section.
Note: If you would like to contribute to this repository, please read the CONTRIBUTING first.
- Prerequisites
- Architecture
- Math And Algorithm
- Features
- Demo
- Installation
- File Structure
- Database Structure
- Datasets
- Version Control System
- Upcoming
- Documentations
- License
- Links
- Team
- Contact
- Citation
- Python 3.11 (or above)
- GPU (min 24GB) for Llama-3.2-11B-Vision-Instruct (or above)
- Hugging Face Credentials (Account, Token)
- GPU (min 12GB) for other processes such as faster-whisper & NeMo
- At least one of the following is required:
  - OpenAI Credentials (Account, API Key)
  - Azure OpenAI Credentials (Account, API Key, API Base URL)
This section describes the mathematical models and algorithms used in the project.
Note: This section covers the mathematical concepts and algorithms specific to this repository, rather than the models it uses. Please refer to RESOURCES under the Documentations section for the repositories and models utilized or referenced.
The silence durations are derived from the time intervals between speech segments. Let
$$S = \{s_1, s_2, \ldots, s_n\}$$
represent the set of silence durations (in seconds) between consecutive speech segments.
- A user-defined factor:
$$\text{factor} \in \mathbb{R}^{+}$$
To determine a threshold that distinguishes significant silence from trivial gaps, two statistical methods can be applied:
1. Standard Deviation-Based Threshold
- Mean:
$$\mu = \frac{1}{n}\sum_{i=1}^{n}s_i$$
- Standard Deviation:
$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(s_i - \mu)^2} $$
- Threshold:
$$ T_{\text{std}} = \sigma \cdot \text{factor} $$
2. Median + Interquartile Range (IQR) Threshold
- Median:
Let:
$$ S = \{s_{(1)} \leq s_{(2)} \leq \cdots \leq s_{(n)}\} $$
be the ordered set.
Then:
$$ M = \text{median}(S) = \begin{cases} s_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd}, \\[6pt] \dfrac{s_{\left(\frac{n}{2}\right)} + s_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even}. \end{cases} $$
- Quartiles:
$$ Q_1 = s_{(\lfloor 0.25n \rfloor)}, \quad Q_3 = s_{(\lfloor 0.75n \rfloor)} $$
- IQR:
$$ \text{IQR} = Q_3 - Q_1 $$
- Threshold:
$$ T_{\text{median\_iqr}} = M + (\text{IQR} \times \text{factor}) $$
Total Silence Above Threshold
Once the threshold $T$ (either $T_{\text{std}}$ or $T_{\text{median\_iqr}}$) is defined, we sum only those silence durations that meet or exceed this threshold:
$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
where $$\mathbf{1}(s_i \geq T)$$ is an indicator function defined as:
$$ \mathbf{1}(s_i \geq T) = \begin{cases} 1, & \text{if } s_i \geq T, \\ 0, & \text{otherwise.} \end{cases} $$
Summary:
- Identify the silence durations:
$$ S = \{s_1, s_2, \ldots, s_n\} $$
- Determine the threshold using either:
Standard deviation-based:
$$ T = \sigma \cdot \text{factor} $$
Median+IQR-based:
$$ T = M + (\text{IQR} \cdot \text{factor}) $$
- Compute the total silence above this threshold:
$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
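The procedure above can be sketched in Python. This is an illustrative reimplementation of the formulas, not the project's actual code; the function and parameter names are ours, and mapping the 1-based order statistics $s_{(\lfloor 0.25n \rfloor)}$ to 0-based list indices is our assumption:

```python
import statistics


def total_silence(durations, factor=1.5, method="std"):
    """Sum silence durations that meet or exceed a statistically derived threshold.

    method="std":        T = sigma * factor        (population standard deviation)
    method="median_iqr": T = median + IQR * factor
    """
    n = len(durations)
    if n == 0:
        return 0.0
    if method == "std":
        threshold = statistics.pstdev(durations) * factor
    elif method == "median_iqr":
        s = sorted(durations)
        median = statistics.median(s)
        # Quartiles per the document's convention: Q1 = s_(floor(0.25n)), Q3 = s_(floor(0.75n)),
        # converting the 1-based order statistic to a 0-based index (clamped at 0).
        q1 = s[max(int(0.25 * n) - 1, 0)]
        q3 = s[max(int(0.75 * n) - 1, 0)]
        threshold = median + (q3 - q1) * factor
    else:
        raise ValueError(f"unknown method: {method}")
    # Indicator-weighted sum: only durations at or above the threshold count.
    return sum(d for d in durations if d >= threshold)
```

For example, with `durations = [0.2, 0.3, 0.4, 1.5, 2.0]` and `factor = 1.5`, the median+IQR method gives $M = 0.4$, $\text{IQR} = 0.4 - 0.2 = 0.2$, so $T = 0.7$ and only the 1.5 s and 2.0 s gaps count, for a total of 3.5 s.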
- [x] Speech Enhancement
- [x] Sentiment Analysis
- [x] Profanity Word Detection
- [x] Summary
- [x] Conflict Detection
- [x] Topic Detection
```bash
sudo apt update -y && sudo apt upgrade -y
sudo apt install ffmpeg -y
sudo apt install -y ffmpeg build-essential g++
git clone https://github.com/bunyaminergen/Callytics
cd Callytics
conda env create -f environment.yaml
conda activate Callytics
```

`.env` file sample:
# CREDENTIALS
# OPENAI
OPENAI_API_KEY=
# HUGGINGFACE
HUGGINGFACE_TOKEN=
# AZURE OPENAI
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_BASE=
AZURE_OPENAI_API_VERSION=
# DATABASE
DB_NAME=
DB_USER=
DB_PASSWORD=
DB_HOST=
DB_PORT=
DB_URL=
This section provides an example database and tables. It is a simple, well-structured design. If you create the tables and columns with the same structure in your remote database, you will not encounter errors in the code. However, if you want to change the database structure, you will also need to refactor the code.

Note: Refer to the Database Structure section for the database schema and tables.
```bash
sqlite3 .db/Callytics.sqlite < src/db/sql/Schema.sql
```

This section explains how to install Grafana in your local environment. Since Grafana is a third-party open-source monitoring application, you must handle its installation yourself and connect your database. Of course, you can also use Grafana Cloud instead of a local installation.
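Interacting with the SQLite database from Python can be sketched with the standard-library `sqlite3` module. The table below is hypothetical and only for illustration; the project's real schema lives in `src/db/sql/Schema.sql` and its insert statements in `src/db/sql/*.sql`:

```python
import sqlite3

# Hypothetical minimal table for illustration only; the project's actual
# schema is defined in src/db/sql/Schema.sql and differs from this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS utterance (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    speaker   TEXT NOT NULL,
    content   TEXT NOT NULL,
    sentiment TEXT
);
"""


def init_db(db_path):
    """Open the database and apply the schema."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn


def insert_utterance(conn, speaker, content, sentiment=None):
    """Insert one utterance row and return its rowid."""
    cur = conn.execute(
        "INSERT INTO utterance (speaker, content, sentiment) VALUES (?, ?, ?)",
        (speaker, content, sentiment),
    )
    conn.commit()
    return cur.lastrowid
```

Parameterized queries (the `?` placeholders) are used rather than string formatting, which keeps call transcripts containing quotes or other special characters from breaking the SQL.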
```bash
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt install -y grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl daemon-reload
```

Grafana will then be available at http://localhost:3000.

SQLite Plugin:

```bash
sudo grafana-cli plugins install frser-sqlite-datasource
sudo systemctl restart grafana-server
sudo systemctl daemon-reload
```

.
├── automation
│ └── service
│ └── callytics.service
├── config
│ ├── config.yaml
│ ├── nemo
│ │ └── diar_infer_telephonic.yaml
│ └── prompt.yaml
├── .data
│ ├── example
│ │ └── LogisticsCallCenterConversation.mp3
│ └── input
├── .db
│ └── Callytics.sqlite
├── .docs
│ ├── documentation
│ │ ├── CONTRIBUTING.md
│ │ └── RESOURCES.md
│ └── img
│ ├── Callytics.drawio
│ ├── Callytics.gif
│ ├── CallyticsIcon.png
│ ├── Callytics.png
│ ├── Callytics.svg
│ └── database.png
├── .env
├── environment.yaml
├── .gitattributes
├── .github
│ └── CODEOWNERS
├── .gitignore
├── LICENSE
├── main.py
├── README.md
├── requirements.txt
└── src
├── audio
│ ├── alignment.py
│ ├── analysis.py
│ ├── effect.py
│ ├── error.py
│ ├── io.py
│ ├── metrics.py
│ ├── preprocessing.py
│ ├── processing.py
│ └── utils.py
├── db
│ ├── manager.py
│ └── sql
│ ├── AudioPropertiesInsert.sql
│ ├── Schema.sql
│ ├── TopicFetch.sql
│ ├── TopicInsert.sql
│ └── UtteranceInsert.sql
├── text
│ ├── llm.py
│ ├── model.py
│ ├── prompt.py
│ └── utils.py
└── utils
└── utils.py
19 directories, 43 files
- [ ] Speech Emotion Recognition: Develop a model to automatically detect emotions from speech data.
- [ ] New Forced Alignment Model: Train a forced alignment model from scratch.
- [ ] New Vocal Separation Model: Train a vocal separation model from scratch.
- [ ] Unit Tests: Add a comprehensive unit testing script to validate functionality.
- [ ] Logging Logic: Implement a more comprehensive and structured logging mechanism.
- [ ] Warnings: Add meaningful and detailed warning messages for better user guidance.
- [ ] Real-Time Analysis: Enable real-time analysis capabilities within the system.
- [ ] Dockerization: Containerize the repository to ensure seamless deployment and environment consistency.
- [ ] New Transcription Models: Integrate and test new transcription models such as AIOLA's Multi-Head Speech Recognition Model.
- [ ] Noise Reduction Model: Identify, test, and integrate a deep learning-based noise reduction model. Consider existing models like Facebook Research Denoiser, Noise2Noise, Audio Denoiser CNN. Write test scripts for evaluation, and if necessary, train a new model for optimal performance.
- [ ] Detect CSR's identity via Voice Recognition/Identification instead of Diarization and LLM.
- [ ] Transform the code structure into a pipeline for better modularity and scalability.
- [ ] Publish the repository as a Python package on PyPI for wider distribution.
- [ ] Convert the repository into a Linux package to support Linux-based systems.
- [ ] Implement a two-step processing workflow: perform *diarization* (speaker segmentation) first, then apply *transcription* for each identified speaker separately. This approach can improve transcription accuracy by leveraging speaker separation.
- [ ] Enable parallel processing for tasks such as diarization, transcription, and model inference to improve overall system performance and reduce processing time.
- [ ] Explore using Docker Compose for multi-container orchestration if required.
- [ ] Upload the models and relevant resources to Hugging Face for easier access, sharing, and community collaboration.
- [ ] Consider writing a Command Line Interface (CLI) to simplify user interaction and improve usability.
- [ ] Test the ability to use different language models (LLMs) for specific tasks. For instance, using BERT for profanity detection. Evaluate their performance and suitability for different use cases as a feature.
@software{ Callytics,
author = {Bunyamin Ergen},
title = {{Callytics}},
year = {2024},
month = {12},
url = {https://github.com/bunyaminergen/Callytics},
version = {v1.1.0},
}