
Callytics
Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers.
Stars: 63

README:

Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers. By processing both the audio and text of each call, it provides insights such as sentiment analysis, topic detection, conflict detection, profanity word detection, and summarization. These cutting-edge techniques help businesses optimize customer interactions, identify areas for improvement, and enhance overall service quality.

When an audio file is placed in the .data/input directory, the entire pipeline starts automatically, and the resulting data is inserted into the database.

Note: This is only version 1.1.0; many new features will be added, models will be fine-tuned or trained from scratch, and various optimizations will be applied. For more information, see the Upcoming section.

Note: If you would like to contribute to this repository, please read the CONTRIBUTING guide first.
- Prerequisites
- Architecture
- Math And Algorithm
- Features
- Demo
- Installation
- File Structure
- Database Structure
- Datasets
- Version Control System
- Upcoming
- Documentations
- License
- Links
- Team
- Contact
- Citation
Prerequisites

- Python 3.11 (or above)
- GPU (min 24GB) (or above)
- Hugging Face Credentials (Account, Token)
- Llama-3.2-11B-Vision-Instruct (or above)
- GPU (min 12GB) (for other processes such as faster-whisper & NeMo)
- At least one of the following is required:
  - OpenAI Credentials (Account, API Key)
  - Azure OpenAI Credentials (Account, API Key, API Base URL)
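Before installing, it may help to verify the requirements above. Below is a minimal sanity-check sketch; it assumes PyTorch is available in the environment (likely, given the faster-whisper and NeMo dependencies, but an assumption nonetheless):

```python
import sys

import torch  # assumption: PyTorch is installed in the environment

# Check the Python version requirement.
assert sys.version_info >= (3, 11), "Python 3.11 or above is required"

# Check available GPU memory against the prerequisites above.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, {vram_gb:.1f} GB VRAM")
    # ~24 GB is needed for Llama-3.2-11B-Vision-Instruct;
    # ~12 GB suffices for faster-whisper and NeMo diarization.
else:
    print("No CUDA GPU detected; LLM tasks require OpenAI/Azure OpenAI.")
```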
Math And Algorithm

This section describes the mathematical models and algorithms used in the project.

Note: This section covers the mathematical concepts and algorithms specific to this repository, rather than the models used. Please refer to RESOURCES under the Documentations section for the repositories and models utilized or referenced.
The silence durations are derived from the time intervals between speech segments. Let

$$S = \{s_1, s_2, \ldots, s_n\}$$

represent the set of silence durations (in seconds) between consecutive speech segments.
- A user-defined factor:
$$\text{factor} \in \mathbb{R}^{+}$$
To determine a threshold that distinguishes significant silence from trivial gaps, two statistical methods can be applied:
1. Standard Deviation-Based Threshold
- Mean:
$$\mu = \frac{1}{n}\sum_{i=1}^{n}s_i$$
- Standard Deviation:
$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(s_i - \mu)^2} $$
- Threshold:
$$ T_{\text{std}} = \sigma \cdot \text{factor} $$
2. Median + Interquartile Range (IQR) Threshold
- Median:
Let:
$$ S = \{s_{(1)} \leq s_{(2)} \leq \cdots \leq s_{(n)}\} $$
be an ordered set.
Then:
$$ M = \text{median}(S) = \begin{cases} s_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd}, \\[6pt] \frac{s_{\left(\frac{n}{2}\right)} + s_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even}. \end{cases} $$
- Quartiles:
$$ Q_1 = s_{(\lfloor 0.25n \rfloor)}, \quad Q_3 = s_{(\lfloor 0.75n \rfloor)} $$
- IQR:
$$ \text{IQR} = Q_3 - Q_1 $$
- Threshold:
$$ T_{\text{median\_iqr}} = M + (\text{IQR} \times \text{factor}) $$
Total Silence Above Threshold
Once the threshold $$T$$ (either $$T_{\text{std}}$$ or $$T_{\text{median\_iqr}}$$) is defined, we sum only those silence durations that meet or exceed this threshold:
$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
where $$\mathbf{1}(s_i \geq T)$$ is an indicator function defined as:
$$ \mathbf{1}(s_i \geq T) = \begin{cases} 1 & \text{if } s_i \geq T \\ 0 & \text{otherwise} \end{cases} $$
Summary:
- Identify the silence durations:
$$ S = \{s_1, s_2, \ldots, s_n\} $$
- Determine the threshold using either:
Standard deviation-based:
$$ T = \sigma \cdot \text{factor} $$
Median+IQR-based:
$$ T = M + (\text{IQR} \cdot \text{factor}) $$
- Compute the total silence above this threshold:
$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
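The summary above translates directly into a few lines of NumPy. The following is a minimal sketch of both thresholding strategies, not the project's actual implementation (see src/audio for that); note that np.percentile interpolates quartiles rather than using the floor-index definition above:

```python
import numpy as np

def total_silence(durations, factor=1.0, method="std"):
    """Sum the silence durations that meet or exceed the threshold T."""
    s = np.asarray(durations, dtype=float)
    if method == "std":
        threshold = s.std() * factor             # T = sigma * factor
    elif method == "median_iqr":
        q1, median, q3 = np.percentile(s, [25, 50, 75])
        threshold = median + (q3 - q1) * factor  # T = M + IQR * factor
    else:
        raise ValueError(f"unknown method: {method}")
    return float(s[s >= threshold].sum())        # sum of s_i where s_i >= T

# Example: gaps (in seconds) between consecutive speech segments
gaps = [0.2, 0.3, 0.25, 2.5, 3.0]
print(total_silence(gaps, factor=1.0, method="std"))         # std-based T
print(total_silence(gaps, factor=1.5, method="median_iqr"))  # median+IQR T
```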
Features

- [x] Speech Enhancement
- [x] Sentiment Analysis
- [x] Profanity Word Detection
- [x] Summary
- [x] Conflict Detection
- [x] Topic Detection
Installation

sudo apt update -y && sudo apt upgrade -y
sudo apt install ffmpeg -y
sudo apt install -y ffmpeg build-essential g++
git clone https://github.com/bunyaminergen/Callytics
cd Callytics
conda env create -f environment.yaml
conda activate Callytics
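With the environment created, and once the .env file and database below are in place, the pipeline can be exercised end to end. A minimal sketch, assuming the Callytics service (automation/service/callytics.service) is running and watching .data/input:

```python
import shutil

# Copy the bundled example call into the watched input directory; the
# running pipeline should pick it up and insert results into the database.
shutil.copy(
    ".data/example/LogisticsCallCenterConversation.mp3",
    ".data/input/",
)
```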
.env file sample:
# CREDENTIALS
# OPENAI
OPENAI_API_KEY=
# HUGGINGFACE
HUGGINGFACE_TOKEN=
# AZURE OPENAI
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_BASE=
AZURE_OPENAI_API_VERSION=
# DATABASE
DB_NAME=
DB_USER=
DB_PASSWORD=
DB_HOST=
DB_PORT=
DB_URL=
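Callytics reads these credentials from the environment. Below is a minimal loading sketch; it assumes the python-dotenv package (whether the project itself uses python-dotenv is an assumption, and plain os.environ works once the variables are exported):

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

# Load the .env file shown above into the process environment.
load_dotenv()

openai_key = os.getenv("OPENAI_API_KEY")
azure_key = os.getenv("AZURE_OPENAI_API_KEY")
hf_token = os.getenv("HUGGINGFACE_TOKEN")

# At least one LLM credential is required (see Prerequisites).
if not (openai_key or azure_key):
    raise RuntimeError("Set OPENAI_API_KEY or the AZURE_OPENAI_* variables.")
```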
Database

In this section, an example database and tables are provided. The design is simple and well-structured. If you create the tables and columns with the same structure in your remote database, you will not encounter errors in the code. However, if you want to change the database structure, you will also need to refactor the code.
Note: Refer to the Database Structure section for the database schema and tables.
sqlite3 .db/Callytics.sqlite < src/db/sql/Schema.sql
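After the schema is created, the database can be inspected with Python's built-in sqlite3 module. A minimal sketch that only lists the tables, since the authoritative schema lives in src/db/sql/Schema.sql:

```python
import sqlite3

# Open the SQLite database created by Schema.sql and list its tables.
conn = sqlite3.connect(".db/Callytics.sqlite")
try:
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    for (name,) in rows:
        print(name)
finally:
    conn.close()
```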
Grafana

This section explains how to install Grafana in a local environment. Since Grafana is a third-party open-source monitoring application, you must handle its installation yourself and connect your database. Of course, you can also use Grafana Cloud instead of a local environment.
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt install -y grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl daemon-reload
Once the service is running, open http://localhost:3000 in your browser.
SQLite Plugin
sudo grafana-cli plugins install frser-sqlite-datasource
sudo systemctl restart grafana-server
sudo systemctl daemon-reload
File Structure

.
├── automation
│ └── service
│ └── callytics.service
├── config
│ ├── config.yaml
│ ├── nemo
│ │ └── diar_infer_telephonic.yaml
│ └── prompt.yaml
├── .data
│ ├── example
│ │ └── LogisticsCallCenterConversation.mp3
│ └── input
├── .db
│ └── Callytics.sqlite
├── .docs
│ ├── documentation
│ │ ├── CONTRIBUTING.md
│ │ └── RESOURCES.md
│ └── img
│ ├── Callytics.drawio
│ ├── Callytics.gif
│ ├── CallyticsIcon.png
│ ├── Callytics.png
│ ├── Callytics.svg
│ └── database.png
├── .env
├── environment.yaml
├── .gitattributes
├── .github
│ └── CODEOWNERS
├── .gitignore
├── LICENSE
├── main.py
├── README.md
├── requirements.txt
└── src
├── audio
│ ├── alignment.py
│ ├── analysis.py
│ ├── effect.py
│ ├── error.py
│ ├── io.py
│ ├── metrics.py
│ ├── preprocessing.py
│ ├── processing.py
│ └── utils.py
├── db
│ ├── manager.py
│ └── sql
│ ├── AudioPropertiesInsert.sql
│ ├── Schema.sql
│ ├── TopicFetch.sql
│ ├── TopicInsert.sql
│ └── UtteranceInsert.sql
├── text
│ ├── llm.py
│ ├── model.py
│ ├── prompt.py
│ └── utils.py
└── utils
└── utils.py
19 directories, 43 files
Upcoming

- [ ] Speech Emotion Recognition: Develop a model to automatically detect emotions from speech data.
- [ ] New Forced Alignment Model: Train a forced alignment model from scratch.
- [ ] New Vocal Separation Model: Train a vocal separation model from scratch.
- [ ] Unit Tests: Add a comprehensive unit testing script to validate functionality.
- [ ] Logging Logic: Implement a more comprehensive and structured logging mechanism.
- [ ] Warnings: Add meaningful and detailed warning messages for better user guidance.
- [ ] Real-Time Analysis: Enable real-time analysis capabilities within the system.
- [ ] Dockerization: Containerize the repository to ensure seamless deployment and environment consistency.
- [ ] New Transcription Models: Integrate and test new transcription models such as AIOLA’s Multi-Head Speech Recognition Model.
- [ ] Noise Reduction Model: Identify, test, and integrate a deep learning-based noise reduction model. Consider existing models like Facebook Research Denoiser, Noise2Noise, Audio Denoiser CNN. Write test scripts for evaluation, and if necessary, train a new model for optimal performance.
- [ ] Detect CSR's identity via Voice Recognition/Identification instead of Diarization and LLM.
- [ ] Transform the code structure into a pipeline for better modularity and scalability.
- [ ] Publish the repository as a Python package on PyPI for wider distribution.
- [ ] Convert the repository into a Linux package to support Linux-based systems.
- [ ] Implement a two-step processing workflow: perform diarization (speaker segmentation) first, then apply transcription for each identified speaker separately. This approach can improve transcription accuracy by leveraging speaker separation.
- [ ] Enable parallel processing for tasks such as diarization, transcription, and model inference to improve overall system performance and reduce processing time.
- [ ] Explore using Docker Compose for multi-container orchestration if required.
- [ ] Upload the models and relevant resources to Hugging Face for easier access, sharing, and community collaboration.
- [ ] Consider writing a Command Line Interface (CLI) to simplify user interaction and improve usability.
- [ ] Test the ability to use different language models (LLMs) for specific tasks. For instance, using BERT for profanity detection. Evaluate their performance and suitability for different use cases as a feature.
Citation

@software{Callytics,
author = {Bunyamin Ergen},
title = {{Callytics}},
year = {2024},
month = {12},
url = {https://github.com/bunyaminergen/Callytics},
version = {v1.1.0},
}
Similar Open Source Tools


human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation

layra
LAYRA is the world's first visual-native AI automation engine that sees documents like a human, preserves layout and graphical elements, and executes arbitrarily complex workflows with full Python control. It empowers users to build next-generation intelligent systems with no limits or compromises. Built for Enterprise-Grade deployment, LAYRA features a modern frontend, high-performance backend, decoupled service architecture, visual-native multimodal document understanding, and a powerful workflow engine.

ebook2audiobook
ebook2audiobook is a CPU/GPU converter tool that converts eBooks to audiobooks with chapters and metadata using tools like Calibre, ffmpeg, XTTSv2, and Fairseq. It supports voice cloning and a wide range of languages. The tool is designed to run on 4GB RAM and provides a new v2.0 Web GUI interface for user-friendly interaction. Users can convert eBooks to text format, split eBooks into chapters, and utilize high-quality text-to-speech functionalities. Supported languages include Arabic, Chinese, English, French, German, Hindi, and many more. The tool can be used for legal, non-DRM eBooks only and should be used responsibly in compliance with applicable laws.

asktube
AskTube is an AI-powered YouTube video summarizer and QA assistant that utilizes Retrieval Augmented Generation (RAG) technology. It offers a comprehensive solution with Q&A functionality and aims to provide a user-friendly experience for local machine usage. The project integrates various technologies including Python, JS, Sanic, Peewee, Pytubefix, Sentence Transformers, Sqlite, Chroma, and NuxtJs/DaisyUI. AskTube supports multiple providers for analysis, AI services, and speech-to-text conversion. The tool is designed to extract data from YouTube URLs, store embedding chapter subtitles, and facilitate interactive Q&A sessions with enriched questions. It is not intended for production use but rather for end-users on their local machines.

probe
Probe is an AI-friendly, fully local, semantic code search tool designed to power the next generation of AI coding assistants. It combines the speed of ripgrep with the code-aware parsing of tree-sitter to deliver precise results with complete code blocks, making it perfect for large codebases and AI-driven development workflows. Probe is fully local, keeping code on the user's machine without relying on external APIs. It supports multiple languages, offers various search options, and can be used in CLI mode, MCP server mode, AI chat mode, and web interface. The tool is designed to be flexible, fast, and accurate, providing developers and AI models with full context and relevant code blocks for efficient code exploration and understanding.

swift-ocr-llm-powered-pdf-to-markdown
Swift OCR is a powerful tool for extracting text from PDF files using OpenAI's GPT-4 Turbo with Vision model. It offers flexible input options, advanced OCR processing, performance optimizations, structured output, robust error handling, and scalable architecture. The tool ensures accurate text extraction, resilience against failures, and efficient handling of multiple requests.

RepoMaster
RepoMaster is an AI agent that leverages GitHub repositories to solve complex real-world tasks. It transforms how coding tasks are solved by automatically finding the right GitHub tools and making them work together seamlessly. Users can describe their tasks, and RepoMaster's AI analysis leads to auto discovery and smart execution, resulting in perfect outcomes. The tool provides a web interface for beginners and a command-line interface for advanced users, along with specialized agents for deep search, general assistance, and repository tasks.

zotero-mcp
Zotero MCP is an open-source project that integrates AI capabilities with Zotero using the Model Context Protocol. It consists of a Zotero plugin and an MCP server, enabling AI assistants to search, retrieve, and cite references from Zotero library. The project features a unified architecture with an integrated MCP server, eliminating the need for a separate server process. It provides features like intelligent search, detailed reference information, filtering by tags and identifiers, aiding in academic tasks such as literature reviews and citation management.

TrustEval-toolkit
TrustEval-toolkit is a dynamic and comprehensive framework for evaluating the trustworthiness of Generative Foundation Models (GenFMs) across dimensions such as safety, fairness, robustness, privacy, and more. It offers features like dynamic dataset generation, multi-model compatibility, customizable metrics, metadata-driven pipelines, comprehensive evaluation dimensions, optimized inference, and detailed reports.


bifrost
Bifrost is a high-performance AI gateway that unifies access to multiple providers through a single OpenAI-compatible API. It offers features like automatic failover, load balancing, semantic caching, and enterprise-grade functionalities. Users can deploy Bifrost in seconds with zero configuration, benefiting from its core infrastructure, advanced features, enterprise and security capabilities, and developer experience. The repository structure is modular, allowing for maximum flexibility. Bifrost is designed for quick setup, easy configuration, and seamless integration with various AI models and tools.

evi-run
evi-run is a powerful, production-ready multi-agent AI system built on Python using the OpenAI Agents SDK. It offers instant deployment, ultimate flexibility, built-in analytics, Telegram integration, and scalable architecture. The system features memory management, knowledge integration, task scheduling, multi-agent orchestration, custom agent creation, deep research, web intelligence, document processing, image generation, DEX analytics, and Solana token swap. It supports flexible usage modes like private, free, and pay mode, with upcoming features including NSFW mode, task scheduler, and automatic limit orders. The technology stack includes Python 3.11, OpenAI Agents SDK, Telegram Bot API, PostgreSQL, Redis, and Docker & Docker Compose for deployment.

AIPex
AIPex is a revolutionary Chrome extension that transforms your browser into an intelligent automation platform. Using natural language commands and AI-powered intelligence, AIPex can automate virtually any browser task - from complex multi-step workflows to simple repetitive actions. It offers features like natural language control, AI-powered intelligence, multi-step automation, universal compatibility, smart data extraction, precision actions, form automation, visual understanding, developer-friendly with extensive API, and lightning-fast execution of automation tasks.

local-deep-research
Local Deep Research is a powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. It can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities. The tool offers advanced research capabilities, flexible LLM support, rich output options, privacy-focused operation, enhanced search integration, and academic & scientific integration. It also provides a web interface, command line interface, and supports multiple LLM providers and search engines. Users can configure AI models, search engines, and research parameters for customized research experiences.
For similar tasks


llm-memorization
The 'llm-memorization' project is a tool designed to index, archive, and search conversations with a local LLM using a SQLite database enriched with automatically extracted keywords. It aims to provide personalized context at the start of a conversation by adding memory information to the initial prompt. The tool automates queries from local LLM conversational management libraries, offers a hybrid search function, enhances prompts based on posed questions, and provides an all-in-one graphical user interface for data visualization. It supports both French and English conversations and prompts for bilingual use.

TME-AIX
The TME-AIX repository is a collaborative workspace dedicated to exploring Telco Media Entertainment use-cases using open source AI capabilities and datasets. It focuses on projects like Revenue Assurance, Service Assurance Predictions, 5G Network Fault Predictions, Sustainability, SecOps-AI, SmartGrid, IoT Security, Customer Relation Management, Anomaly Detection, Starlink Quality Predictions, and NoC AI Augmentation for OSS.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.