
Callytics
Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers.

README:

Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers. By processing both the audio and text of each call, it provides insights such as sentiment analysis, topic detection, conflict detection, profanity word detection, and summary. These cutting-edge techniques help businesses optimize customer interactions, identify areas for improvement, and enhance overall service quality.
When an audio file is placed in the .data/input directory, the entire pipeline starts automatically, and the resulting data is inserted into the database.
Note: This is only version v1.1.0; many new features will be added, models will be fine-tuned or trained from scratch, and various optimizations will be applied. For more information, check out the Upcoming section.
Note: If you would like to contribute to this repository, please read CONTRIBUTING first.
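For intuition, the file-drop trigger can be pictured as a small directory watcher. The snippet below is an illustration only, assuming the third-party watchdog package and a hypothetical run_pipeline stand-in; the project's actual automation lives in automation/service/callytics.service.

```python
# Illustrative sketch only: watch .data/input and hand new audio files to the
# pipeline. `run_pipeline` is a hypothetical stand-in, not the project's entry point.
import time
from pathlib import Path

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


class AudioHandler(FileSystemEventHandler):
    def on_created(self, event):
        path = Path(event.src_path)
        if not event.is_directory and path.suffix.lower() in AUDIO_SUFFIXES:
            run_pipeline(path)  # hypothetical: transcribe, analyze, insert into DB


def run_pipeline(path: Path) -> None:
    print(f"Processing {path} ...")  # placeholder for the real pipeline


if __name__ == "__main__":
    observer = Observer()
    observer.schedule(AudioHandler(), ".data/input", recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```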
- Prerequisites
- Architecture
- Math And Algorithm
- Features
- Demo
- Installation
- File Structure
- Database Structure
- Datasets
- Version Control System
- Upcoming
- Documentations
- License
- Links
- Team
- Contact
- Citation
Prerequisites
- Python 3.11 (or above)
- GPU (min 24GB) for Llama-3.2-11B-Vision-Instruct (or above)
- GPU (min 12GB) for other processes such as faster-whisper & NeMo
- Hugging Face Credentials (Account, Token)
- At least one of the following is required:
  - OpenAI Credentials (Account, API Key)
  - Azure OpenAI Credentials (Account, API Key, API Base URL)
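Before installing, you may want to confirm that the GPU requirement is met. A minimal check, assuming PyTorch with CUDA is available:

```python
# Minimal sketch: check that a CUDA device with enough memory is present.
import torch

REQUIRED_GB = 24  # 24 GB for Llama-3.2-11B-Vision-Instruct; 12 GB for the other stages

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found.")

total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB")
if total_gb < REQUIRED_GB:
    print(f"Warning: less than {REQUIRED_GB} GB of GPU memory available.")
```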
Math And Algorithm
This section describes the mathematical models and algorithms used in the project.
Note: Only the mathematical concepts and algorithms specific to this repository, rather than the models used, are covered here. Please refer to RESOURCES under the Documentations section for the repositories and models utilized or referenced.
The silence durations are derived from the time intervals between speech segments. Let
$$S = \{s_1, s_2, \ldots, s_n\}$$
represent the set of silence durations (in seconds) between consecutive speech segments, and let
$$\text{factor} \in \mathbb{R}^{+}$$
be a user-defined factor.
To determine a threshold that distinguishes significant silence from trivial gaps, two statistical methods can be applied:
1. Standard Deviation-Based Threshold
- Mean:
$$\mu = \frac{1}{n}\sum_{i=1}^{n}s_i$$
- Standard Deviation:
$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(s_i - \mu)^2} $$
- Threshold:
$$ T_{\text{std}} = \sigma \cdot \text{factor} $$
2. Median + Interquartile Range (IQR) Threshold
- Median:
Let:
$$ S = \{s_{(1)} \leq s_{(2)} \leq \cdots \leq s_{(n)}\} $$
be an ordered set.
Then:
$$ M = \text{median}(S) = \begin{cases} s_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd}, \\[6pt] \dfrac{s_{\left(\frac{n}{2}\right)} + s_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even}. \end{cases} $$
- Quartiles:
$$ Q_1 = s_{(\lfloor 0.25n \rfloor)}, \quad Q_3 = s_{(\lfloor 0.75n \rfloor)} $$
- IQR:
$$ \text{IQR} = Q_3 - Q_1 $$
- Threshold:
$$ T_{\text{median\_iqr}} = M + (\text{IQR} \times \text{factor}) $$
Total Silence Above Threshold
Once the threshold $$T$$ (either $$T_{\text{std}}$$ or $$T_{\text{median\_iqr}}$$) is defined, we sum only those silence durations that meet or exceed this threshold:
$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
where $$\mathbf{1}(s_i \geq T)$$ is an indicator function defined as:
$$ \mathbf{1}(s_i \geq T) = \begin{cases} 1 & \text{if } s_i \geq T, \\ 0 & \text{otherwise.} \end{cases} $$
Summary:
- Identify the silence durations:
$$ S = \{s_1, s_2, \ldots, s_n\} $$
- Determine the threshold using either:
Standard deviation-based:
$$ T = \sigma \cdot \text{factor} $$
Median+IQR-based:
$$ T = M + (\text{IQR} \cdot \text{factor}) $$
- Compute the total silence above this threshold:
$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
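As a worked example, the two thresholds and the TotalSilence sum translate directly into a few lines of NumPy. This is a sketch of the formulas above, not necessarily the repository's exact implementation:

```python
# Sketch of the silence-threshold math above (not the project's exact code).
import numpy as np


def total_silence(durations, factor=1.5, method="std"):
    """Sum the silence durations that meet or exceed the chosen threshold."""
    s = np.asarray(durations, dtype=float)
    if method == "std":
        threshold = s.std() * factor                # T_std = sigma * factor
    elif method == "median_iqr":
        q1, median, q3 = np.percentile(s, [25, 50, 75])
        threshold = median + (q3 - q1) * factor     # T = M + IQR * factor
    else:
        raise ValueError(f"Unknown method: {method}")
    return s[s >= threshold].sum()                  # sum of s_i with 1(s_i >= T)


silences = [0.2, 0.3, 0.25, 2.1, 0.4, 3.5]          # example gaps in seconds
print(total_silence(silences, factor=1.5, method="std"))
print(total_silence(silences, factor=1.5, method="median_iqr"))
```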
Features
- [x] Speech Enhancement
- [x] Sentiment Analysis
- [x] Profanity Word Detection
- [x] Summary
- [x] Conflict Detection
- [x] Topic Detection
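The text-side features above are driven by LLM prompts (the repository keeps its prompts in config/prompt.yaml and its LLM code in src/text/llm.py). As a hedged illustration, a single-utterance sentiment call with the OpenAI SDK might look like the following; the model name and prompt here are placeholders, not the project's actual configuration:

```python
# Illustrative only: classify the sentiment of one utterance with the OpenAI SDK.
# The real prompts live in config/prompt.yaml; this is not the project's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_sentiment(utterance: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any chat model works
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the utterance as "
                        "Positive, Negative, or Neutral. Reply with one word."},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content.strip()


print(classify_sentiment("I've been waiting forty minutes and nobody helps me!"))
```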
Installation
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y ffmpeg build-essential g++
git clone https://github.com/bunyaminergen/Callytics
cd Callytics
conda env create -f environment.yaml
conda activate Callytics
.env file sample:
# CREDENTIALS
# OPENAI
OPENAI_API_KEY=
# HUGGINGFACE
HUGGINGFACE_TOKEN=
# AZURE OPENAI
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_BASE=
AZURE_OPENAI_API_VERSION=
# DATABASE
DB_NAME=
DB_USER=
DB_PASSWORD=
DB_HOST=
DB_PORT=
DB_URL=
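These variables can then be read at runtime. A minimal sketch, assuming the python-dotenv package (a common choice for loading .env files, not necessarily a confirmed project dependency):

```python
# Sketch: load the .env file and read the credentials, assuming python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

openai_key = os.getenv("OPENAI_API_KEY", "")
hf_token = os.getenv("HUGGINGFACE_TOKEN", "")
db_url = os.getenv("DB_URL", "sqlite:///.db/Callytics.sqlite")  # hypothetical default

print("Credentials loaded:", bool(openai_key and hf_token))
```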
Database Structure
In this section, an example database and tables are provided. It is a well-structured and simple design. If you create the tables and columns with the same structure in your remote database, you will not encounter errors in the code. However, if you want to change the database structure, you will also need to refactor the code.
Note: Refer to the Database Structure section for the database schema and tables.
sqlite3 .db/Callytics.sqlite < src/db/sql/Schema.sql
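If the sqlite3 command-line tool is unavailable, the same schema can be applied from Python using only the standard library:

```python
# Equivalent to the shell command above, using only the standard library.
import sqlite3
from pathlib import Path

schema = Path("src/db/sql/Schema.sql").read_text(encoding="utf-8")

with sqlite3.connect(".db/Callytics.sqlite") as conn:
    conn.executescript(schema)  # run every statement in Schema.sql
print("Database initialized.")
```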
This section explains how to install Grafana in your local environment. Since Grafana is a third-party open-source monitoring application, you must handle its installation yourself and connect your database. Of course, you can also use Grafana Cloud instead of a local environment.
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt install -y grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl daemon-reload
Once the service is running, open http://localhost:3000 in your browser.
SQLite Plugin
sudo grafana-cli plugins install frser-sqlite-datasource
sudo systemctl restart grafana-server
sudo systemctl daemon-reload
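Before building Grafana panels, it can help to sanity-check a panel query directly against the SQLite file. A sketch with hypothetical table and column names ("Utterance", "Sentiment"); the real names are defined in src/db/sql/Schema.sql:

```python
# Sketch: test a panel query against the SQLite file before adding it to Grafana.
# "Utterance" and "Sentiment" are hypothetical names; check src/db/sql/Schema.sql.
import sqlite3

with sqlite3.connect(".db/Callytics.sqlite") as conn:
    rows = conn.execute(
        "SELECT Sentiment, COUNT(*) FROM Utterance GROUP BY Sentiment"
    ).fetchall()

for sentiment, count in rows:
    print(sentiment, count)
```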
File Structure
.
├── automation
│ └── service
│ └── callytics.service
├── config
│ ├── config.yaml
│ ├── nemo
│ │ └── diar_infer_telephonic.yaml
│ └── prompt.yaml
├── .data
│ ├── example
│ │ └── LogisticsCallCenterConversation.mp3
│ └── input
├── .db
│ └── Callytics.sqlite
├── .docs
│ ├── documentation
│ │ ├── CONTRIBUTING.md
│ │ └── RESOURCES.md
│ └── img
│ ├── Callytics.drawio
│ ├── Callytics.gif
│ ├── CallyticsIcon.png
│ ├── Callytics.png
│ ├── Callytics.svg
│ └── database.png
├── .env
├── environment.yaml
├── .gitattributes
├── .github
│ └── CODEOWNERS
├── .gitignore
├── LICENSE
├── main.py
├── README.md
├── requirements.txt
└── src
├── audio
│ ├── alignment.py
│ ├── analysis.py
│ ├── effect.py
│ ├── error.py
│ ├── io.py
│ ├── metrics.py
│ ├── preprocessing.py
│ ├── processing.py
│ └── utils.py
├── db
│ ├── manager.py
│ └── sql
│ ├── AudioPropertiesInsert.sql
│ ├── Schema.sql
│ ├── TopicFetch.sql
│ ├── TopicInsert.sql
│ └── UtteranceInsert.sql
├── text
│ ├── llm.py
│ ├── model.py
│ ├── prompt.py
│ └── utils.py
└── utils
└── utils.py
19 directories, 43 files
Upcoming
- [ ] Speech Emotion Recognition: Develop a model to automatically detect emotions from speech data.
- [ ] New Forced Alignment Model: Train a forced alignment model from scratch.
- [ ] New Vocal Separation Model: Train a vocal separation model from scratch.
- [ ] Unit Tests: Add a comprehensive unit testing script to validate functionality.
- [ ] Logging Logic: Implement a more comprehensive and structured logging mechanism.
- [ ] Warnings: Add meaningful and detailed warning messages for better user guidance.
- [ ] Real-Time Analysis: Enable real-time analysis capabilities within the system.
- [ ] Dockerization: Containerize the repository to ensure seamless deployment and environment consistency.
- [ ] New Transcription Models: Integrate and test new transcription models such as AIOLA’s Multi-Head Speech Recognition Model.
- [ ] Noise Reduction Model: Identify, test, and integrate a deep learning-based noise reduction model. Consider existing models like Facebook Research Denoiser, Noise2Noise, Audio Denoiser CNN. Write test scripts for evaluation, and if necessary, train a new model for optimal performance.
- [ ] Detect CSR's identity via Voice Recognition/Identification instead of Diarization and LLM.
- [ ] Transform the code structure into a pipeline for better modularity and scalability.
- [ ] Publish the repository as a Python package on PyPI for wider distribution.
- [ ] Convert the repository into a Linux package to support Linux-based systems.
- [ ] Implement a two-step processing workflow: perform diarization (speaker segmentation) first, then apply transcription for each identified speaker separately. This approach can improve transcription accuracy by leveraging speaker separation.
- [ ] Enable parallel processing for tasks such as diarization, transcription, and model inference to improve overall system performance and reduce processing time.
- [ ] Explore using Docker Compose for multi-container orchestration if required.
- [ ] Upload the models and relevant resources to Hugging Face for easier access, sharing, and community collaboration.
- [ ] Consider writing a Command Line Interface (CLI) to simplify user interaction and improve usability.
- [ ] Test the ability to use different language models (LLMs) for specific tasks. For instance, using BERT for profanity detection. Evaluate their performance and suitability for different use cases as a feature.
Citation
@software{Callytics,
author = {Bunyamin Ergen},
title = {{Callytics}},
year = {2024},
month = {12},
url = {https://github.com/bunyaminergen/Callytics},
version = {v1.1.0},
}
Similar Open Source Tools


astrsk
astrsk is a tool that pushes the boundaries of AI storytelling by offering advanced AI agents, customizable response formatting, and flexible prompt editing for immersive roleplaying experiences. It provides complete AI agent control, a visual flow editor for conversation flows, and ensures 100% local-first data storage. The tool is true cross-platform with support for various AI providers and modern technologies like React, TypeScript, and Tailwind CSS. Coming soon features include cross-device sync, enhanced session customization, and community features.

human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation

lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron. We're releasing it with the community in the spirit of building in the open. Note that it is still very early, so don't expect 100% stability. In case of problems or questions, feel free to open an issue!

layra
LAYRA is the world's first visual-native AI automation engine that sees documents like a human, preserves layout and graphical elements, and executes arbitrarily complex workflows with full Python control. It empowers users to build next-generation intelligent systems with no limits or compromises. Built for Enterprise-Grade deployment, LAYRA features a modern frontend, high-performance backend, decoupled service architecture, visual-native multimodal document understanding, and a powerful workflow engine.

ebook2audiobook
ebook2audiobook is a CPU/GPU converter tool that converts eBooks to audiobooks with chapters and metadata using tools like Calibre, ffmpeg, XTTSv2, and Fairseq. It supports voice cloning and a wide range of languages. The tool is designed to run on 4GB RAM and provides a new v2.0 Web GUI interface for user-friendly interaction. Users can convert eBooks to text format, split eBooks into chapters, and utilize high-quality text-to-speech functionalities. Supported languages include Arabic, Chinese, English, French, German, Hindi, and many more. The tool can be used for legal, non-DRM eBooks only and should be used responsibly in compliance with applicable laws.

asktube
AskTube is an AI-powered YouTube video summarizer and QA assistant that utilizes Retrieval Augmented Generation (RAG) technology. It offers a comprehensive solution with Q&A functionality and aims to provide a user-friendly experience for local machine usage. The project integrates various technologies including Python, JS, Sanic, Peewee, Pytubefix, Sentence Transformers, Sqlite, Chroma, and NuxtJs/DaisyUI. AskTube supports multiple providers for analysis, AI services, and speech-to-text conversion. The tool is designed to extract data from YouTube URLs, store embedding chapter subtitles, and facilitate interactive Q&A sessions with enriched questions. It is not intended for production use but rather for end-users on their local machines.

jan
Jan is an open-source ChatGPT alternative that runs 100% offline on your computer. It supports universal architectures, including Nvidia GPUs, Apple M-series, Apple Intel, Linux Debian, and Windows x64. Jan is currently in development, so expect breaking changes and bugs. It is lightweight and embeddable, and can be used on its own within your own projects.

probe
Probe is an AI-friendly, fully local, semantic code search tool designed to power the next generation of AI coding assistants. It combines the speed of ripgrep with the code-aware parsing of tree-sitter to deliver precise results with complete code blocks, making it perfect for large codebases and AI-driven development workflows. Probe is fully local, keeping code on the user's machine without relying on external APIs. It supports multiple languages, offers various search options, and can be used in CLI mode, MCP server mode, AI chat mode, and web interface. The tool is designed to be flexible, fast, and accurate, providing developers and AI models with full context and relevant code blocks for efficient code exploration and understanding.

swift-ocr-llm-powered-pdf-to-markdown
Swift OCR is a powerful tool for extracting text from PDF files using OpenAI's GPT-4 Turbo with Vision model. It offers flexible input options, advanced OCR processing, performance optimizations, structured output, robust error handling, and scalable architecture. The tool ensures accurate text extraction, resilience against failures, and efficient handling of multiple requests.

AutoAgents
AutoAgents is a cutting-edge multi-agent framework built in Rust that enables the creation of intelligent, autonomous agents powered by Large Language Models (LLMs) and Ractor. Designed for performance, safety, and scalability, AutoAgents provides a robust foundation for building complex AI systems that can reason, act, and collaborate. With AutoAgents you can create cloud-native agents, edge-native agents, and hybrid models as well. It is so extensible that other ML models can be used to create complex pipelines using the actor framework.

RepoMaster
RepoMaster is an AI agent that leverages GitHub repositories to solve complex real-world tasks. It transforms how coding tasks are solved by automatically finding the right GitHub tools and making them work together seamlessly. Users can describe their tasks, and RepoMaster's AI analysis leads to auto discovery and smart execution, resulting in perfect outcomes. The tool provides a web interface for beginners and a command-line interface for advanced users, along with specialized agents for deep search, general assistance, and repository tasks.

TrustEval-toolkit
TrustEval-toolkit is a dynamic and comprehensive framework for evaluating the trustworthiness of Generative Foundation Models (GenFMs) across dimensions such as safety, fairness, robustness, privacy, and more. It offers features like dynamic dataset generation, multi-model compatibility, customizable metrics, metadata-driven pipelines, comprehensive evaluation dimensions, optimized inference, and detailed reports.

zotero-mcp
Zotero MCP is an open-source project that integrates AI capabilities with Zotero using the Model Context Protocol. It consists of a Zotero plugin and an MCP server, enabling AI assistants to search, retrieve, and cite references from Zotero library. The project features a unified architecture with an integrated MCP server, eliminating the need for a separate server process. It provides features like intelligent search, detailed reference information, filtering by tags and identifiers, aiding in academic tasks such as literature reviews and citation management.

TranslateBookWithLLM
TranslateBookWithLLM is a Python application designed for large-scale text translation, such as entire books (.EPUB), subtitle files (.SRT), and plain text. It leverages local LLMs via the Ollama API or Gemini API. The tool offers both a web interface for ease of use and a command-line interface for advanced users. It supports multiple format translations, provides a user-friendly browser-based interface, CLI support for automation, multiple LLM providers including local Ollama models and Google Gemini API, and Docker support for easy deployment.
For similar tasks


llm-memorization
The 'llm-memorization' project is a tool designed to index, archive, and search conversations with a local LLM using a SQLite database enriched with automatically extracted keywords. It aims to provide personalized context at the start of a conversation by adding memory information to the initial prompt. The tool automates queries from local LLM conversational management libraries, offers a hybrid search function, enhances prompts based on posed questions, and provides an all-in-one graphical user interface for data visualization. It supports both French and English conversations and prompts for bilingual use.

TME-AIX
The TME-AIX repository is a collaborative workspace dedicated to exploring Telco Media Entertainment use-cases using open source AI capabilities and datasets. It focuses on projects like Revenue Assurance, Service Assurance Predictions, 5G Network Fault Predictions, Sustainability, SecOps-AI, SmartGrid, IoT Security, Customer Relation Management, Anomaly Detection, Starlink Quality Predictions, and NoC AI Augmentation for OSS.

aisdk-prompt-optimizer
AISDK Prompt Optimizer is an open-source tool designed to transform AI interactions by optimizing prompts. It utilizes the GEPA reflective optimizer to evolve textual components of AI systems, providing features such as reflective prompt mutation, rich textual feedback, and Pareto-based selection. Users can teach their AI desired behaviors, collect ideal samples, run optimization to generate optimized prompts, and deploy the results in their applications. The tool leverages advanced optimization algorithms to guide AI through interactive conversations and refine prompt candidates for improved performance.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.