Callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers.

Stars: 63

README:

Callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers. By processing both the audio and the text of each call, it provides sentiment analysis, topic detection, conflict detection, profanity word detection, and a summary. These techniques help businesses optimize customer interactions, identify areas for improvement, and enhance overall service quality.

When an audio file is placed in the .data/input directory, the entire pipeline automatically starts running, and the resulting data is inserted into the database.
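How the pipeline is triggered is not described here. As a rough sketch only, assuming a plain polling loop, the "new file starts the pipeline" behaviour could look like the following. The names `find_new_audio`, `watch`, and `AUDIO_EXTENSIONS` are hypothetical, and the project itself may rely on a file-system watcher or the bundled service file instead.

```python
import time
from pathlib import Path
from typing import Callable, Optional

# Hypothetical list of extensions; the formats actually supported are not stated here.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def find_new_audio(input_dir: Path, seen: set) -> list:
    """Return audio files in input_dir not seen before, and mark them as seen."""
    new_files = []
    for path in sorted(input_dir.iterdir()):
        is_audio = path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
        if is_audio and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def watch(
    input_dir: Path,
    process: Callable[[Path], None],
    poll_seconds: float = 2.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll input_dir and call process(path) once per new audio file."""
    seen: set = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_audio(input_dir, seen):
            process(path)  # stand-in for the real analysis pipeline
        polls += 1
        time.sleep(poll_seconds)
```

Calling `watch(Path(".data/input"), process=print)` would print each newly added audio file once. A real deployment would replace `print` with the analysis entry point.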

Note: This is only version v1.1.0; many new features will be added, models will be fine-tuned or trained from scratch, and various optimizations will be applied. For more information, see the Upcoming section.

Note: If you would like to contribute to this repository, please read CONTRIBUTING first.

Prerequisites
Architecture
Math and Algorithm
Features
Demo
Installation
File Structure
Database Structure
Datasets
Version Control System
Upcoming
Documentations
License
Links
Team
Contact
Citation

Prerequisites

General

Python 3.11 (or above)

Llama

GPU (min 24GB)
Hugging Face Credentials (Account, Token)
Llama-3.2-11B-Vision-Instruct (or above)

OpenAI

GPU (min 12GB) (for other processes such as Faster Whisper and NeMo)
At least one of the following is required:
- OpenAI Credentials (Account, API Key)
- Azure OpenAI Credentials (Account, API Key, API Base URL)

Architecture

Math and Algorithm

This section describes the mathematical models and algorithms used in the project.

Note: The mathematical concepts and algorithms specific to this repository, rather than the models used, will be provided in this section. Please refer to the RESOURCES under the Documentations section for the repositories and models utilized or referenced.

Silence Duration Calculation

The silence durations are derived from the time intervals between speech segments. Let

$$S = \{s_1, s_2, \ldots, s_n\}$$

represent the set of silence durations (in seconds) between consecutive speech segments.
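As an illustration of how such a set could be obtained, the sketch below computes the gaps between consecutive segments. It assumes each speech segment is a `(start, end)` pair in seconds; that representation and the function name `silence_durations` are hypothetical, not taken from the repository.

```python
def silence_durations(segments: list) -> list:
    """Return the gaps (in seconds) between consecutive (start, end) speech segments."""
    ordered = sorted(segments)  # sort by start time
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if gap > 0:  # assumption: overlapping speech contributes no silence
            gaps.append(gap)
    return gaps


# Three segments produce two gaps: 1.0 -> 1.5 and 2.0 -> 4.0.
print(silence_durations([(0.0, 1.0), (1.5, 2.0), (4.0, 5.0)]))  # [0.5, 2.0]
```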

A user-defined factor:

$$\text{factor} \in \mathbb{R}^{+}$$

To determine a threshold that distinguishes significant silence from trivial gaps, two statistical methods can be applied:

1. Standard Deviation-Based Threshold

Mean:

$$\mu = \frac{1}{n}\sum_{i=1}^{n}s_i$$

Standard Deviation:

$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(s_i - \mu)^2} $$

Threshold:

$$ T_{\text{std}} = \sigma \cdot \text{factor} $$

2. Median + Interquartile Range (IQR) Threshold

Median:

Let:

$$ s_{(1)} \leq s_{(2)} \leq \cdots \leq s_{(n)} $$

be the elements of $$S$$ sorted in ascending order.

Then:

$$ M = \text{median}(S) = \begin{cases} s_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd}, \\[6pt] \frac{s_{\left(\frac{n}{2}\right)} + s_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even}. \end{cases} $$

Quartiles:

$$ Q_1 = s_{(\lfloor 0.25n \rfloor)}, \quad Q_3 = s_{(\lfloor 0.75n \rfloor)} $$

IQR:

$$ \text{IQR} = Q_3 - Q_1 $$

Threshold:

$$ T_{\text{median\_iqr}} = M + (\text{IQR} \times \text{factor}) $$
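The two thresholds above can be sketched in Python as follows. This is a minimal illustration of the formulas rather than the repository's code: the function names are hypothetical, and clamping the quartile index to at least 1 is an added assumption so that samples with fewer than four values do not produce the undefined index $$s_{(0)}$$.

```python
import math
import statistics


def std_threshold(silences: list, factor: float) -> float:
    """T_std = sigma * factor, with sigma the population standard deviation (1/n)."""
    return statistics.pstdev(silences) * factor


def median_iqr_threshold(silences: list, factor: float) -> float:
    """T_median_iqr = M + IQR * factor, with floor-indexed quartiles."""
    ordered = sorted(silences)
    n = len(ordered)
    median = statistics.median(ordered)

    def order_stat(k: int) -> float:
        # s_(k) is 1-indexed in the formulas. Clamping k to 1 is an assumption
        # made here to keep small samples (n < 4) well defined.
        return ordered[max(k, 1) - 1]

    q1 = order_stat(math.floor(0.25 * n))
    q3 = order_stat(math.floor(0.75 * n))
    return median + (q3 - q1) * factor


gaps = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# n = 8, so Q1 = s_(2) = 2.0, Q3 = s_(6) = 6.0, IQR = 4.0, M = 4.5
print(median_iqr_threshold(gaps, factor=1.5))  # 10.5
```

Both functions raise an error on an empty list, so callers would need to handle recordings with no detected gaps.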

Total Silence Above Threshold

Once the threshold $$T$$ (either $$T_{\text{std}}$$ or $$T_{\text{median\_iqr}}$$) is defined, we sum only those silence durations that meet or exceed this threshold:

$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$

where $$\mathbf{1}(s_i \geq T)$$ is an indicator function defined as:

$$ \mathbf{1}(s_i \geq T) = \begin{cases} 1 & \text{if } s_i \geq T \\ 0 & \text{otherwise} \end{cases} $$
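In code, the indicator function reduces to a filter on the sum. A minimal sketch, with the hypothetical name `total_silence` and an already computed threshold:

```python
def total_silence(silences: list, threshold: float) -> float:
    """Sum the silence durations that meet or exceed the threshold."""
    return sum(s for s in silences if s >= threshold)


gaps = [0.2, 0.5, 1.5, 3.0]
# Only 1.5 and 3.0 reach the threshold; a gap equal to T is included.
print(total_silence(gaps, threshold=1.5))  # 4.5
```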

Summary:

Identify the silence durations:

$$ S = \{s_1, s_2, \ldots, s_n\} $$

Determine the threshold using either:

Standard deviation-based:

$$ T = \sigma \cdot \text{factor} $$

Median+IQR-based:

$$ T = M + (\text{IQR} \cdot \text{factor}) $$

Compute the total silence above this threshold:

$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$

Features

[x] Speech Enhancement
[x] Sentiment Analysis
[x] Profanity Word Detection
[x] Summary
[x] Conflict Detection
[x] Topic Detection

Demo

Installation

Linux/Ubuntu

sudo apt update -y && sudo apt upgrade -y

sudo apt install -y ffmpeg build-essential g++

git clone https://github.com/bunyaminergen/Callytics

cd Callytics

conda env create -f environment.yaml

conda activate Callytics

Environment

.env file sample:

# CREDENTIALS
# OPENAI
OPENAI_API_KEY=

# HUGGINGFACE
HUGGINGFACE_TOKEN=

# AZURE OPENAI
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_BASE=
AZURE_OPENAI_API_VERSION=

# DATABASE
DB_NAME=
DB_USER=
DB_PASSWORD=
DB_HOST=
DB_PORT=
DB_URL=
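Before running the pipeline, it can help to confirm that the `.env` file is filled in. The sketch below parses a file in the format above using only the standard library. The helper names are hypothetical, which keys are actually required depends on the provider you choose, and the project may load its settings differently (for example through a dotenv library).

```python
from pathlib import Path


def load_env_file(path: Path) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required: list) -> list:
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(load_env_file(Path(".env")), ["OPENAI_API_KEY", "DB_URL"])` returns the subset of those two keys that still need a value.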

Database

In this section, an example database and tables are provided. It is a well-structured and simple design. If you create the tables and columns in the same structure in your remote database, you will not encounter errors in the code. However, if you want to change the database structure, you will also need to refactor the code.

Note: Refer to the Database Structure section for the database schema and tables.

sqlite3 .db/Callytics.sqlite < src/db/sql/Schema.sql
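If the `sqlite3` command-line tool is not available, the same step can be approximated with Python's built-in `sqlite3` module. This is a sketch under that assumption; `apply_schema` is a hypothetical helper, not part of the repository.

```python
import sqlite3


def apply_schema(db_path: str, schema_sql: str) -> list:
    """Run a schema script against a SQLite database and return its table names."""
    connection = sqlite3.connect(db_path)
    try:
        connection.executescript(schema_sql)
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Usage against the paths used in this section:
# from pathlib import Path
# apply_schema(".db/Callytics.sqlite", Path("src/db/sql/Schema.sql").read_text())
```

The returned table names give a quick check that the schema was applied before any data is inserted.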

Grafana

This section explains how to install Grafana in your local environment. Since Grafana is a third-party open-source monitoring application, you must handle its installation yourself and connect it to your database. You can also use Grafana Cloud instead of a local installation.

sudo apt update -y && sudo apt upgrade -y

sudo apt install -y apt-transport-https software-properties-common wget

wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -

echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

sudo apt install -y grafana

sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl daemon-reload

http://localhost:3000

SQLite Plugin

sudo grafana-cli plugins install frser-sqlite-datasource

sudo systemctl restart grafana-server

sudo systemctl daemon-reload

File Structure

.
├── automation
│         └── service
│             └── callytics.service
├── config
│         ├── config.yaml
│         ├── nemo
│         │         └── diar_infer_telephonic.yaml
│         └── prompt.yaml
├── .data
│         ├── example
│         │         └── LogisticsCallCenterConversation.mp3
│         └── input
├── .db
│         └── Callytics.sqlite
├── .docs
│         ├── documentation
│         │         ├── CONTRIBUTING.md
│         │         └── RESOURCES.md
│         └── img
│             ├── Callytics.drawio
│             ├── Callytics.gif
│             ├── CallyticsIcon.png
│             ├── Callytics.png
│             ├── Callytics.svg
│             └── database.png
├── .env
├── environment.yaml
├── .gitattributes
├── .github
│         └── CODEOWNERS
├── .gitignore
├── LICENSE
├── main.py
├── README.md
├── requirements.txt
└── src
    ├── audio
    │         ├── alignment.py
    │         ├── analysis.py
    │         ├── effect.py
    │         ├── error.py
    │         ├── io.py
    │         ├── metrics.py
    │         ├── preprocessing.py
    │         ├── processing.py
    │         └── utils.py
    ├── db
    │         ├── manager.py
    │         └── sql
    │             ├── AudioPropertiesInsert.sql
    │             ├── Schema.sql
    │             ├── TopicFetch.sql
    │             ├── TopicInsert.sql
    │             └── UtteranceInsert.sql
    ├── text
    │         ├── llm.py
    │         ├── model.py
    │         ├── prompt.py
    │         └── utils.py
    └── utils
        └── utils.py

19 directories, 43 files

Database Structure

Datasets

Callytics Speaker Verification Dataset (CSVD)

Version Control System

Releases

v1.0.0 .zip
v1.0.0 .tar.gz
v1.1.0 .zip
v1.1.0 .tar.gz

Branches

main
develop

Upcoming

[ ] Speech Emotion Recognition: Develop a model to automatically detect emotions from speech data.
[ ] New Forced Alignment Model: Train a forced alignment model from scratch.
[ ] New Vocal Separation Model: Train a vocal separation model from scratch.
[ ] Unit Tests: Add a comprehensive unit testing script to validate functionality.
[ ] Logging Logic: Implement a more comprehensive and structured logging mechanism.
[ ] Warnings: Add meaningful and detailed warning messages for better user guidance.
[ ] Real-Time Analysis: Enable real-time analysis capabilities within the system.
[ ] Dockerization: Containerize the repository to ensure seamless deployment and environment consistency.
[ ] New Transcription Models: Integrate and test new transcription models such as AIOLA’s Multi-Head Speech Recognition Model.
[ ] Noise Reduction Model: Identify, test, and integrate a deep learning-based noise reduction model. Consider existing models like Facebook Research Denoiser, Noise2Noise, Audio Denoiser CNN. Write test scripts for evaluation, and if necessary, train a new model for optimal performance.

Considerations

[ ] Detect CSR's identity via Voice Recognition/Identification instead of Diarization and LLM.
[ ] Transform the code structure into a pipeline for better modularity and scalability.
[ ] Publish the repository as a Python package on PyPI for wider distribution.
[ ] Convert the repository into a Linux package to support Linux-based systems.
[ ] Implement a two-step processing workflow: perform diarization (speaker segmentation) first, then apply transcription for each identified speaker separately. This approach can improve transcription accuracy by leveraging speaker separation.
[ ] Enable parallel processing for tasks such as diarization, transcription, and model inference to improve overall system performance and reduce processing time.
[ ] Explore using Docker Compose for multi-container orchestration if required.
[ ] Upload the models and relevant resources to Hugging Face for easier access, sharing, and community collaboration.
[ ] Consider writing a Command Line Interface (CLI) to simplify user interaction and improve usability.
[ ] Test the ability to use different language models (LLMs) for specific tasks. For instance, using BERT for profanity detection. Evaluate their performance and suitability for different use cases as a feature.

Documentations

Citation

@software{Callytics,
  author       = {Bunyamin Ergen},
  title        = {{Callytics}},
  year         = {2024},
  month        = {12},
  url          = {https://github.com/bunyaminergen/Callytics},
  version      = {v1.1.0},
}
