
Callytics
README:

Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers. By processing both the audio and text of each call, it provides insights such as sentiment analysis, topic detection, conflict detection, profanity word detection, and summarization. These techniques help businesses optimize customer interactions, identify areas for improvement, and enhance overall service quality.

When an audio file is placed in the .data/input directory, the entire pipeline starts automatically, and the resulting data is inserted into the database.

Note: This is version v1.1.0; many new features will be added, models will be fine-tuned or trained from scratch, and various optimizations will be applied. For more information, see the Upcoming section.

Note: If you would like to contribute to this repository, please read CONTRIBUTING first.
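As a rough illustration of the input-folder trigger described above, the sketch below watches .data/input with the third-party watchdog package and hands new audio files to a hypothetical run_pipeline function. It is not the repository's actual entry point (the real wiring lives in main.py and the systemd unit under automation/); it only demonstrates the trigger mechanism.

```python
# Minimal sketch of the .data/input trigger. Assumes the `watchdog` package;
# `run_pipeline` and the audio extensions are hypothetical placeholders.
import time
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


def run_pipeline(audio_path: Path) -> None:
    """Placeholder for the actual Callytics pipeline call."""
    print(f"Processing {audio_path} ...")


class AudioHandler(FileSystemEventHandler):
    def on_created(self, event):
        # React only to newly created audio files, not directories.
        if not event.is_directory and event.src_path.endswith((".mp3", ".wav")):
            run_pipeline(Path(event.src_path))


if __name__ == "__main__":
    observer = Observer()
    observer.schedule(AudioHandler(), path=".data/input", recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)  # keep the watcher process alive
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```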
- Prerequisites
- Architecture
- Math And Algorithm
- Features
- Demo
- Installation
- File Structure
- Database Structure
- Datasets
- Version Control System
- Upcoming
- Documentations
- License
- Links
- Team
- Contact
- Citation
Prerequisites
- Python 3.11 (or above)
- GPU (min 24GB) (or above)
- Hugging Face Credentials (Account, Token)
- Llama-3.2-11B-Vision-Instruct (or above)
- GPU (min 12GB) (for other processes such as faster whisper & NeMo)
- At least one of the following is required:
  - OpenAI Credentials (Account, API Key)
  - Azure OpenAI Credentials (Account, API Key, API Base URL)
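Before installing, it can help to verify the hardware against the list above. The snippet below is a minimal, optional check, assuming PyTorch is installed; the VRAM figures simply mirror the prerequisites, and nothing here is part of Callytics itself.

```python
# Optional environment check for the prerequisites above; assumes PyTorch.
import sys

import torch

assert sys.version_info >= (3, 11), "Python 3.11 or above is required"

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    # 24 GB is needed for Llama-3.2-11B-Vision-Instruct;
    # 12 GB covers other processes such as faster whisper & NeMo.
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
else:
    print("No CUDA GPU detected")
```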
Math And Algorithm
This section describes the mathematical models and algorithms used in the project.
Note: Only the mathematical concepts and algorithms specific to this repository, rather than the models used, are covered here. Please refer to RESOURCES under the Documentations section for the repositories and models utilized or referenced.
The silence durations are derived from the time intervals between speech segments:
- Let $$S = \{s_1, s_2, \ldots, s_n\}$$ represent the set of silence durations (in seconds) between consecutive speech segments.
- A user-defined factor: $$\text{factor} \in \mathbb{R}^{+}$$
To determine a threshold that distinguishes significant silence from trivial gaps, two statistical methods can be applied:
1. Standard Deviation-Based Threshold
- Mean:
$$\mu = \frac{1}{n}\sum_{i=1}^{n}s_i$$
- Standard Deviation:
$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(s_i - \mu)^2} $$
- Threshold:
$$ T_{\text{std}} = \sigma \cdot \text{factor} $$
2. Median + Interquartile Range (IQR) Threshold
- Median:
Let:
$$ S = \{s_{(1)} \leq s_{(2)} \leq \cdots \leq s_{(n)}\} $$
be the ordered set. Then:
$$ M = \text{median}(S) = \begin{cases} s_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd}, \\[6pt] \dfrac{s_{\left(\frac{n}{2}\right)} + s_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even}. \end{cases} $$
- Quartiles:
$$ Q_1 = s_{(\lfloor 0.25n \rfloor)}, \quad Q_3 = s_{(\lfloor 0.75n \rfloor)} $$
- IQR:
$$ \text{IQR} = Q_3 - Q_1 $$
- Threshold:
$$ T_{\text{median\_iqr}} = M + (\text{IQR} \times \text{factor}) $$
Total Silence Above Threshold
Once the threshold $$T$$ (either $$T_{\text{std}}$$ or $$T_{\text{median\_iqr}}$$) is defined, we sum only those silence durations that meet or exceed this threshold:
$$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
where $$\mathbf{1}(s_i \geq T)$$ is an indicator function defined as:
$$ \mathbf{1}(s_i \geq T) = \begin{cases} 1, & \text{if } s_i \geq T, \\[4pt] 0, & \text{otherwise}. \end{cases} $$
Summary:
- Identify the silence durations: $$ S = \{s_1, s_2, \ldots, s_n\} $$
- Determine the threshold using either:
  - Standard deviation-based: $$ T = \sigma \cdot \text{factor} $$
  - Median+IQR-based: $$ T = M + (\text{IQR} \cdot \text{factor}) $$
- Compute the total silence above this threshold: $$ \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T) $$
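The procedure above translates directly into a few lines of NumPy. The sketch below is an illustrative implementation of both thresholding rules and the indicator sum, not necessarily the repository's own code; note that the population standard deviation (ddof = 0) matches the 1/n formula above.

```python
# Direct NumPy translation of the two thresholding rules above.
# `silences` holds the gaps (in seconds) between consecutive speech segments.
import numpy as np


def total_silence(silences: np.ndarray, factor: float, method: str = "std") -> float:
    if method == "std":
        # T = sigma * factor  (population std, matching the 1/n formula)
        threshold = silences.std() * factor
    elif method == "median_iqr":
        # T = M + IQR * factor
        q1, median, q3 = np.percentile(silences, [25, 50, 75])
        threshold = median + (q3 - q1) * factor
    else:
        raise ValueError(f"Unknown method: {method}")
    # Sum only the gaps that meet or exceed T (the indicator-function term).
    return float(silences[silences >= threshold].sum())


# Example with made-up gap durations:
silences = np.array([0.2, 0.3, 1.5, 0.4, 2.8, 0.25])
print(total_silence(silences, factor=1.0, method="std"))
print(total_silence(silences, factor=1.0, method="median_iqr"))
```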
Features
- [x] Speech Enhancement
- [x] Sentiment Analysis
- [x] Profanity Word Detection
- [x] Summary
- [x] Conflict Detection
- [x] Topic Detection
Installation
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y ffmpeg build-essential g++
git clone https://github.com/bunyaminergen/Callytics
cd Callytics
conda env create -f environment.yaml
conda activate Callytics
.env file sample:
# CREDENTIALS
# OPENAI
OPENAI_API_KEY=
# HUGGINGFACE
HUGGINGFACE_TOKEN=
# AZURE OPENAI
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_BASE=
AZURE_OPENAI_API_VERSION=
# DATABASE
DB_NAME=
DB_USER=
DB_PASSWORD=
DB_HOST=
DB_PORT=
DB_URL=
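One common way to consume these variables is python-dotenv, as sketched below. This only shows the loading pattern (it is not necessarily how Callytics reads its configuration); the variable names match the sample above, and the fail-fast check mirrors the "at least one of OpenAI or Azure OpenAI" prerequisite.

```python
# Minimal sketch of loading the .env sample; assumes the python-dotenv package.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

openai_key = os.getenv("OPENAI_API_KEY")
hf_token = os.getenv("HUGGINGFACE_TOKEN")
db_url = os.getenv("DB_URL")

# At least one LLM credential set is required (see Prerequisites).
if openai_key is None and os.getenv("AZURE_OPENAI_API_KEY") is None:
    raise RuntimeError("Set OPENAI_API_KEY or the AZURE_OPENAI_* credentials")
```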
In this section, an example database and tables are provided. The design is simple and well-structured. If you create the tables and columns with the same structure in your remote database, you will not encounter errors in the code. However, if you want to change the database structure, you will also need to refactor the code accordingly.
Note: Refer to the Database Structure section for the database schema and tables.
sqlite3 .db/Callytics.sqlite < src/db/sql/Schema.sql
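After running Schema.sql, a quick sanity check from Python confirms the tables were created. The snippet below uses only the standard-library sqlite3 module and reads table names from sqlite_master, so it assumes nothing about the schema itself.

```python
# Sanity check: list the tables created by Schema.sql in .db/Callytics.sqlite.
import sqlite3

conn = sqlite3.connect(".db/Callytics.sqlite")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print([name for (name,) in tables])
conn.close()
```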
This section explains how to install Grafana in a local environment. Since Grafana is a third-party open-source monitoring application, you must handle its installation yourself and connect your database. Of course, you can also use Grafana Cloud instead of a local installation.
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt install -y grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl daemon-reload
Once the service is running, open http://localhost:3000 in your browser.
SQLite Plugin
sudo grafana-cli plugins install frser-sqlite-datasource
sudo systemctl restart grafana-server
sudo systemctl daemon-reload
.
├── automation
│ └── service
│ └── callytics.service
├── config
│ ├── config.yaml
│ ├── nemo
│ │ └── diar_infer_telephonic.yaml
│ └── prompt.yaml
├── .data
│ ├── example
│ │ └── LogisticsCallCenterConversation.mp3
│ └── input
├── .db
│ └── Callytics.sqlite
├── .docs
│ ├── documentation
│ │ ├── CONTRIBUTING.md
│ │ └── RESOURCES.md
│ └── img
│ ├── Callytics.drawio
│ ├── Callytics.gif
│ ├── CallyticsIcon.png
│ ├── Callytics.png
│ ├── Callytics.svg
│ └── database.png
├── .env
├── environment.yaml
├── .gitattributes
├── .github
│ └── CODEOWNERS
├── .gitignore
├── LICENSE
├── main.py
├── README.md
├── requirements.txt
└── src
├── audio
│ ├── alignment.py
│ ├── analysis.py
│ ├── effect.py
│ ├── error.py
│ ├── io.py
│ ├── metrics.py
│ ├── preprocessing.py
│ ├── processing.py
│ └── utils.py
├── db
│ ├── manager.py
│ └── sql
│ ├── AudioPropertiesInsert.sql
│ ├── Schema.sql
│ ├── TopicFetch.sql
│ ├── TopicInsert.sql
│ └── UtteranceInsert.sql
├── text
│ ├── llm.py
│ ├── model.py
│ ├── prompt.py
│ └── utils.py
└── utils
└── utils.py
19 directories, 43 files
Upcoming
- [ ] Speech Emotion Recognition: Develop a model to automatically detect emotions from speech data.
- [ ] New Forced Alignment Model: Train a forced alignment model from scratch.
- [ ] New Vocal Separation Model: Train a vocal separation model from scratch.
- [ ] Unit Tests: Add a comprehensive unit testing script to validate functionality.
- [ ] Logging Logic: Implement a more comprehensive and structured logging mechanism.
- [ ] Warnings: Add meaningful and detailed warning messages for better user guidance.
- [ ] Real-Time Analysis: Enable real-time analysis capabilities within the system.
- [ ] Dockerization: Containerize the repository to ensure seamless deployment and environment consistency.
- [ ] New Transcription Models: Integrate and test new transcription models such as AIOLA's Multi-Head Speech Recognition Model.
- [ ] Noise Reduction Model: Identify, test, and integrate a deep learning-based noise reduction model. Consider existing models like Facebook Research Denoiser, Noise2Noise, Audio Denoiser CNN. Write test scripts for evaluation, and if necessary, train a new model for optimal performance.
- [ ] Detect the customer service representative's (CSR) identity via Voice Recognition/Identification instead of Diarization and LLM.
- [ ] Transform the code structure into a pipeline for better modularity and scalability.
- [ ] Publish the repository as a Python package on PyPI for wider distribution.
- [ ] Convert the repository into a Linux package to support Linux-based systems.
- [ ] Implement a two-step processing workflow: perform diarization (speaker segmentation) first, then apply transcription for each identified speaker separately. This approach can improve transcription accuracy by leveraging speaker separation.
- [ ] Enable parallel processing for tasks such as diarization, transcription, and model inference to improve overall system performance and reduce processing time.
- [ ] Explore using Docker Compose for multi-container orchestration if required.
- [ ] Upload the models and relevant resources to Hugging Face for easier access, sharing, and community collaboration.
- [ ] Consider writing a Command Line Interface (CLI) to simplify user interaction and improve usability.
- [ ] Test the ability to use different language models (LLMs) for specific tasks. For instance, using BERT for profanity detection. Evaluate their performance and suitability for different use cases as a feature.
Citation
@software{Callytics,
author = {Bunyamin Ergen},
title = {{Callytics}},
year = {2024},
month = {12},
url = {https://github.com/bunyaminergen/Callytics},
version = {v1.1.0},
}
Alternative AI tools for Callytics
Similar Open Source Tools


xtuner
XTuner is an efficient, flexible, and full-featured toolkit for fine-tuning large models. It supports various LLMs (InternLM, Mixtral-8x7B, Llama 2, ChatGLM, Qwen, Baichuan, ...), VLMs (LLaVA), and various training algorithms (QLoRA, LoRA, full-parameter fine-tune). XTuner also provides tools for chatting with pretrained / fine-tuned LLMs and deploying fine-tuned LLMs with any other framework, such as LMDeploy.

human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation

asktube
AskTube is an AI-powered YouTube video summarizer and QA assistant that utilizes Retrieval Augmented Generation (RAG) technology. It offers a comprehensive solution with Q&A functionality and aims to provide a user-friendly experience for local machine usage. The project integrates various technologies including Python, JS, Sanic, Peewee, Pytubefix, Sentence Transformers, Sqlite, Chroma, and NuxtJs/DaisyUI. AskTube supports multiple providers for analysis, AI services, and speech-to-text conversion. The tool is designed to extract data from YouTube URLs, store embedding chapter subtitles, and facilitate interactive Q&A sessions with enriched questions. It is not intended for production use but rather for end-users on their local machines.

jan
Jan is an open-source ChatGPT alternative that runs 100% offline on your computer. It supports universal architectures, including Nvidia GPUs, Apple M-series, Apple Intel, Linux Debian, and Windows x64. Jan is currently in development, so expect breaking changes and bugs. It is lightweight and embeddable, and can be used on its own within your own projects.

swift-ocr-llm-powered-pdf-to-markdown
Swift OCR is a powerful tool for extracting text from PDF files using OpenAI's GPT-4 Turbo with Vision model. It offers flexible input options, advanced OCR processing, performance optimizations, structured output, robust error handling, and scalable architecture. The tool ensures accurate text extraction, resilience against failures, and efficient handling of multiple requests.

TrustEval-toolkit
TrustEval-toolkit is a dynamic and comprehensive framework for evaluating the trustworthiness of Generative Foundation Models (GenFMs) across dimensions such as safety, fairness, robustness, privacy, and more. It offers features like dynamic dataset generation, multi-model compatibility, customizable metrics, metadata-driven pipelines, comprehensive evaluation dimensions, optimized inference, and detailed reports.

WebAI-to-API
This project implements a web API that offers a unified interface to Google Gemini and Claude 3. It provides a self-hosted, lightweight, and scalable solution for accessing these AI models through a streaming API. The API supports both Claude and Gemini models, allowing users to interact with them in real-time. The project includes a user-friendly web UI for configuration and documentation, making it easy to get started and explore the capabilities of the API.

ComfyUI-Ollama-Describer
ComfyUI-Ollama-Describer is an extension for ComfyUI that enables the use of LLM models provided by Ollama, such as Gemma, Llava (multimodal), Llama2, Llama3, or Mistral. It requires the Ollama library for interacting with large-scale language models, supporting GPUs using CUDA and AMD GPUs on Windows, Linux, and Mac. The extension allows users to run Ollama through Docker and utilize NVIDIA GPUs for faster processing. It provides nodes for image description, text description, image captioning, and text transformation, with various customizable parameters for model selection, API communication, response generation, and model memory management.

WebMasterLog
WebMasterLog is a comprehensive repository showcasing various web development projects built with front-end and back-end technologies. It highlights interactive user interfaces, dynamic web applications, and a spectrum of web development solutions. The repository encourages contributions in areas such as adding new projects, improving existing projects, updating documentation, fixing bugs, implementing responsive design, enhancing code readability, and optimizing project functionalities. Contributors are guided to follow specific guidelines for project submissions, including directory naming conventions, README file inclusion, project screenshots, and commit practices. Pull requests are reviewed based on criteria such as proper PR template completion, originality of work, code comments for clarity, and sharing screenshots for frontend updates. The repository also participates in various open-source programs like JWOC, GSSoC, Hacktoberfest, KWOC, 24 Pull Requests, IWOC, SWOC, and DWOC, welcoming valuable contributors.

cog
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.

polyfire-js
Polyfire is an all-in-one managed backend for AI apps that allows users to build AI apps directly from the frontend, eliminating the need for a separate backend. It simplifies the process by providing most backend services in just a few lines of code. With Polyfire, users can easily create chatbots, transcribe audio files to text, generate simple text, create a long-term memory, and generate images with Dall-E. The tool also offers starter guides and tutorials to help users get started quickly and efficiently.

ort
Ort is an unofficial ONNX Runtime 1.17 wrapper for Rust based on the now inactive onnxruntime-rs. ONNX Runtime accelerates ML inference on both CPU and GPU.

lyraios
LYRAIOS (LLM-based Your Reliable AI Operating System) is an advanced AI assistant platform built with FastAPI and Streamlit, designed to serve as an operating system for AI applications. It offers core features such as AI process management, memory system, and I/O system. The platform includes built-in tools like Calculator, Web Search, Financial Analysis, File Management, and Research Tools. It also provides specialized assistant teams for Python and research tasks. LYRAIOS is built on a technical architecture comprising FastAPI backend, Streamlit frontend, Vector Database, PostgreSQL storage, and Docker support. It offers features like knowledge management, process control, and security & access control. The roadmap includes enhancements in core platform, AI process management, memory system, tools & integrations, security & access control, open protocol architecture, multi-agent collaboration, and cross-platform support.

farfalle
Farfalle is an open-source AI-powered search engine that allows users to run their own local LLM or utilize the cloud. It provides a tech stack including Next.js for frontend, FastAPI for backend, Tavily for search API, Logfire for logging, and Redis for rate limiting. Users can get started by setting up prerequisites like Docker and Ollama, and obtaining API keys for Tavily, OpenAI, and Groq. The tool supports models like llama3, mistral, and gemma. Users can clone the repository, set environment variables, run containers using Docker Compose, and deploy the backend and frontend using services like Render and Vercel.
For similar tasks


TME-AIX
The TME-AIX repository is a collaborative workspace dedicated to exploring Telco Media Entertainment use-cases using open source AI capabilities and datasets. It focuses on projects like Revenue Assurance, Service Assurance Predictions, 5G Network Fault Predictions, Sustainability, SecOps-AI, SmartGrid, IoT Security, Customer Relation Management, Anomaly Detection, Starlink Quality Predictions, and NoC AI Augmentation for OSS.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM:
- Set LLM usage limits for users on different pricing tiers
- Track LLM usage on a per user and per organization basis
- Block or redact requests containing PIIs
- Improve LLM reliability with failovers, retries and caching
- Distribute API keys with rate limits and cost limits for internal development/production use cases
- Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.