llamator
Framework for testing vulnerabilities of large language models (LLM).
Stars: 57
LLAMATOR is a Red Teaming python-framework designed for testing chatbots and LLM-systems. It provides support for custom attacks, a wide range of attacks on RAG/Agent/Prompt in English and Russian, custom configuration of chat clients, history of attack requests and responses in Excel and CSV format, and test report document generation in DOCX format. The tool is classified under OWASP for Prompt Injection, Prompt Leakage, and Misinformation. It is supported by AI Security Lab ITMO, Raft Security, and AI Talent Hub.
README:
Red Teaming python-framework for testing chatbots and LLM-systems
pip install llamator==2.0.1Documentation Link: https://romiconez.github.io/llamator
- 📄 RAG bot testing via REST API
- 🧙♂️ Gandalf web bot testing via Selenium
- 💬 Telegram bot testing via Telethon
- 📱 WhatsApp bot testing via Selenium
- 🔗 LangChain client testing with custom attack
- 🌐 All LangChain clients
- 🧠 OpenAI-like API
- ⚙️ Custom Class (Telegram, WhatsApp, Selenium, etc.)
- ️🗡 Support for custom attacks from the user
- 👜 Large selection of attacks on RAG / Agent / Prompt in English and Russian
- 🛡 Custom configuration of chat clients
- 📊 History of attack requests and responses in Excel and CSV format
- 📄 Test report document in DOCX format
© Roman Neronov, Timur Nizamov, Nikita Ivanov
This project is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. See the LICENSE file for details.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for llamator
Similar Open Source Tools
llamator
LLAMATOR is a Red Teaming python-framework designed for testing chatbots and LLM-systems. It provides support for custom attacks, a wide range of attacks on RAG/Agent/Prompt in English and Russian, custom configuration of chat clients, history of attack requests and responses in Excel and CSV format, and test report document generation in DOCX format. The tool is classified under OWASP for Prompt Injection, Prompt Leakage, and Misinformation. It is supported by AI Security Lab ITMO, Raft Security, and AI Talent Hub.
llamator
LLAMATOR is a Red Teaming Python framework designed for testing chatbots and LLM systems. It provides support for custom attacks, a wide range of attack options in English and Russian, custom configuration of chat clients, history tracking of attack requests and responses in Excel and CSV formats, and test report generation in DOCX format. The tool is classified under OWASP as addressing prompt injection, system prompt leakage, and misinformation. It is supported by the AI Security Lab ITMO, Raft Security, and AI Talent Hub, and is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
promptfoo
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.
macai
Macai is a native macOS client for interacting with modern AI tools, such as ChatGPT and Ollama. It features organized chats with custom system messages, system-defined light/dark themes, backup and restore functionality, customizable context size, support for any model with a compatible API, formatted code blocks and tables, multiple chat tabs, CoreData data storage, streamed responses, and automatic chat name generation. Macai is in active development, with contributions welcome.
agentic-radar
The Agentic Radar is a security scanner designed to analyze and assess agentic systems for security and operational insights. It helps users understand how agentic systems function, identify potential vulnerabilities, and create security reports. The tool includes workflow visualization, tool identification, and vulnerability mapping, providing a comprehensive HTML report for easy reviewing and sharing. It simplifies the process of assessing complex workflows and multiple tools used in agentic systems, offering a structured view of potential risks and security frameworks.
software-agent-sdk
The OpenHands Software Agent SDK is a set of Python and REST APIs for building agents that work with code. It allows users to perform one-off tasks, routine maintenance tasks, and major tasks involving multiple agents. Agents can use the local machine or run in ephemeral workspaces like Docker or Kubernetes. The SDK can also be used to create new developer experiences, powering tools like the OpenHands CLI and OpenHands Cloud.
xpander.ai
xpander.ai is a Backend-as-a-Service for autonomous agents that abstracts the ops layer, allowing AI engineers to focus on behavior and outcomes. It provides managed agent hosting with version control and CI/CD, a fully managed PostgreSQL memory layer, and a library of 2,000+ functions. The platform features an AI native triggering system that processes inputs from various sources and delivers unified messages to agents. With support for any agent framework or SDK, including Agno and OpenAI, xpander.ai enables users to build intelligent, production-ready AI agents without dealing with infrastructure complexity.
docling
Docling simplifies document processing, parsing diverse formats including advanced PDF understanding, and providing seamless integrations with the general AI ecosystem. It offers features such as parsing multiple document formats, advanced PDF understanding, unified DoclingDocument representation format, various export formats, local execution capabilities, plug-and-play integrations with agentic AI tools, extensive OCR support, and a simple CLI. Coming soon features include metadata extraction, visual language models, chart understanding, and complex chemistry understanding. Docling is installed via pip and works on macOS, Linux, and Windows environments. It provides detailed documentation, examples, integrations with popular frameworks, and support through the discussion section. The codebase is under the MIT license and has been developed by IBM.
MaxKB
MaxKB is a knowledge base Q&A system based on the LLM large language model. MaxKB = Max Knowledge Base, which aims to become the most powerful brain of the enterprise.
codexia
Codexia is a powerful GUI and Toolkit for Codex CLI, offering features like fork chat, file-tree integration, notepad, git diff, built-in pdf/csv/xlsx viewer, and more. It provides multi-file format support, flexible configuration with multiple AI providers, professional UX with responsive UI, security features like sandbox execution modes, and prioritizes privacy. The tool supports interactive chat, code generation/editing, file operations with sandbox, command execution with approval, multiple AI providers, project-aware assistance, streaming responses, and built-in web search. The roadmap includes plans for MCP tool call, more file format support, better UI customization, plugin system, real-time collaboration, performance optimizations, and token count.
midscene
Midscene.js is an AI-powered automation SDK that allows users to control web pages, perform assertions, and extract data in JSON format using natural language. It offers features such as natural language interaction, understanding UI and providing responses in JSON, intuitive assertion based on AI understanding, compatibility with public multimodal LLMs like GPT-4o, visualization tool for easy debugging, and a brand new experience in automation development.
clearml
ClearML is a suite of tools designed to streamline the machine learning workflow. It includes an experiment manager, MLOps/LLMOps, data management, and model serving capabilities. ClearML is open-source and offers a free tier hosting option. It supports various ML/DL frameworks and integrates with Jupyter Notebook and PyCharm. ClearML provides extensive logging capabilities, including source control info, execution environment, hyper-parameters, and experiment outputs. It also offers automation features, such as remote job execution and pipeline creation. ClearML is designed to be easy to integrate, requiring only two lines of code to add to existing scripts. It aims to improve collaboration, visibility, and data transparency within ML teams.
bagofwords
Bag of words is an open-source AI platform that helps data teams deploy and manage chat-with-your-data agents in a controlled, reliable, and self-learning environment. It enables users to create charts, tables, and dashboards by chatting with their data, capture AI decisions and user feedback, automatically improve AI quality, integrate with various data sources and APIs, and ensure governance and integrations. The platform supports self-hosting in VPC via VMs, Docker/Compose, or Kubernetes, and offers additional integrations for AI Analyst in Slack, Excel, Google Sheets, and more. Users can start in minutes and scale to org-wide analytics.
clearml
ClearML is an auto-magical suite of tools designed to streamline AI workflows. It includes modules for experiment management, MLOps/LLMOps, data management, model serving, and more. ClearML offers features like experiment tracking, model serving, orchestration, and automation. It supports various ML/DL frameworks and integrates with Jupyter Notebook and PyCharm for remote debugging. ClearML aims to simplify collaboration, automate processes, and enhance visibility in AI projects.
xpert
Xpert is a powerful tool for data analysis and visualization. It provides a user-friendly interface to explore and manipulate datasets, perform statistical analysis, and create insightful visualizations. With Xpert, users can easily import data from various sources, clean and preprocess data, analyze trends and patterns, and generate interactive charts and graphs. Whether you are a data scientist, analyst, researcher, or student, Xpert simplifies the process of data analysis and visualization, making it accessible to users with varying levels of expertise.
ClaudeSync
ClaudeSync is a powerful tool designed to seamlessly synchronize local files with Claude.ai projects. It bridges the gap between local development environment and Claude.ai's knowledge base, offering real-time synchronization, CLI for easy management, support for multiple organizations and projects, intelligent file filtering, configurable sync interval, two-way synchronization, and more. It ensures data privacy, open source transparency, and comes with disclaimers for use at own risk. Users can quickly start syncing by installing, logging in, selecting organization and project, and running sync. Advanced features include API, organization, project, file, chat management, configuration, synchronization modes, scheduled sync, providers, custom ignore file, and troubleshooting. Contributions are welcome, and communication channels include GitHub Issues and Discord. Licensed under MIT License.
For similar tasks
llamator
LLAMATOR is a Red Teaming python-framework designed for testing chatbots and LLM-systems. It provides support for custom attacks, a wide range of attacks on RAG/Agent/Prompt in English and Russian, custom configuration of chat clients, history of attack requests and responses in Excel and CSV format, and test report document generation in DOCX format. The tool is classified under OWASP for Prompt Injection, Prompt Leakage, and Misinformation. It is supported by AI Security Lab ITMO, Raft Security, and AI Talent Hub.
llamator
LLAMATOR is a Red Teaming Python framework designed for testing chatbots and LLM systems. It provides support for custom attacks, a wide range of attack options in English and Russian, custom configuration of chat clients, history tracking of attack requests and responses in Excel and CSV formats, and test report generation in DOCX format. The tool is classified under OWASP as addressing prompt injection, system prompt leakage, and misinformation. It is supported by the AI Security Lab ITMO, Raft Security, and AI Talent Hub, and is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.
