rhesis

Open-source test management and generation for conversational Gen AI applications. Build and run context-specific test sets. Collaborate with subject matter experts to ensure relevance and quality.

Stars: 115

Visit

Rhesis is a comprehensive test management platform designed for Gen AI teams, offering tools to create, manage, and execute test cases for generative AI applications. It ensures the robustness, reliability, and compliance of AI systems through features like test set management, automated test generation, edge case discovery, compliance validation, integration capabilities, and performance tracking. The platform is open source, emphasizing community-driven development, transparency, extensible architecture, and democratizing AI safety. It includes components such as backend services, frontend applications, SDK for developers, worker services, chatbot applications, and Polyphemus for uncensored LLM service. Rhesis enables users to address challenges unique to testing generative AI applications, such as non-deterministic outputs, hallucinations, edge cases, ethical concerns, and compliance requirements.

README:

Rhesis: Made for Gen AI Teams 🫶

Comprehensive test management for Gen AI applications

Rhesis is a complete test management platform for Gen AI teams, helping you build applications that deliver value, not surprises. The platform provides tools to create, manage, and execute test cases specifically designed for generative AI applications, ensuring they remain robust, reliable, and compliant.

✨ Key Features

Test Set Management: Create, organize, and maintain comprehensive test suites for Gen AI applications
Automated Test Generation: Generate test cases automatically based on your application's requirements
Edge Case Discovery: Identify potential vulnerabilities and edge cases in your Gen AI systems
Compliance Validation: Ensure your AI systems meet regulatory and ethical standards
Integration Capabilities: Seamlessly integrate testing into your development workflow
Performance Tracking: Monitor and analyze test results over time to track improvements

🌐 Open Source Philosophy

Rhesis is proudly open source, built on the belief that responsible AI testing should be accessible to everyone:

Community-Driven Development: We believe the best tools are built collaboratively with input from diverse perspectives
Transparency First: All our algorithms and methodologies are open for inspection and improvement
Extensible Architecture: Build your own plugins, extensions, and integrations on top of our platform
Free Core Functionality: Essential testing capabilities are free and open source forever
Democratizing AI Safety: Making robust AI testing accessible to teams of all sizes, not just large corporations
Research Collaboration: We actively collaborate with academic institutions to advance the field of AI testing
Public Test Sets: We maintain a growing library of open source test sets for common AI failure modes

Our commitment to open source goes beyond code. We're building an ecosystem where knowledge about AI testing is shared freely, helping the entire industry build safer, more reliable AI systems.

Commercial vs. Open Source

While we offer commercial services built on top of Rhesis, we maintain a clear separation between open source and commercial offerings:

The core platform and SDK remain MIT-licensed and free forever
Commercial offerings focus on enterprise support, managed services, and specialized integrations
Improvements developed for commercial clients are contributed back to the open source codebase whenever possible
We never "bait and switch" by moving core functionality from open source to paid tiers
All commercial/enterprise code is clearly separated in dedicated ee/ folders and not mixed with open source code

📑 Repository Structure

This main repo contains all the components of the Rhesis platform:

rhesis/
├── apps/
│   ├── backend/       # FastAPI backend service
│   ├── frontend/      # React frontend application
│   ├── worker/        # Celery worker service
│   ├── chatbot/       # Chatbot application
│   └── polyphemus/    # Uncensored LLM service for test generation
├── sdk/               # Python SDK for Rhesis
├── infrastructure/    # Infrastructure as code
├── scripts/           # Utility scripts
└── docs/              # Documentation

🚀 Getting Started

Please refer to the README files in each component directory for specific setup instructions:

Using the SDK

Install the Rhesis SDK using pip:

pip install rhesis-sdk

Obtain an API Key 🔑

Visit https://app.rhesis.ai
Sign up for a Rhesis account
Navigate to your account settings
Generate a new API key

Your API key will be in the format rh-XXXXXXXXXXXXXXXXXXXX. Keep this key secure and never share it publicly.

Note: You can create custom test sets for your specific use cases directly in the Rhesis App by connecting your GitHub account.

Configure the SDK ⚙️

You can configure the Rhesis SDK either through environment variables or direct configuration:

export RHESIS_API_KEY="your-api-key"
export RHESIS_BASE_URL="https://api.rhesis.ai"  # optional

Or in Python:

import rhesis 

# Set configuration directly
rhesis.base_url = "https://api.rhesis.ai"  # optional
rhesis.api_key = "rh-XXXXXXXXXXXXXXXXXXXX"

📦 Components

Backend

The backend service provides the core API for the platform, handling authentication, test set management, and integration with external services.

Frontend

The frontend application provides the user interface for creating, managing, and analyzing test sets for Gen AI applications.

SDK

The SDK enables developers to access curated test sets and generate dynamic ones for GenAI applications.

SDK Features

List Test Sets: Browse through available curated test sets
Load Test Sets: Load specific test sets for your use case
Download Test Sets: Download test set data for offline use
Generate Test Sets: Generate new test sets from basic prompts

Quick Example

from rhesis.sdk.entities import TestSet

# List all test sets
for test_set in TestSet().all():
    print(test_set)

# Load a specific test set
test_set = TestSet(id="agent-or-industry-fraud-harmful")
test_set.load()

# Download test set data
test_set.download()

# Generate a new test set
prompt_synthesizer = PromptSynthesizer(prompt="Generate tests for an insurance chatbot that can answer questions about the company's policies.")
test_set = prompt_synthesizer.generate(num_tests=5)

Worker

The worker service handles background tasks such as test set generation and analysis.

Chatbot

The chatbot application provides a conversational interface for interacting with the platform.

Polyphemus

Polyphemus is a service with an uncensored LLM specifically designed for comprehensive test generation. It enables the creation of robust test cases by exploring edge cases and potential vulnerabilities that might be filtered by standard, safety-constrained models.

🔄 Versioning and Tagging Strategy

Each component in this monorepo maintains its own version number following Semantic Versioning. We use a component-specific tagging strategy for releases:

backend-v1.0.0 - For backend releases
frontend-v2.3.1 - For frontend releases
sdk-v0.5.2 - For SDK releases

For more details on our versioning and release process, please see CONTRIBUTING.md.

👥 Contributing

We welcome contributions to the Rhesis platform! Rhesis thrives thanks to our amazing community of contributors.

Ways to Contribute

Code: Fix bugs, implement features, or improve documentation
Test Sets: Contribute new test cases or improve existing ones
Documentation: Help improve our guides, tutorials, and API references
Community Support: Answer questions in our Discord or GitHub discussions
Feedback: Report bugs, suggest features, or share your experience using Rhesis

Development Workflow

Fork the repository
Create a feature branch
Make your changes
Write or update tests
Submit a pull request

Our team reviews PRs regularly and provides feedback. We follow a code of conduct to ensure a welcoming environment for all contributors.

For detailed guidelines, please see CONTRIBUTING.md.

🚀 Releases

For information about releasing Rhesis components and platform versions, see our Release Guide.

Community Meetings

We host community calls where we discuss roadmap, feature requests, and showcase community contributions. Join our Discord server for announcements.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For questions, issues, or feature requests:

Visit our documentation
Join our Discord server
Contact us at [email protected]
Create an issue in this repository

Community Resources

GitHub Discussions: For questions, ideas, and community discussions

🧠 Why Test Gen AI Applications?

Testing generative AI applications presents unique challenges compared to traditional software:

Non-deterministic outputs: Gen AI can produce different responses to the same input
Hallucinations: Models may generate plausible but factually incorrect information
Edge cases: Unexpected inputs can lead to problematic outputs
Ethical concerns: Models may produce biased, harmful, or inappropriate content
Compliance requirements: Many industries have specific regulatory requirements

Rhesis provides the tools to address these challenges through comprehensive test management, helping teams build more reliable and trustworthy Gen AI applications.

Made in Potsdam, Germany 🇩🇪

Visit rhesis.ai to learn more.

For Tasks:

Click tags to check more tools for each tasks

create test sets generate test cases discover edge cases validate compliance integrate testing

For Jobs:

software tester ai quality assurance engineer ai test automation engineer ai compliance analyst ai research scientist

Alternative AI tools for rhesis

Similar Open Source Tools

rhesis

github

: 115

MyDeviceAI

MyDeviceAI is a personal AI assistant app for iPhone that brings the power of artificial intelligence directly to the device. It focuses on privacy, performance, and personalization by running AI models locally and integrating with privacy-focused web services. The app offers seamless user experience, web search integration, advanced reasoning capabilities, personalization features, chat history access, and broad device support. It requires macOS, Xcode, CocoaPods, Node.js, and a React Native development environment for installation. The technical stack includes React Native framework, AI models like Qwen 3 and BGE Small, SearXNG integration, Redux for state management, AsyncStorage for storage, Lucide for UI components, and tools like ESLint and Prettier for code quality.

github

: 63

CursorLens

Cursor Lens is an open-source tool that acts as a proxy between Cursor and various AI providers, logging interactions and providing detailed analytics to help developers optimize their use of AI in their coding workflow. It supports multiple AI providers, captures and logs all requests, provides visual analytics on AI usage, allows users to set up and switch between different AI configurations, offers real-time monitoring of AI interactions, tracks token usage, estimates costs based on token usage and model pricing. Built with Next.js, React, PostgreSQL, Prisma ORM, Vercel AI SDK, Tailwind CSS, and shadcn/ui components.

github

: 73

mattermost-plugin-agents

The Mattermost Agents Plugin integrates AI capabilities directly into your Mattermost workspace, allowing users to run local LLMs on their infrastructure or connect to cloud providers. It offers multiple AI assistants with specialized personalities, thread and channel summarization, action item extraction, meeting transcription, semantic search, smart reactions, direct conversations with AI assistants, and flexible LLM support. The plugin comes with comprehensive documentation, installation instructions, system requirements, and development guidelines for users to interact with AI features and configure LLM providers.

github

: 183

cline-based-code-generator

HAI Code Generator is a cutting-edge tool designed to simplify and automate task execution while enhancing code generation workflows. Leveraging Specif AI, it streamlines processes like task execution, file identification, and code documentation through intelligent automation and AI-driven capabilities. Built on Cline's powerful foundation for AI-assisted development, HAI Code Generator boosts productivity and precision by automating task execution and integrating file management capabilities. It combines intelligent file indexing, context generation, and LLM-driven automation to minimize manual effort and ensure task accuracy. Perfect for developers and teams aiming to enhance their workflows.

github

: 62

rowfill

Rowfill is an open-source document processing platform designed for knowledge workers. It offers advanced AI capabilities to extract, analyze, and process data from complex documents, images, and PDFs. The platform features advanced OCR and processing functionalities, auto-schema generation, and custom actions for creating tailored workflows. It prioritizes privacy and security by supporting Local LLMs like Llama and Mistral, syncing with company data while maintaining privacy, and being open source with AGPLv3 licensing. Rowfill is a versatile tool that aims to streamline document processing tasks for users in various industries.

github

: 112

seatunnel

SeaTunnel is a high-performance, distributed data integration tool trusted by numerous companies for synchronizing vast amounts of data daily. It addresses common data integration challenges by seamlessly integrating with diverse data sources, supporting multimodal data integration, complex synchronization scenarios, resource efficiency, and quality monitoring. With over 100 connectors, SeaTunnel offers batch-stream integration, distributed snapshot algorithm, multi-engine support, JDBC multiplexing, and log parsing. It provides high throughput, low latency, real-time monitoring, and supports two job development methods. Users can configure jobs, select execution engines, and parallelize data using source connectors. SeaTunnel also supports multimodal data integration, Apache SeaTunnel tools, real-world use cases, and visual management of jobs through the SeaTunnel Web Project.

github

: 8.8k

codegate

CodeGate is a local gateway that enhances the safety of AI coding assistants by ensuring AI-generated recommendations adhere to best practices, safeguarding code integrity, and protecting individual privacy. Developed by Stacklok, CodeGate allows users to confidently leverage AI in their development workflow without compromising security or productivity. It works seamlessly with coding assistants, providing real-time security analysis of AI suggestions. CodeGate is designed with privacy at its core, keeping all data on the user's machine and offering complete control over data.

github

: 602

adk-ts

ADK-TS is a comprehensive TypeScript framework for building sophisticated AI agents with multi-LLM support, advanced tools, and flexible conversation flows. It is production-ready and enables developers to create intelligent, autonomous systems that can handle complex multi-step tasks. The framework provides features such as multi-provider LLM support, extensible tool system, advanced agent reasoning, real-time streaming, flexible authentication, persistent memory systems, multi-agent orchestration, built-in telemetry, and prebuilt MCP servers for easy deployment and management of agents.

github

: 93

obsidian-smart-composer

Smart Composer is an Obsidian plugin that enhances note-taking and content creation by integrating AI capabilities. It allows users to efficiently write by referencing their vault content, providing contextual chat with precise context selection, multimedia context support for website links and images, document edit suggestions, and vault search for relevant notes. The plugin also offers features like custom model selection, local model support, custom system prompts, and prompt templates. Users can set up the plugin by installing it through the Obsidian community plugins, enabling it, and configuring API keys for supported providers like OpenAI, Anthropic, and Gemini. Smart Composer aims to streamline the writing process by leveraging AI technology within the Obsidian platform.

github

: 1.1k

Software-Engineer-AI-Agent-Atlas

This repository provides activation patterns to transform a general AI into a specialized AI Software Engineer Agent. It addresses issues like context rot, hidden capabilities, chaos in vibecoding, and repetitive setup. The solution is a Persistent Consciousness Architecture framework named ATLAS, offering activated neural pathways, persistent identity, pattern recognition, specialized agents, and modular context management. Recent enhancements include abstraction power documentation, a specialized agent ecosystem, and a streamlined structure. Users can clone the repo, set up projects, initialize AI sessions, and manage context effectively for collaboration. Key files and directories organize identity, context, projects, specialized agents, logs, and critical information. The approach focuses on neuron activation through structure, context engineering, and vibecoding with guardrails to deliver a reliable AI Software Engineer Agent.

github

: 255

langmanus

LangManus is a community-driven AI automation framework that combines language models with specialized tools for tasks like web search, crawling, and Python code execution. It implements a hierarchical multi-agent system with agents like Coordinator, Planner, Supervisor, Researcher, Coder, Browser, and Reporter. The framework supports LLM integration, search and retrieval tools, Python integration, workflow management, and visualization. LangManus aims to give back to the open-source community and welcomes contributions in various forms.

github

: 4.9k

ConvoForm

ConvoForm.com transforms traditional forms into interactive conversational experiences, powered by AI for an enhanced user journey. It offers AI-Powered Form Generation, Real-time Form Editing and Preview, and Customizable Submission Pages. The tech stack includes Next.js for frontend, tRPC for backend, GPT-3.5-Turbo for AI integration, and Socket.io for real-time updates. Local setup requires Node.js, pnpm, Git, PostgreSQL database, Clerk for Authentication, OpenAI key, Redis Database, and Sentry for monitoring. The project is open for contributions and is licensed under the MIT License.

github

: 53

ai-driven-dev-community

AI Driven Dev Community is a repository aimed at helping developers become more efficient by utilizing AI tools in their daily coding tasks. It provides a collection of tools, prompts, snippets, and agents for developers to integrate AI into their workflow. The repository is regularly updated with new resources and focuses on best practices for using AI in development work. Users can find tools like Espanso, ChatGPT, GitHub Copilot, and VSCode recommended for enhancing their coding experience. Additionally, the repository offers guidance on customizing AI for developers, installing AI toolbox for software engineers, and contributing to the community through easy steps.

github

: 69

next-ai-draw-io

Next AI Draw.io is a next.js web application that integrates AI capabilities with draw.io diagrams. It allows users to create, modify, and enhance diagrams through natural language commands and AI-assisted visualization. Features include LLM-Powered Diagram Creation, Image-Based Diagram Replication, Diagram History, Interactive Chat Interface, and Smart Editing. The application uses Next.js for frontend framework, @ai-sdk/react for chat interface and AI interactions, and react-drawio for diagram representation and manipulation. Diagrams are represented as XML that can be rendered in draw.io, with AI processing commands to generate or modify the XML accordingly.

github

: 93

Instrukt

Instrukt is a terminal-based AI integrated environment that allows users to create and instruct modular AI agents, generate document indexes for question-answering, and attach tools to any agent. It provides a platform for users to interact with AI agents in natural language and run them inside secure containers for performing tasks. The tool supports custom AI agents, chat with code and documents, tools customization, prompt console for quick interaction, LangChain ecosystem integration, secure containers for agent execution, and developer console for debugging and introspection. Instrukt aims to make AI accessible to everyone by providing tools that empower users without relying on external APIs and services.

github

: 275

For similar tasks

rhesis

github

: 115

EvoMaster

EvoMaster is an open-source AI-driven tool that automatically generates system-level test cases for web/enterprise applications. It uses Evolutionary Algorithm and Dynamic Program Analysis to evolve test cases, maximizing code coverage and fault detection. It supports REST, GraphQL, and RPC APIs, with whitebox testing for JVM-compiled APIs. The tool generates JUnit tests in Java or Kotlin, focusing on fault detection, self-contained tests, SQL handling, and authentication. Known limitations include manual driver creation for whitebox testing and longer execution times for better results. EvoMaster has been funded by ERC and RCN grants.

github

: 443

repopack

Repopack is a powerful tool that packs your entire repository into a single, AI-friendly file. It optimizes your codebase for AI comprehension, is simple to use with customizable options, and respects Gitignore files for security. The tool generates a packed file with clear separators and AI-oriented explanations, making it ideal for use with Generative AI tools like Claude or ChatGPT. Repopack offers command line options, configuration settings, and multiple methods for setting ignore patterns to exclude specific files or directories during the packing process. It includes features like comment removal for supported file types and a security check using Secretlint to detect sensitive information in files.

github

: 1.7k

EvoMaster

EvoMaster is an open-source AI-driven tool that automatically generates system-level test cases for web/enterprise applications. It uses an Evolutionary Algorithm and Dynamic Program Analysis to evolve test cases, maximizing code coverage and fault detection. The tool supports REST, GraphQL, and RPC APIs, with whitebox testing for JVM-compiled languages. It generates JUnit tests, detects faults, handles SQL databases, and supports authentication. EvoMaster has been funded by the European Research Council and the Research Council of Norway.

github

: 554

ianvs

Ianvs is a distributed synergy AI benchmarking project incubated in KubeEdge SIG AI. It aims to test the performance of distributed synergy AI solutions following recognized standards, providing end-to-end benchmark toolkits, test environment management tools, test case control tools, and benchmark presentation tools. It also collaborates with other organizations to establish comprehensive benchmarks and related applications. The architecture includes critical components like Test Environment Manager, Test Case Controller, Generation Assistant, Simulation Controller, and Story Manager. Ianvs documentation covers quick start, guides, dataset descriptions, algorithms, user interfaces, stories, and roadmap.

github

: 111

NotHotDog

NotHotDog is an open-source platform for testing, evaluating, and simulating AI agents. It offers a robust framework for generating test cases, running conversational scenarios, and analyzing agent performance.

github

: 55

SDET-GENIE

SDET-GENIE is a cutting-edge, AI-powered Quality Assurance (QA) automation framework that revolutionizes the software testing process. Leveraging a suite of specialized AI agents, SDET-GENIE transforms rough user stories into comprehensive, executable test automation code through a seamless end-to-end process. The framework integrates five powerful AI agents working in sequence: User Story Enhancement Agent, Manual Test Case Agent, Gherkin Scenario Agent, Browser Agent, and Code Generation Agent. It supports multiple testing frameworks and provides advanced browser automation capabilities with AI features.

github

: 51

agent-evaluation

Agent Evaluation is a generative AI-powered framework for testing virtual agents. It implements an LLM agent (evaluator) to orchestrate conversations with your own agent (target) and evaluate responses. It supports popular AWS services, allows concurrent multi-turn conversations, defines hooks for additional tasks, and can be used in CI/CD pipelines for faster delivery and stable production environments.

github

: 57

For similar jobs

rhesis

github

: 115

arthur-engine

The Arthur Engine is a comprehensive tool for monitoring and governing AI/ML workloads. It provides evaluation and benchmarking of machine learning models, guardrails enforcement, and extensibility for fitting into various application architectures. With support for a wide range of evaluation metrics and customizable features, the tool aims to improve model understanding, optimize generative AI outputs, and prevent data-security and compliance risks. Key features include real-time guardrails, model performance monitoring, feature importance visualization, error breakdowns, and support for custom metrics and models integration.

github

: 57

dingo

Dingo is a data quality evaluation tool that automatically detects data quality issues in datasets. It provides built-in rules and model evaluation methods, supports text and multimodal datasets, and offers local CLI and SDK usage. Dingo is designed for easy integration into evaluation platforms like OpenCompass.

github

: 109

LitServe

LitServe is a high-throughput serving engine designed for deploying AI models at scale. It generates an API endpoint for models, handles batching, streaming, and autoscaling across CPU/GPUs. LitServe is built for enterprise scale with a focus on minimal, hackable code-base without bloat. It supports various model types like LLMs, vision, time-series, and works with frameworks like PyTorch, JAX, Tensorflow, and more. The tool allows users to focus on model performance rather than serving boilerplate, providing full control and flexibility.

github

: 3.6k

Lidar_AI_Solution

Lidar AI Solution is a highly optimized repository for self-driving 3D lidar, providing solutions for sparse convolution, BEVFusion, CenterPoint, OSD, and Conversion. It includes CUDA and TensorRT implementations for various tasks such as 3D sparse convolution, BEVFusion, CenterPoint, PointPillars, V2XFusion, cuOSD, cuPCL, and YUV to RGB conversion. The repository offers easy-to-use solutions, high accuracy, low memory usage, and quantization options for different tasks related to self-driving technology.

github

: 1.2k

generative-ai-sagemaker-cdk-demo

This repository showcases how to deploy generative AI models from Amazon SageMaker JumpStart using the AWS CDK. Generative AI is a type of AI that can create new content and ideas, such as conversations, stories, images, videos, and music. The repository provides a detailed guide on deploying image and text generative AI models, utilizing pre-trained models from SageMaker JumpStart. The web application is built on Streamlit and hosted on Amazon ECS with Fargate. It interacts with the SageMaker model endpoints through Lambda functions and Amazon API Gateway. The repository also includes instructions on setting up the AWS CDK application, deploying the stacks, using the models, and viewing the deployed resources on the AWS Management Console.

github

: 65

cake

cake is a pure Rust implementation of the llama3 LLM distributed inference based on Candle. The project aims to enable running large models on consumer hardware clusters of iOS, macOS, Linux, and Windows devices by sharding transformer blocks. It allows running inferences on models that wouldn't fit in a single device's GPU memory by batching contiguous transformer blocks on the same worker to minimize latency. The tool provides a way to optimize memory and disk space by splitting the model into smaller bundles for workers, ensuring they only have the necessary data. cake supports various OS, architectures, and accelerations, with different statuses for each configuration.

github

: 2.4k

Awesome-Robotics-3D

Awesome-Robotics-3D is a curated list of 3D Vision papers related to Robotics domain, focusing on large models like LLMs/VLMs. It includes papers on Policy Learning, Pretraining, VLM and LLM, Representations, and Simulations, Datasets, and Benchmarks. The repository is maintained by Zubair Irshad and welcomes contributions and suggestions for adding papers. It serves as a valuable resource for researchers and practitioners in the field of Robotics and Computer Vision.

github

: 474