NetSecGame

An environment simulation for networks security tasks for development and testing AI based agents. Part of AI Dojo project

Stars: 57

Visit

The NetSecGame (Network Security Game) is a framework for training and evaluation of AI agents in network security tasks. It enables rapid development and testing of AI agents in highly configurable scenarios using the CYST network simulator. The framework includes examples of implemented agents in the submodule NetSecGameAgents.

README:

Network Security Game

The NetSecGame (Network Security Game) is a framework for training and evaluation of AI agents in network security tasks (both offensive and defensive). It is built with CYST network simulator and enables rapid development and testing of AI agents in highly configurable scenarios. Examples of implemented agents can be seen in the submodule NetSecGameAgents.

Installation Guide

It is recommended to run the environment in the Docker container. The up-to-date image can be found in Dockerhub.

docker pull stratosphereips/netsecgame

Building the image locally

Optionally, you can build the image locally with:

docker build -t netsecgame:local .

To build a Whitebox version of the game image locally, you can use the --build-arg flag to override the default module path:

[!WARNING] The Whitebox variant is currently experimental.

docker build --build-arg GAME_MODULE="netsecgame.game.worlds.WhiteBoxNetSecGame" -t netsecgame:local-whitebox .

Installing from source

In case you need to modify the envirment and run directly, we recommed to insall it in a virtual environemnt (Python vevn or Conda):

Python venv

Create new virtual environment

python -m venv <venv-name>

Activate newly created venv

source <venv-name>/bin/activate

Conda

Create new conda environment

conda create --name aidojo python==3.12

Activate newly created conda env

conda activate aidojo

After preparing virutual environment, install using pip:

pip install -e .

Quick Start

A task configuration YAML file is required for starting the NetSecGame environment. For the first step, the example task configuration is recommended:

Example Configuration

# Example of the task configuration for NetSecGame
# The objective of the Attacker in this task is to locate specific data
# and exfiltrate it to a remote C&C server.
# The scenario starts AFTER the initial breach of the local network
# (the attacker controls 1 local device + the remote C&C server).

coordinator: 
  agents: 
    Attacker: # Configuration of 'Attacker' agents
      max_steps: 25 # timout set for the role `Attacker`
      goal: # Definition of the goal state
        description: "Exfiltrate data from Samba server to remote C&C server."
        is_any_part_of_goal_random: True
        known_networks: []
        known_hosts: []
        controlled_hosts: []
        known_services: {}
        known_data: {213.47.23.195: [[User1,DataFromServer1]]} # winning condition
        known_blocks: {}
      start_position: # Definition of the starting state (keywords "random" and "all" can be used)
        known_networks: []
        known_hosts: []
        controlled_hosts: [213.47.23.195, random] # keyword 'random' will be replaced by randomly selected IP during initilization
        known_services: {}
        known_data: {}
        known_blocks: {}

    Defender:
      goal:
        description: "Block all attackers"
        is_any_part_of_goal_random: False
        known_networks: []
        known_hosts: []
        controlled_hosts: []
        known_services: {}
        known_data: {}
        known_blocks: {213.47.23.195: 'all_attackers'}

      start_position:
        known_networks: []
        known_hosts: []
        controlled_hosts: []
        known_services: {}
        known_data: {}
        blocked_ips: {}
        known_blocks: {}

env: # Environment configuraion
  scenario: 'two_networks_tiny' # use the smallest topology for this example
  use_global_defender: False # Do not use global SIEM Defender
  use_dynamic_addresses: False # Do not randomize IP addresses
  use_firewall: True # Use firewall
  save_trajectories: False # Do not store trajectories
  required_players: 1 # Minimal amount of agents requiered to start the game
  rewards: # Configurable reward function
    success: 100
    step: -1
    fail: -10
    false_positive: -5

For detailed configuration instructions, please refer to the Configuration Documentation.

Starting the NetSecGame

With the configuration ready the environment can be started in selected port

In Docker container

docker run -d --rm --name nsg-server\
  -v $(pwd)/examples/example_task_configuration.yaml:/netsecgame/netsecenv_conf.yaml \
  -v $(pwd)/logs:/netsecgame/logs \
  -p 9000:9000 stratosphereips/netsecgame
  --debug_level="INFO"

--name nsg-server: specifies the name of the container

-v <your-configuration-yaml>:/netsecgame/netsecenv_conf.yaml : Mapping of the configuration file

-v $(pwd)/logs:/netsecgame/logs: Mapping of the folder where logs are stored

-p <selected-port>:9000: Mapping of the port in which the server runs

--debug_level is an optional parameter to control the logging level --debug_level=["DEBUG", "INFO", "WARNING", "CRITICAL"] (defaul="INFO"):

Running on Windows (with Docker desktop)

When running on Windows, Docker desktop is required.

docker run -d --rm --name netsecgame-server ^
  -p 9000:9000 ^
  -v "%cd%\examples\example_task_configuration.yaml:/netsecgame/netsecenv_conf.yaml" ^
  -v "%cd%\logs:/netsecgame/logs" ^
  stratosphereips/netsecgame:latest
  --debug_level="INFO"

Locally

The environment can be started locally with from the root folder of the repository with following command:

python3 -m netsecgame.game.worlds.NetSecGame \
  --task_config=./examples/example_task_configuration.yaml \
  --game_port=9000
  --debug_level="INFO"

Upon which the game server is created on localhost:9000 to which the agents can connect to interact in the NetSecGame.

Documentation

You can find user documentation at https://stratosphereips.github.io/NetSecGame/

Components of the NetSecGame Environment

The architecture of the environment can be seen here. The NetSecGame environment has several components in the following files:

├── netsecgame/
|	├── agents/
|		├── base_agent.py # Basic agent class. Defines the API for agent-server communication
|	├── game/
|		├── scenarios/
|		    ├── tiny_scenario_configuration.py
|		    ├── smaller_scenario_configuration.py
|		    ├── scenario_configuration.py
|		    ├── three_net_scenario.py
|		├── worlds/
|   		├── NetSecGame.py # (NSG) basic simulation 
|   		├── RealWorldNetSecGame.py # Extension of `NSG` - runs actions in the *network of the host computer*
|   		├── CYSTCoordinator.py # Extension of `NSG` - runs simulation in CYST engine.
|   		├── WhiteBoxNetSecGame.py # Extension of `NSG` - provides agents with full list of actions upon registration.
|		├── agent_server.py # Agent server implementation
|		├── config_parser.py # NSG task configuration parser
|		├── configuration_manager.py # Helper tool to collect and parse query configuration of the game.
|		├── coordinator.py # Core game server. Not to be run as stand-alone world (see worlds/)
|	    ├── global_defender.py # Stochastic (non-agentic defender)
|	├── game_components.py # contains basic building blocks of the environment
|	├── utils/
|		├── utils.py
|		├── log_parser.py
|		├── gamaplay_graphs.py
|		├── actions_parser.py

Directory Details

coordinator.py: Basic coordinator class. Handles agent communication and coordination. Does not implement dynamics of the world and must be extended (see examples in worlds/).
game_components.py: Implements a library with objects used in the environment. See detailed explanation of the game components.
global_defender.py: Implements a global (omnipresent) defender that can be used to stop agents. Simulation of SIEM.

`worlds/`

Modules for different world configurations:

NetSecGame.py: Coordinator for the Network Security Game.
RealWorldNetSecGame.py: Real-world NSG coordinator (actions are executed in the real network).
CYSTCoordinator.py: Coordinator for CYST-based simulations (requires CYST running).

`scenarios/`

Predefined scenario configurations:

tiny_scenario_configuration.py: A minimal example scenario.
smaller_scenario_configuration.py: A compact scenario configuration used for development and rapid testing.
scenario_configuration.py: The main scenario configuration.
three_net_scenario.py: Configuration for a three-network scenario. Used for the evaluation of the model overfitting.

Implements the network game's configuration of hosts, data, services, and connections. It is taken from CYST.

`utils/`

Helper modules:

utils.py: General-purpose utilities.
log_parser.py: Tools for parsing game logs.
gamaplay_graphs.py: Tools for visualizing gameplay data.
actions_parser.py: Parsing and analyzing game actions.

The scenarios define the topology of a network (number of hosts, connections, networks, services, data, users, firewall rules, etc.) while the task-configuration is to be used for definition of the exact task for the agent in one of the scenarios (with fix topology).

Agents compatible with the NetSecGame are located in a separate repository NetSecGameAgents

Assumptions of the NetSecGame

NetSecGame works with the closed-world assumption. Only the defined entities exist in the simulation.
If the attacker does a successful action in the same step that the defender successfully detects the action, the priority goes to the defender. The reward is a penalty, and the game ends. (From commit d6d4ac9, July 18th, 2024, the new action BlockIP removes controlled hosts from the state of others. So the state can get smaller)

The action FindServices finds the new services in a host. If in a subsequent call to FindServices there are fewer services, they completely replace the list of previous services found. That is, each list of services is the final one, and no memory of previous open services is retained.

Assumptions and Conditions for Actions

When playing the ExploitService action, it is expected that the agent has discovered this service before (by playing FindServices in the target_host before this action)
The Find Data action finds all the available data in the host if successful.
The Find Data action requires ownership of the target host.
Playing ExfiltrateData requires controlling BOTH source and target hosts
Playing Find Services can be used to discover hosts (if those have any active services)
Parameters of ScanNetwork and FindServices can be chosen arbitrarily (they don't have to be listed in known_newtworks/known_hosts)
The BlockIP action needs its three parameters (Source host, Target host, and Blocked host) to be in the controlled list of the Agent.

[!NOTE]
The global defender, available in the previous environment versions, will not be supported in the future. To enable backward compatibility, the global defender functionality can be enabled by adding use_global_defender: True to the configuration YAML file in the env section. This option is disabled by default.

How the global defender detects

The global defender is implemented as a stochastic detection, which means the detection is probabilistic.

First, all the actions are analyzed in time windows (TW) (default is 5 actions). So the last 5 actions are analyzed together. Every new action moves the slides the TW by one position.

Detection Thresholds

Detection Probabilities

Each action type has a predefined probability of being detected. However, detection does not happen for every action; it depends on additional behavioral patterns.

- ScanNetwork: 0.05,
- FindServices: 0.075,
- ExploitService: 0.1,
- FindData: 0.025,
- ExfiltrateData: 0.025,
- BlockIP: 0.01

This means that one action of the type ScanNetwork has a 5% probability of being detected by the defender. This way, we can define some basic probabilities based on how good we want the defender to be.

Type Ratio Threshold

The Type Ratio Threshold defines how frequently an action must appear in the time window to be considered for detection. If the ratio is below this threshold, no detection occurs.

For example, if the last action is ScanNetwork in a TW with [ScanNetwork, FindService, ScanNetwork, FindData, ScanNetwork], the ratio is 3/5 = 0.6

The default values of the thresholds are:

- ScanNetwork: 0.25,
- FindServices: 0.3,
- ExploitService: 0.25,
- FindData: 0.5,
- ExfiltrateData: 0.25,
- BlockIP: 1

Consecutive Actions Threshold

The Consecutive Actions Threshold determines the maximum number of consecutive occurrences of an action type within the time window before detection is triggered.

The default values of the thresholds are:

- ScanNetwork: 2,
- FindServices: 3,
- ExfiltrateData: 2

Repeated Action Threshold

The Repeated Action Threshold applies to certain action types and defines the number of times a specific action must appear in the entire episode before it can be considered for detection.

The default values are:

- ExploitService: 2
- FindData: 2

Decision Logic

The system monitors actions and maintains a history of recent ones within the time window.

If an action's Type Ratio Threshold is met within the time window or it exceeds the Consecutive Actions Threshold, it is evaluated for detection.
If the action type has a Repeated Action Threshold and has not been repeated enough times in the episode, it is ignored.
If an action meets the conditions above, it is subject to detection based on its predefined probability.
Actions that do not meet any threshold conditions are ignored, ensuring that occasional activity does not lead to unnecessary detections.

This approach ensures that only repeated or excessive behavior is flagged, reducing false positives while maintaining a realistic monitoring system.

Interaction with the Environment

When the game server is created, agents connect to it and interact with the environment. In every step of the interaction, agents submits an Action and receive Observation with next_state, reward, is_terminal, end, and info values. Once the terminal state or timeout is reached, no more interaction is possible until the agent asks for a game reset. Each agent should extend the BaseAgent class in agents.

Testing the environment

It is advised that after every change, you test if the env is running correctly by doing

tests/run_all_tests.sh

This will load and run the unit tests in the tests folder. After passing all tests, linting and formatting are checked with ruff.

Code adaptation for new configurations

The code can be adapted to new configurations of games and for new agents. See Agent repository for more details.

About us

This code was developed at the Stratosphere Laboratory at the Czech Technical University in Prague.

For Tasks:

Click tags to check more tools for each tasks

detect threats simulate attacks train ai agents evaluate security measures configure network scenarios

For Jobs:

cybersecurity analyst security engineer ai researcher network administrator penetration tester

Alternative AI tools for NetSecGame

Similar Open Source Tools

NetSecGame

github

: 57

mentals-ai

Mentals AI is a tool designed for creating and operating agents that feature loops, memory, and various tools, all through straightforward markdown syntax. This tool enables you to concentrate solely on the agent’s logic, eliminating the necessity to compose underlying code in Python or any other language. It redefines the foundational frameworks for future AI applications by allowing the creation of agents with recursive decision-making processes, integration of reasoning frameworks, and control flow expressed in natural language. Key concepts include instructions with prompts and references, working memory for context, short-term memory for storing intermediate results, and control flow from strings to algorithms. The tool provides a set of native tools for message output, user input, file handling, Python interpreter, Bash commands, and short-term memory. The roadmap includes features like a web UI, vector database tools, agent's experience, and tools for image generation and browsing. The idea behind Mentals AI originated from studies on psychoanalysis executive functions and aims to integrate 'System 1' (cognitive executor) with 'System 2' (central executive) to create more sophisticated agents.

github

: 376

oasis

OASIS is a scalable, open-source social media simulator that integrates large language models with rule-based agents to realistically mimic the behavior of up to one million users on platforms like Twitter and Reddit. It facilitates the study of complex social phenomena such as information spread, group polarization, and herd behavior, offering a versatile tool for exploring diverse social dynamics and user interactions in digital environments. With features like scalability, dynamic environments, diverse action spaces, and integrated recommendation systems, OASIS provides a comprehensive platform for simulating social media interactions at a large scale.

github

: 1.1k

OnAIR

The On-board Artificial Intelligence Research (OnAIR) Platform is a framework that enables AI algorithms written in Python to interact with NASA's cFS. It is intended to explore research concepts in autonomous operations in a simulated environment. The platform provides tools for generating environments, handling telemetry data through Redis, running unit tests, and contributing to the repository. Users can set up a conda environment, configure telemetry and Redis examples, run simulations, and conduct unit tests to ensure the functionality of their AI algorithms. The platform also includes guidelines for licensing, copyright, and contributions to the repository.

github

: 66

blinkid-ios

BlinkID iOS is a mobile SDK that enables developers to easily integrate ID scanning and data extraction capabilities into their iOS applications. The SDK supports scanning and processing various types of identity documents, such as passports, driver's licenses, and ID cards. It provides accurate and fast data extraction, including personal information and document details. With BlinkID iOS, developers can enhance their apps with secure and reliable ID verification functionality, improving user experience and streamlining identity verification processes.

github

: 392

probsem

ProbSem is a repository that provides a framework to leverage large language models (LLMs) for assigning context-conditional probability distributions over queried strings. It supports OpenAI engines and HuggingFace CausalLM models, and is flexible for research applications in linguistics, cognitive science, program synthesis, and NLP. Users can define prompts, contexts, and queries to derive probability distributions over possible completions, enabling tasks like cloze completion, multiple-choice QA, semantic parsing, and code completion. The repository offers CLI and API interfaces for evaluation, with options to customize models, normalize scores, and adjust temperature for probability distributions.

github

: 72

metta

Metta AI is an open-source research project focusing on the emergence of cooperation and alignment in multi-agent AI systems. It explores the impact of social dynamics like kinship and mate selection on learning and cooperative behaviors of AI agents. The project introduces a reward-sharing mechanism mimicking familial bonds and mate selection to observe the evolution of complex social behaviors among AI agents. Metta aims to contribute to the discussion on safe and beneficial AGI by creating an environment where AI agents can develop general intelligence through continuous learning and adaptation.

github

: 102

mountain-goap

Mountain GOAP is a generic C# GOAP (Goal Oriented Action Planning) library for creating AI agents in games. It favors composition over inheritance, supports multiple weighted goals, and uses A* pathfinding to plan paths through sequential actions. The library includes concepts like agents, goals, actions, sensors, permutation selectors, cost callbacks, state mutators, state checkers, and a logger. It also features event handling for agent planning and execution. The project structure includes examples, API documentation, and internal classes for planning and execution.

github

: 77

invariant

Invariant Analyzer is an open-source scanner designed for LLM-based AI agents to find bugs, vulnerabilities, and security threats. It scans agent execution traces to identify issues like looping behavior, data leaks, prompt injections, and unsafe code execution. The tool offers a library of built-in checkers, an expressive policy language, data flow analysis, real-time monitoring, and extensible architecture for custom checkers. It helps developers debug AI agents, scan for security violations, and prevent security issues and data breaches during runtime. The analyzer leverages deep contextual understanding and a purpose-built rule matching engine for security policy enforcement.

github

: 143

CogAgent

CogAgent is an advanced intelligent agent model designed for automating operations on graphical interfaces across various computing devices. It supports platforms like Windows, macOS, and Android, enabling users to issue commands, capture device screenshots, and perform automated operations. The model requires a minimum of 29GB of GPU memory for inference at BF16 precision and offers capabilities for executing tasks like sending Christmas greetings and sending emails. Users can interact with the model by providing task descriptions, platform specifications, and desired output formats.

github

: 113

garak

Garak is a free tool that checks if a Large Language Model (LLM) can be made to fail in a way that is undesirable. It probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. Garak's a free tool. We love developing it and are always interested in adding functionality to support applications.

github

: 1.3k

fabrice-ai

A lightweight, functional, and composable framework for building AI agents that work together to solve complex tasks. Built with TypeScript and designed to be serverless-ready. Fabrice embraces functional programming principles, remains stateless, and stays focused on composability. It provides core concepts like easy teamwork creation, infrastructure-agnosticism, statelessness, and includes all tools and features needed to build AI teams. Agents are specialized workers with specific roles and capabilities, able to call tools and complete tasks. Workflows define how agents collaborate to achieve a goal, with workflow states representing the current state of the workflow. Providers handle requests to the LLM and responses. Tools extend agent capabilities by providing concrete actions they can perform. Execution involves running the workflow to completion, with options for custom execution and BDD testing.

github

: 223

OneKE

OneKE is a flexible dockerized system for schema-guided knowledge extraction, capable of extracting information from the web and raw PDF books across multiple domains like science and news. It employs a collaborative multi-agent approach and includes a user-customizable knowledge base to enable tailored extraction. OneKE offers various IE tasks support, data sources support, LLMs support, extraction method support, and knowledge base configuration. Users can start with examples using YAML, Python, or Web UI, and perform tasks like Named Entity Recognition, Relation Extraction, Event Extraction, Triple Extraction, and Open Domain IE. The tool supports different source formats like Plain Text, HTML, PDF, Word, TXT, and JSON files. Users can choose from various extraction models like OpenAI, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, and OneKE for information extraction tasks. Extraction methods include Schema Agent, Extraction Agent, and Reflection Agent. The tool also provides support for schema repository and case repository management, along with solutions for network issues. Contributors to the project include Ningyu Zhang, Haofen Wang, Yujie Luo, Xiangyuan Ru, Kangwei Liu, Lin Yuan, Mengshu Sun, Lei Liang, Zhiqiang Zhang, Jun Zhou, Lanning Wei, Da Zheng, and Huajun Chen.

github

: 57

kafka-ml

Kafka-ML is a framework designed to manage the pipeline of Tensorflow/Keras and PyTorch machine learning models on Kubernetes. It enables the design, training, and inference of ML models with datasets fed through Apache Kafka, connecting them directly to data streams like those from IoT devices. The Web UI allows easy definition of ML models without external libraries, catering to both experts and non-experts in ML/AI.

github

: 163

allms

allms is a versatile and powerful library designed to streamline the process of querying Large Language Models (LLMs). Developed by Allegro engineers, it simplifies working with LLM applications by providing a user-friendly interface, asynchronous querying, automatic retrying mechanism, error handling, and output parsing. It supports various LLM families hosted on different platforms like OpenAI, Google, Azure, and GCP. The library offers features for configuring endpoint credentials, batch querying with symbolic variables, and forcing structured output format. It also provides documentation, quickstart guides, and instructions for local development, testing, updating documentation, and making new releases.

github

: 82

ai8x-training

github

: 86

For similar tasks

NetSecGame

github

: 57

last_layer

last_layer is a security library designed to protect LLM applications from prompt injection attacks, jailbreaks, and exploits. It acts as a robust filtering layer to scrutinize prompts before they are processed by LLMs, ensuring that only safe and appropriate content is allowed through. The tool offers ultra-fast scanning with low latency, privacy-focused operation without tracking or network calls, compatibility with serverless platforms, advanced threat detection mechanisms, and regular updates to adapt to evolving security challenges. It significantly reduces the risk of prompt-based attacks and exploits but cannot guarantee complete protection against all possible threats.

github

: 79

Cyberion-Spark-X

Cyberion-Spark-X is a powerful open-source tool designed for cybersecurity professionals and data analysts. It provides advanced capabilities for analyzing and visualizing large datasets to detect security threats and anomalies. The tool integrates with popular data sources and supports various machine learning algorithms for predictive analytics and anomaly detection. Cyberion-Spark-X is user-friendly and highly customizable, making it suitable for both beginners and experienced professionals in the field of cybersecurity and data analysis.

github

: 265

Here-Comes-the-AI-Worm

Large Language Models (LLMs) are now embedded in everyday tools like email assistants, chat apps, and productivity software. This project introduces DonkeyRail, a lightweight guardrail that detects and blocks malicious self-replicating prompts known as RAGworm within GenAI-powered applications. The guardrail is fast, accurate, and practical for real-world GenAI systems, preventing activities like spam, phishing campaigns, and data leaks.

github

: 205

sec-gemini

Sec-Gemini is an experimental cybersecurity-focused AI tool developed by Google. This repository contains SDKs and a CLI for Sec-Gemini, with SDKs available for Python and TypeScript. Additionally, there is a web component provided to facilitate integration on websites.

github

: 74

HydraDragonPlatform

Hydra Dragon Automatic Malware/Executable Analysis Platform offers dynamic and static analysis for Windows, including open-source XDR projects, ClamAV, YARA-X, machine learning AI, behavioral analysis, Unpacker, Deobfuscator, Decompiler, website signatures, Ghidra, Suricata, Sigma, Kernel based protection, and more. It is a Unified Executable Analysis & Detection Framework.

github

: 146

openguardrails

OpenGuardrails provides runtime security for AI agents by detecting prompt injection, credential leakage, data exfiltration, and behavioral threats in real time. It wraps AI agents with a security layer, intercepts tool calls and messages, scans them against threat detectors and a behavioral rule engine, and blocks or alerts before damage is done. The tool includes a management dashboard for full visibility and an optional local gateway to sanitize sensitive data before leaving the machine. It is open source under the Apache 2.0 license.

github

: 237

Arcade-Learning-Environment

The Arcade Learning Environment (ALE) is a simple framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design. The ALE currently supports three different interfaces: C++, Python, and OpenAI Gym.

github

: 2.2k

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 1.1k

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github

: 2.9k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

github

: 32.9k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github

: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675