awesome-LLM-game-agent-papers
A Survey on Large Language Model-Based Game Agents
Stars: 229
This repository provides a comprehensive survey of research papers on large language model (LLM)-based game agents. LLMs are powerful AI models that can understand and generate human language, and they have shown great promise for developing intelligent game agents. This survey covers a wide range of topics, including adventure games, crafting and exploration games, simulation games, competition games, cooperation games, communication games, and action games. For each topic, the survey provides an overview of the state-of-the-art research, as well as a discussion of the challenges and opportunities for future work.
README:
🔥 Must-read papers for LLM-based game agents.
💫 Continuously updated on a weekly basis. (last update: 2024/09/22)
- [2019/09] Interactive Fiction Games: A Colossal Adventure AAAI 2020 [paper] [code]
- [2020/10] ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ICLR 2021 [paper][code]
- [2022/03] ScienceWorld: Is your Agent Smarter than a 5th Grader? EMNLP 2022 [paper] [code]
- [2022/10] ReAct: Synergizing Reasoning and Acting in Language Models ICLR 2023 [paper] [code]
- [2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning NeurIPS 2023 [paper] [code]
- [2023/04] Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions arXiv [paper]
- [2023/05] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks NeurIPS 2023 [paper] [code]
- [2023/10] FireAct: Toward Language Agent Fine-tuning arXiv [paper][code]
- [2023/11] ADaPT: As-Needed Decomposition and Planning with Language Models arXiv [paper][code]
- [2024/02] Soft Self-Consistency Improves Language Model Agents arXiv [paper][code]
- [2024/02] Empowering Large Language Model Agents through Action Learning arXiv [paper][code]
- [2024/03] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents arXiv [paper][code]
- [2024/03] Language Guided Exploration for RL Agents in Text Environments arXiv [paper][code]
- [2024/03] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents ACL 2024 [paper][code]
- [2024/04] Learning From Failure: Integrating Negative Examples When Fine-tuning Large Language Models as Agent arXiv [paper] [code]
- [2024/04] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy [paper]
- [2024/05] Agent Planning with World Knowledge Model arXiv [paper][code]
- [2024/05] THREAD: Thinking Deeper with Recursive Spawning arXiv [paper]
- [2024/06] Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement arXiv [paper][code]
- [2023/09] Motif: Intrinsic Motivation from Artificial Intelligence Feedback ICLR 2024 [paper] [code]
- [2024/03] Cradle: Empowering Foundation Agents Towards General Computer Control arXiv [paper][code]
- [2024/03] Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents arXiv [paper] [code]
- [2023/02] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents NeurIPS 2023 [paper][code]
- [2023/03] Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks FMDM@NeurIPS2023 [paper][code]
- [2023/05] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory arXiv [paper]
- [2023/05] VOYAGER: An Open-Ended Embodied Agent with Large Language Models FMDM@NeurIPS2023 [paper][code]
- [2023/10] LLaMA Rider: Spurring Large Language Models to Explore the Open World arXiv [paper][code]
- [2023/10] Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds ICLR 2024 [paper]
- [2023/11] JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models arXiv [paper][code]
- [2023/11] See and Think: Embodied Agent in Virtual Environment arXiv [paper][code]
- [2023/12] MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception CVPR 2024 [paper][code]
- [2023/12] Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft arXiv [paper]
- [2023/12] Creative Agents: Empowering Agents with Imagination for Creative Tasks arXiv [paper][code]
- [2024/02] RL-GPT: Integrating Reinforcement Learning and Code-as-policy arXiv [paper]
- [2024/03] MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control arXiv [paper][code]
- [2024/07] Odyssey: Empowering Agents with Open-World Skills arXiv [paper] [code]
- [2023/02] Guiding Pretraining in Reinforcement Learning with Large Language Models ICML 2023 [paper]
- [2023/05] SPRING: Studying Papers and Reasoning to play Games NeurIPS 2023 [paper]
- [2023/06] OMNI: Open-endedness via Models of human Notions of Interestingness arXiv [paper][code]
- [2023/09] AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback arXiv [paper]
- [2024/03] EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents arXiv [paper]
- [2024/04] AgentKit: Flow Engineering with Graphs, not Coding arXiv [paper][code]
- [2024/04] World Models with Hints of Large Language Models for Goal Achieving arXiv [paper]
- [2024/07] Enhancing Agent Learning through World Dynamics Modeling arXiv [paper]
- [2023/04] Generative Agents: Interactive Simulacra of Human Behavior UIST 2023 [paper][code]
- [2023/08] AgentSims: An Open-Source Sandbox for Large Language Model Evaluation arXiv [paper]
- [2023/10] Humanoid Agents: Platform for Simulating Human-like Generative Agents arXiv [paper]
- [2023/10] Lyfe Agents: Generative agents for low-cost real-time social interactions arXiv [paper]
- [2023/10] SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents arXiv [paper][code]
- [2024/03] SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents arXiv [paper] [code]
- [2024/09] Altera: Building Digital Humans [website]
- [2022/01] Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents ICML 2022 [paper][code]
- [2022/12] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models ICCV 2023 [paper]
- [2023/05] Language Models Meet World Models: Embodied Experiences Enhance Language Models NeurIPS 2023 [paper][code]
- [2023/10] Octopus: Embodied Vision-Language Programmer from Environmental Feedback arXiv [paper] [code]
- [2024/01] True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning arXiv [paper] [code]
- [2024/01] CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents ICLR 2024 [paper][code]
- [2022/10] Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task ICLR 2023 [paper]
- [2023/06] ChessGPT: Bridging Policy Learning and Language Modeling NeurIPS 2023 [paper][code]
- [2023/08] Are ChatGPT and GPT-4 Good Poker Players?--A Pre-Flop Analysis arXiv [paper]
- [2023/09] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 arXiv [paper]
- [2023/12] Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach arXiv [paper][code]
- [2024/01] PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model arXiv [paper]
- [2024/01] SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models arXiv [paper]
- [2024/02] PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models arXiv [paper][code]
- [2024/02] Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization arXiv [paper][code]
- [2024/03] Embodied LLM Agents Learn to Cooperate in Organized Teams arXiv [paper]
- [2023/07] Building Cooperative Embodied Agents Modularly with Large Language Models ICLR 2024 [paper][code]
- [2023/09] MindAgent: Emergent Gaming Interaction arXiv [paper]
- [2023/10] Evaluating Multi-agent Coordination Abilities in Large Language Models arXiv [paper]
- [2023/12] LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination arXiv [paper]
- [2024/02] S-Agents: Self-organizing Agents in Open-ended Environments arXiv [paper]
- [2024/03] ProAgent: Building Proactive Cooperative Agents with Large Language Models AAAI 2024 [paper]
- [2024/03] Can LLM-Augmented Autonomous Agents Cooperate?, An Evaluation of Their Cooperative Capabilities through Melting Pot arXiv [paper]
- [2024/03] Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation arXiv [paper]
- [2024/05] Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration arXiv [paper] [code]
- [2022/12] Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning Science [paper]
- [2023/08] GameEval: Evaluating LLMs on Conversational Games arXiv [paper][code]
- [2023/09] Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf arXiv [paper]
- [2023/10] Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game arXiv [paper]
- [2023/10] Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation arXiv [paper]
- [2023/10] AvalonBench: Evaluating LLMs Playing the Game of Avalon FMDM@NeurIPS2023 [paper][code]
- [2023/10] LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay arXiv [paper]
- [2023/10] Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models arXiv [paper][code]
- [2023/11] War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars arXiv [paper][code]
- [2023/11] clembench: Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents EMNLP 2023 [paper]
- [2023/12] Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game arXiv [paper]
- [2023/12] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games [paper]
- [2024/02] Enhance Reasoning for Large Language Models in the Game Werewolf arXiv [paper]
- [2024/02] What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents arXiv [paper]
- [2024/04] Self-playing Adversarial Language Game Enhances LLM Reasoning [paper][code]
- [2024/06] PLAYER: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games arXiv [paper]
- [2023/02] Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning ICML 2023 [paper][code]
- [2024/03] Cradle: Empowering Foundation Agents Towards General Computer Control arXiv [paper][code]
- [2024/03] Will GPT-4 Run DOOM? arXiv [paper][code]
- [2024/03] Evaluate LLMs in Real Time with Street Fighter III GitHub [code]
- [2024/07] Baba Is AI: Break the Rules to Beat the Benchmark ICML 2024 [paper]
- [2024/08] Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games arXiv [paper]
- [2024/09] Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case arXiv [paper] [code]
If you find this repository useful, please cite our paper:
@misc{hu2024survey,
    title={A Survey on Large Language Model-Based Game Agents},
    author={Sihao Hu and Tiansheng Huang and Fatih Ilhan and Selim Tekin and Gaowen Liu and Ramana Kompella and Ling Liu},
    year={2024},
    eprint={2404.02039},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}
If you discover any papers that are suitable but not included, please contact Sihao Hu ([email protected]).
Similar Open Source Tools
Awesome-Robotics-3D
Awesome-Robotics-3D is a curated list of 3D Vision papers related to Robotics domain, focusing on large models like LLMs/VLMs. It includes papers on Policy Learning, Pretraining, VLM and LLM, Representations, and Simulations, Datasets, and Benchmarks. The repository is maintained by Zubair Irshad and welcomes contributions and suggestions for adding papers. It serves as a valuable resource for researchers and practitioners in the field of Robotics and Computer Vision.
Awesome-LLM-Interpretability
Awesome-LLM-Interpretability is a curated list of materials related to LLM (Large Language Models) interpretability, covering tutorials, code libraries, surveys, videos, papers, and blogs. It includes resources on transformer mechanistic interpretability, visualization, interventions, probing, fine-tuning, feature representation, learning dynamics, knowledge editing, hallucination detection, and redundancy analysis. The repository aims to provide a comprehensive overview of tools, techniques, and methods for understanding and interpreting the inner workings of large language models.
Awesome_papers_on_LLMs_detection
This repository is a curated list of papers focused on the detection of content generated by Large Language Models (LLMs). It includes the latest research papers covering detection methods, datasets, attacks, and more. The repository is regularly updated to include the most recent papers in the field.
LLM-Agents-Papers
A repository that lists papers related to Large Language Model (LLM) based agents. The repository covers various topics including survey, planning, feedback & reflection, memory mechanism, role playing, game playing, tool usage & human-agent interaction, benchmark & evaluation, environment & platform, agent framework, multi-agent system, and agent fine-tuning. It provides a comprehensive collection of research papers on LLM-based agents, exploring different aspects of AI agent architectures and applications.
ABigSurveyOfLLMs
ABigSurveyOfLLMs is a repository that compiles surveys on Large Language Models (LLMs) to provide a comprehensive overview of the field. It includes surveys on various aspects of LLMs such as transformers, alignment, prompt learning, data management, evaluation, societal issues, safety, misinformation, attributes of LLMs, efficient LLMs, learning methods for LLMs, multimodal LLMs, knowledge-based LLMs, extension of LLMs, LLMs applications, and more. The repository aims to help individuals quickly understand the advancements and challenges in the field of LLMs through a collection of recent surveys and research papers.
Awesome-LLM-Robotics
This repository contains a curated list of papers using large language and multi-modal models for robotics and RL, built from the awesome-Implicit-NeRF-Robotics template. Pull requests and emails adding new papers are welcome, and starring, citing, and sharing the list is encouraged. The papers are organized into sections covering surveys, reasoning, planning, manipulation, instructions and navigation, simulation frameworks, and citation information.
Paper-Reading-ConvAI
Paper-Reading-ConvAI is a repository that contains a list of papers, datasets, and resources related to Conversational AI, mainly encompassing dialogue systems and natural language generation. The repository is updated continually.
llm-misinformation-survey
The 'llm-misinformation-survey' repository is dedicated to the survey on combating misinformation in the age of Large Language Models (LLMs). It explores the opportunities and challenges of utilizing LLMs to combat misinformation, providing insights into the history of combating misinformation, current efforts, and future outlook. The repository serves as a resource hub for the initiative 'LLMs Meet Misinformation' and welcomes contributions of relevant research papers and resources. The goal is to facilitate interdisciplinary efforts in combating LLM-generated misinformation and promoting the responsible use of LLMs in fighting misinformation.
do-research-in-AI
This repository is a collection of research lectures and experience sharing posts from frontline researchers in the field of AI. It aims to help individuals upgrade their research skills and knowledge through insightful talks and experiences shared by experts. The content covers various topics such as evaluating research papers, choosing research directions, research methodologies, and tips for writing high-quality scientific papers. The repository also includes discussions on academic career paths, research ethics, and the emotional aspects of research work. Overall, it serves as a valuable resource for individuals interested in advancing their research capabilities in the field of AI.
Call-for-Reviewers
The `Call-for-Reviewers` repository collects the latest 'call for reviewers' links from top CS/ML/AI conferences and journals. It provides an opportunity for individuals in the computer science, machine learning, and artificial intelligence fields to gain review experience for applying for NIW/H1B/EB1 or enhancing their CV. The repository helps users stay updated with the latest research trends and engage with the academic community.
Everything-LLMs-And-Robotics
The Everything-LLMs-And-Robotics repository is the world's largest GitHub repository focusing on the intersection of Large Language Models (LLMs) and Robotics. It provides educational resources, research papers, project demos, and Twitter threads related to LLMs, Robotics, and their combination. The repository covers topics such as reasoning, planning, manipulation, instructions and navigation, simulation frameworks, perception, and more, showcasing the latest advancements in the field.
prompt-in-context-learning
An open-source engineering guide for prompt and in-context learning from EgoAlpha Lab. It is a frequently updated collection of resources covering in-context learning and prompt engineering, including: the latest papers on in-context learning, prompt engineering, agents, and foundation models; a playground of large language models for prompt experimentation; prompt-engineering techniques for leveraging LLMs; ChatGPT prompt examples that can be applied in work and daily life; and an LLMs usage guide for getting started quickly with LangChain.
awesome-llm-attributions
This repository focuses on unraveling the sources that large language models tap into for attribution or citation. It delves into the origins of facts, their utilization by the models, the efficacy of attribution methodologies, and challenges tied to ambiguous knowledge reservoirs, biases, and pitfalls of excessive attribution.
For similar tasks
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high-level `generate()` method or run each iteration of the model in a loop. It supports greedy/beam search and top-p/top-k sampling to generate token sequences, has built-in logits processing such as repetition penalties, and allows for easy custom scoring.
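A minimal sketch of the high-level `generate()` flow described above, modeled on the onnxruntime-genai Python examples; the exact class names and search options vary across library versions, so treat this as illustrative rather than authoritative:

```python
# Illustrative only: the API surface below follows the onnxruntime-genai Python
# examples (circa 2024); check the docs of your installed version for changes.
import onnxruntime_genai as og

model = og.Model("path/to/onnx/model/folder")   # folder produced by the model builder
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256, top_k=50, top_p=0.9, temperature=0.7)
params.input_ids = tokenizer.encode("Why do LLM game agents need long-term memory?")

output_tokens = model.generate(params)          # search/sampling and KV cache handled internally
print(tokenizer.decode(output_tokens[0]))
```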
jupyter-ai
Jupyter AI connects generative AI with Jupyter notebooks. It provides a user-friendly and powerful way to explore generative AI models in notebooks and improve your productivity in JupyterLab and the Jupyter Notebook. Specifically, Jupyter AI offers:
- An `%%ai` magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, Kaggle, VSCode, etc.).
- A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant.
- Support for a wide range of generative model providers, including AI21, Anthropic, AWS, Cohere, Gemini, Hugging Face, NVIDIA, and OpenAI.
- Local model support through GPT4All, enabling use of generative AI models on consumer-grade machines with ease and privacy.
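As a rough illustration of the `%%ai` magic, the two notebook cells below follow the documented pattern; the model id `openai-chat:gpt-4o` is an assumption, so substitute whichever provider and model you have configured:

```python
# Cell 1: load the magics that ship with Jupyter AI. The model id used in the
# next cell ("openai-chat:gpt-4o") is an assumption; use any configured provider.
%load_ext jupyter_ai_magics
```

```python
%%ai openai-chat:gpt-4o
Summarize in two sentences what a ReAct-style game agent does.
```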
khoj
Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.
langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules:
- **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers).
- **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG).
- **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task.
The different components can be composed together using the LangChain Expression Language (LCEL).
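The chaining idea mirrors LCEL in the original Python LangChain; below is a minimal Python sketch of composing a prompt, model, and output parser (it assumes the `langchain-openai` package and an `OPENAI_API_KEY`, and shows the Python API, not LangChain.dart's):

```python
# LCEL-style composition in Python LangChain, shown only to illustrate the
# chaining pattern the Dart port mirrors; requires OPENAI_API_KEY to be set.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Suggest one opening move in {game} and explain why.")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()  # the pipe is LCEL composition

print(chain.invoke({"game": "chess"}))
```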
danswer
Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"
infinity
Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
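A small hedged sketch of the logging/tracing pattern described above, following the publicly documented `weave.init` / `@weave.op` usage; the project name and the stub function are placeholders:

```python
# Hedged sketch: traces calls to a decorated function into a W&B Weave project.
# Requires the weave package and a logged-in Weights & Biases account.
import weave

weave.init("llm-game-agent-demo")   # placeholder project name

@weave.op()
def choose_action(observation: str) -> str:
    # A real agent would call an LLM here; a stub keeps the example runnable.
    return "go north"

choose_action("You are in a dark room with a door to the north.")
```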
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE).
- Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
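Since SPEAR exposes an OpenAI Gym-style interface, interaction follows the familiar reset/step loop; the sketch below is a generic illustration of that loop, with the environment object and policy passed in as assumptions rather than SPEAR-specific construction calls:

```python
# Generic Gym-style rollout loop (classic 4-tuple step API). How the SPEAR
# environment is constructed is out of scope here, so `env` and `policy` are
# assumed inputs rather than SPEAR-specific calls.
def run_episode(env, policy, max_steps=500):
    """Roll out one episode and return the accumulated reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)                        # e.g. a random or learned policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```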
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.