
ai-agent-papers
A collection of AI Agents papers (Updated biweekly)
Stars: 565

The AI Agents Papers repository provides a curated collection of papers focusing on AI agents, covering topics such as agent capabilities, applications, architectures, and presentations. It includes a variety of papers on ideation, decision making, long-horizon tasks, learning, memory-based agents, self-evolving agents, and more. The repository serves as a valuable resource for researchers and practitioners interested in AI agent technologies and advancements.
README:
Updated biweekly.
AI agents can think, act, and complete tasks by themselves.
But can they really replace our jobs?
🔥: Recommended papers
📖: Survey papers
⚖️: Benchmark papers
- Agent Capabilities
- GenAI Agents Architecture
- GenAI Agents Applications
- GenAI Agents Presentations
- "The Need for Verification in AI-Driven Scientific Discovery" [paper]
- "What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models" [paper]
- "LLM-empowered Agents Simulation Framework for Scenario Generation in Service Ecosystem Governance Social World Models" [paper]
- "Language Models Do Not Follow Occam's Razor: A Benchmark for Inductive and Abductive Reasoning" [paper]
- "LLM-empowered Agents Simulation Framework for Scenario Generation in Service Ecosystem Governance" [paper]
- "VulAgent: A Hypothesis Validation-Based Multi-Agent System for Software Vulnerability Detection" [paper]
- "Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building" [paper]
- "Agents of Discovery" [paper]
- "ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization" [paper]
- "Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning" [paper]
- "The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs" [paper]
- "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" [paper]
- ⚖️ "LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering" [paper]
- "SWE-QA: Can Language Models Answer Repository-level Code Questions?" [paper]
- "rStar2-Agent: Agentic Reasoning Technical Report" [paper]
- "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey" [paper]
- "Scaling Agents via Continual Pre-training" [paper]
- "ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization" [paper]
- "ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory" [paper]
- 📖 "LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios" [paper]
- 📖 "Reinforcement Learning Foundations for Deep Research Systems: A Survey" [paper]
- "Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance" [paper]
- 📖 "A Comprehensive Survey of Self-Evolving AI Agents" [paper]
- "HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research" [paper]
- "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
- "SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents" [paper]
- "HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents" [paper]
- ⚖️ "Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark" [paper]
- "Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory" [paper]
- "Memp: Exploring Agent Procedural Memory" [paper]
- "Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science" [paper]
- "Coarse-to-Fine Grounded Memory for LLM Agent Planning" [paper]
- "Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework" [paper]
- "Memento: Fine-tuning LLM Agents without Fine-tuning LLMs" [paper]
- "Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning" [paper]
- "K-Dense Analyst: Towards Fully Automated Scientific Analysis" [paper]
- 📖 "From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery" [paper]
- "The AI Data Scientist" [paper]
- "Spacer: Towards Engineered Scientific Inspiration" [paper]
- "BIODISCO: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation" [paper]
- "Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization" [paper]
- "MK2 at PBIG Competition: A Prompt Generation Solution" [paper]
- "LLM Agents Are the Antidote to Walled Gardens", University of Oxford. [paper]
- "Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture", State Key Laboratory. [paper]
- "Aime: Towards Fully-Autonomous Multi-Agent Framework", ByteDance. [paper]
- 📖 "A Survey of Context Engineering for Large Language Models" [paper]
- 📖 "A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents" [paper]
- "From Reasoning to Super-Intelligence: A Search-Theoretic Perspective", AA-I. [paper]
- "Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs", University of Michigan. [paper]
- "Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications", Shandong University. [paper]
- "Emotionally Intelligent Task-oriented Dialogue Systems: Architecture, Representation, and Optimisation", Heinrich Heine University. [paper]
- "Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI", TCS Research. [paper]
- "Agent Exchange: Shaping the Future of AI Agent Economics", Shanghai Jiao Tong University. [paper]
- "Evaluating LLM Agent Collusion in Double Auctions", Relativity, Stanford University, Arb Research. [paper]
- "Enhancing COBOL Code Explanations: A Multi-Agents Approach Using Large Language Models", Queen's University, IBM USA. [paper]
- "CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale", Duke University, Army Research Laboratory. [paper]
- "Deep Researcher with Test-Time Diffusion", Google. [paper]
- "AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents", Accenture. [paper]
- "Agentic Retrieval of Topics and Insights from Earnings Calls", Bloomberg. [paper]
- ⚖️ "Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments", Fudan University. [paper]
- "Routine: A Structural Planning Framework for LLM Agent System in Enterprise", Digital China AI Research. [paper]
- "Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance", ByteDance. [paper]
- "Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments", Meta. [paper]
- "Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems", Tsinghua University. [paper]
- ⚖️ "DABstep: Data Agent Benchmark for Multi-step Reasoning", Adyen, Hugging Face. [paper]
- 📖 "Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence", Zhejiang University. [paper]
- 📖 "The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist", University of North Texas. [paper]
- "AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench", Meta. [paper]
- "Open-ended Scientific Discovery via Bayesian Surprise", Allen Institute for AI. [paper]
- "Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery", Wrocław University. [paper]
- "Too Human to Model: The Uncanny Valley of LLMs in Social Simulation", Atmospheric Environmental Research. [paper]
- "Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust", CAMEL-AI.org. [paper]
- "LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra", Princeton University, Salesforce Research. [paper]
- 📖 "Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle" [paper]
- "Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models", CIFAR AI Chair. [paper]
- "MemOS: A Memory OS for AI System", MemTensor (Shanghai) Technology Co., Ltd. [paper]
- ⚖️ "Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions", UC San Diego. [paper]
- "MIRIX: Multi-Agent Memory System for LLM-Based Agents", MIRIX AI. [paper]
- ⚖️ "DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents" [paper]
- 📖 "From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents" [paper]
- 📖 "Deep Research Agents: A Systematic Examination And Roadmap" [paper]
- 📖 "Towards AI Search Paradigm" [paper]
- "Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge" [paper]
- "MMSearch-R1: Incentivizing LMMs to Search" [paper]
- "Towards Robust Fact-Checking: A Multi-Agent System with Advanced Evidence Retrieval" [paper]
- [Jun 2025] "AUTOMIND: Adaptive Knowledgeable Agent for Automated Data Science" [paper]
- 📖 [Jun 2025] "Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents" [paper]
- [Jun 2025] "SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation" [paper]
- [Jun 2025] "SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications" [paper]
- [Jun 2025] "Towards Community-Driven Agents for Machine Learning Engineering" [paper]
- [Jun 2025] "MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement" [paper]
- "Oversight Structures for Agentic AI in Public-Sector Organizations" [paper]
- ⚖️ "AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance" [paper]
- 📖 "Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives" [paper]
- "Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era" [paper]
- "Improved LLM Agents for Financial Document Question Answering" [paper]
- ⚖️ "ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering" [paper]
- "Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine" [paper]
- "SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models" [paper]
- ⚖️ "SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents" [paper]
- "Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era" [paper]
- "Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents" [paper]
- "AgenticControl: An Automated Control Design Framework Using Large Language Models" [paper]
- 📖 "A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools" [paper]
- 📖 "A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law" [paper]
- 📖 "Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models" [paper]
- "Table-R1: Inference-Time Scaling for Table Reasoning" [paper]
- "Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning" [paper]
- "Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning" [paper]
- "Agent RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving" [paper]
- "Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent" [paper]
- "An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents" [paper]
- "Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning" [paper]
- "MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning" [paper]
- "EvolveSearch: An Iterative Self-Evolving Search Agent" [paper]
- "VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection" [paper]
- "Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning" [paper]
- "RM-R1: Reward Modeling as Reasoning" [paper]
- "Reward Reasoning Model" [paper]
- "R3: Robust Rubric-Agnostic Reward Models" [paper]
- "AutoLibra: Agent Metric Induction from Open-Ended Feedback" [paper]
- "MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models (Short Version)" [paper]
- "MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents" [paper]
- "MARK: Memory Augmented Refinement of Knowledge" [paper]
- 📖 "Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions" [paper]
- "Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs" [paper]
- "Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution" [paper]
- "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution" [paper]
- "Absolute Zero: Reinforced Self-play Reasoning with Zero Data" [paper]
- "Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks" [paper]
- "DEBATE, TRAIN, EVOLVE: Self-Evolution of Language Model Reasoning" [paper]
- "Self Rewarding Self Improving" [paper]
- "EvolveSearch: An Iterative Self-Evolving Search Agent" [paper]
- "AlphaEvolve: A coding agent for scientific and algorithmic discovery" [paper]
- "Meta-Design Matters: A Self-Design Multi-Agent System" [paper]
- "Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents" [paper]
- "SEW: Self-Evolving Agentic Workflows for Automated Code Generation" [paper]
- "Multi-Agent Collaboration via Evolving Orchestration" [paper]
- 📖 "Creativity in LLM-based Multi-Agent Systems: A Survey" [paper]
- ⚖️ "Benchmarking LLMs' Swarm intelligence" [paper]
- "Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems" [paper]
- "Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications" [paper]
- "Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study" [paper]
- "34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery" [paper]
- "PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration" [paper]
- "R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution" [paper]
- 📖 "From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery" [paper]
- "Towards Artificial Intelligence Research Assistant for Expert-Involved Learning" [paper]
- "MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering" [paper]
- "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering" [paper]
- "Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics" [paper]
- "Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories" [paper]
- "JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation" [paper]
- "MLZero: A Multi-Agent System for End-to-end Machine Learning Automation" [paper]
- "Can Agents Fix Agent Issues?" [paper]
- "Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI" [paper]
- "The Real Barrier to LLM Agent Usability is Agentic ROI" [paper]
- 📖 "A Survey on Large Language Model based Human-Agent Systems" [paper]
- 📖 "Vision-Language-Action Models: Concepts, Progress, Applications and Challenges" [paper]
- 📖 "Multi-agent Embodied AI: Advances and Future Directions" [paper]
- "Efficient Agent Training for Computer Use" [paper]
- ⚖️ "AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios" [paper]
- "Inference-Time Scaling for Generalist Reward Modeling" [paper]
- "Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead" [paper]
- "Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection" [paper]
- "Dual Engines of Thoughts: A Depth-Breadth Integration Framework for Open-Ended Analysis" [paper]
- 📖 "A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems" [paper]
- "Welcome to the Era of Experience" [paper]
- "SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills" [paper]
- "Exploring Expert Failures Improves LLM Agent Tuning" [paper]
- "Inducing Programmatic Skills for Agentic Tasks" [paper]
- "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" [paper]
- "Local Prompt Optimization" [paper]
- "Revisiting Prompt Optimization with Large Reasoning Models - A Case Study on Event Extraction" [paper]
- "Iterative Trajectory Exploration for Multimodal Agents" [paper]
- "FlowReasoner: Reinforcing Query-Level Meta-Agents" [paper]
- "A Self-Improving Coding Agent" [paper]
- "Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models" [paper]
- "ToolRL: Reward is All Tool Learning Needs" [paper]
- "OTC: Optimal Tool Calls via Reinforcement Learning" [paper]
- "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" [paper]
- 📖 "Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey" [paper]
- "The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search" [paper]
- "UFO2: The Desktop AgentOS" [paper]
- "AGENTADA: Skill-Adaptive Data Analytics for Tailored Insight Discovery" [paper]
- ⚖️ "BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents" [paper]
- "Toward Super Agent System with Hybrid AI Router" [paper]
- "AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents" [paper]
- [Apr 2025] "UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents" [paper]
- 📖 "Challenges and Paths Towards AI for Software Engineering" [paper]
- 📖 "Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems" [paper]
- 📖 "Adaptive Human-Agent Teaming: A Review of Empirical Studies from the Process Dynamics Perspective" [paper]
- 📖 "A Survey of AI Agent Protocols" [paper]
Alternative AI tools for ai-agent-papers
Similar Open Source Tools

Paper-Reading-ConvAI
Paper-Reading-ConvAI is a repository that contains a list of papers, datasets, and resources related to Conversational AI, mainly encompassing dialogue systems and natural language generation. This repository is constantly updating.

Everything-LLMs-And-Robotics
The Everything-LLMs-And-Robotics repository is the world's largest GitHub repository focusing on the intersection of Large Language Models (LLMs) and Robotics. It provides educational resources, research papers, project demos, and Twitter threads related to LLMs, Robotics, and their combination. The repository covers topics such as reasoning, planning, manipulation, instructions and navigation, simulation frameworks, perception, and more, showcasing the latest advancements in the field.

awesome-and-novel-works-in-slam
This repository contains a curated list of cutting-edge works in Simultaneous Localization and Mapping (SLAM). It includes research papers, projects, and tools related to various aspects of SLAM, such as 3D reconstruction, semantic mapping, novel algorithms, large-scale mapping, and more. The repository aims to showcase the latest advancements in SLAM technology and provide resources for researchers and practitioners in the field.

Awesome-World-Models
This repository is a curated list of papers related to World Models for General Video Generation, Embodied AI, and Autonomous Driving. It includes foundation papers, blog posts, technical reports, surveys, benchmarks, and specific world models for different applications. The repository serves as a valuable resource for researchers and practitioners interested in world models and their applications in robotics and AI.

Awesome-LLM-Robotics
This repository contains a curated list of papers using Large Language/Multi-Modal Models for Robotics/RL. The template is from awesome-Implicit-NeRF-Robotics. Pull requests or emails to add papers are welcome. If you find this repository useful, please consider citing and starring the list, and feel free to share it with others. Sections: Surveys, Reasoning, Planning, Manipulation, Instructions and Navigation, Simulation Frameworks, and Citation.

ABigSurveyOfLLMs
ABigSurveyOfLLMs is a repository that compiles surveys on Large Language Models (LLMs) to provide a comprehensive overview of the field. It includes surveys on various aspects of LLMs such as transformers, alignment, prompt learning, data management, evaluation, societal issues, safety, misinformation, attributes of LLMs, efficient LLMs, learning methods for LLMs, multimodal LLMs, knowledge-based LLMs, extension of LLMs, LLMs applications, and more. The repository aims to help individuals quickly understand the advancements and challenges in the field of LLMs through a collection of recent surveys and research papers.

Awesome-Robotics-3D
Awesome-Robotics-3D is a curated list of 3D Vision papers related to Robotics domain, focusing on large models like LLMs/VLMs. It includes papers on Policy Learning, Pretraining, VLM and LLM, Representations, and Simulations, Datasets, and Benchmarks. The repository is maintained by Zubair Irshad and welcomes contributions and suggestions for adding papers. It serves as a valuable resource for researchers and practitioners in the field of Robotics and Computer Vision.

Awesome_Test_Time_LLMs
This repository focuses on test-time computing, exploring various strategies such as test-time adaptation, modifying the input, editing the representation, calibrating the output, test-time reasoning, and search strategies. It covers topics like self-supervised test-time training, in-context learning, activation steering, nearest neighbor models, reward modeling, and multimodal reasoning. The repository provides resources including papers and code for researchers and practitioners interested in enhancing the reasoning capabilities of large language models.

Awesome_papers_on_LLMs_detection
This repository is a curated list of papers focused on the detection of Large Language Models (LLMs)-generated content. It includes the latest research papers covering detection methods, datasets, attacks, and more. The repository is regularly updated to include the most recent papers in the field.

LLM-Agent-Survey
LLM-Agent-Survey is a comprehensive repository that provides a curated list of papers related to Large Language Model (LLM) agents. The repository categorizes papers based on LLM-Profiled Roles and includes high-quality publications from prestigious conferences and journals. It aims to offer a systematic understanding of LLM-based agents, covering topics such as tool use, planning, and feedback learning. The repository also includes unpublished papers with insightful analysis and novelty, marked for future updates. Users can explore a wide range of surveys, tool use cases, planning workflows, and benchmarks related to LLM agents.

Awesome-LLM-Interpretability
Awesome-LLM-Interpretability is a curated list of materials related to LLM (Large Language Models) interpretability, covering tutorials, code libraries, surveys, videos, papers, and blogs. It includes resources on transformer mechanistic interpretability, visualization, interventions, probing, fine-tuning, feature representation, learning dynamics, knowledge editing, hallucination detection, and redundancy analysis. The repository aims to provide a comprehensive overview of tools, techniques, and methods for understanding and interpreting the inner workings of large language models.

MedLLMsPracticalGuide
This repository serves as a practical guide for Medical Large Language Models (Medical LLMs) and provides resources, surveys, and tools for building, fine-tuning, and utilizing LLMs in the medical domain. It covers a wide range of topics including pre-training, fine-tuning, downstream biomedical tasks, clinical applications, challenges, future directions, and more. The repository aims to provide insights into the opportunities and challenges of LLMs in medicine and serve as a practical resource for constructing effective medical LLMs.

awesome-LLM-game-agent-papers
This repository provides a comprehensive survey of research papers on large language model (LLM)-based game agents. LLMs are powerful AI models that can understand and generate human language, and they have shown great promise for developing intelligent game agents. This survey covers a wide range of topics, including adventure games, crafting and exploration games, simulation games, competition games, cooperation games, communication games, and action games. For each topic, the survey provides an overview of the state-of-the-art research, as well as a discussion of the challenges and opportunities for future work.

OpenRedTeaming
OpenRedTeaming is a repository focused on red teaming for generative models, specifically large language models (LLMs). The repository provides a comprehensive survey on potential attacks on GenAI and robust safeguards. It covers attack strategies, evaluation metrics, benchmarks, and defensive approaches. The repository also implements over 30 auto red teaming methods. It includes surveys, taxonomies, attack strategies, and risks related to LLMs. The goal is to understand vulnerabilities and develop defenses against adversarial attacks on large language models.

Awesome-Embodied-Agent-with-LLMs
This repository, named Awesome-Embodied-Agent-with-LLMs, is a curated list of research related to Embodied AI or agents with Large Language Models. It includes various papers, surveys, and projects focusing on topics such as self-evolving agents, advanced agent applications, LLMs with RL or world models, planning and manipulation, multi-agent learning and coordination, vision and language navigation, detection, 3D grounding, interactive embodied learning, rearrangement, benchmarks, simulators, and more. The repository provides a comprehensive collection of resources for individuals interested in exploring the intersection of embodied agents and large language models.
For similar tasks

llm.hunyuan.T1
Hunyuan-T1 is a cutting-edge large-scale hybrid Mamba reasoning model driven by reinforcement learning. It has been officially released as an upgrade to the Hunyuan Thinker-1-Preview model. The model showcases exceptional performance in deep reasoning tasks, leveraging the TurboS base and Mamba architecture to enhance inference capabilities and align with human preferences. With a focus on reinforcement learning training, the model excels in various reasoning tasks across different domains, showcasing superior abilities in mathematical, logical, scientific, and coding reasoning. Through innovative training strategies and alignment with human preferences, Hunyuan-T1 demonstrates remarkable performance in public benchmarks and internal evaluations, positioning itself as a leading model in the field of reasoning.

generative-models
Generative Models by Stability AI is a repository that provides various generative models for research purposes. It includes models like Stable Video 4D (SV4D) for video synthesis, Stable Video 3D (SV3D) for multi-view synthesis, SDXL-Turbo for text-to-image generation, and more. The repository focuses on modularity and implements a config-driven approach for building and combining submodules. It supports training with PyTorch Lightning and offers inference demos for different models. Users can access pre-trained models like SDXL-base-1.0 and SDXL-refiner-1.0 under a CreativeML Open RAIL++-M license. The codebase also includes tools for invisible watermark detection in generated images.

genai-os
Kuwa GenAI OS is an open, free, secure, and privacy-focused Generative-AI Operating System. It provides a multi-lingual turnkey solution for GenAI development and deployment on Linux and Windows. Users can enjoy features such as concurrent multi-chat, quoting, full prompt-list import/export/share, and flexible orchestration of prompts, RAGs, bots, models, and hardware/GPUs. The system supports various environments from virtual hosts to cloud, and it is open source, allowing developers to contribute and customize according to their needs.

Neurite
Neurite is an innovative project that combines chaos theory and graph theory to create a digital interface that explores hidden patterns and connections for creative thinking. It offers a unique workspace blending fractals with mind mapping techniques, allowing users to navigate the Mandelbrot set in real-time. Nodes in Neurite represent various content types like text, images, videos, code, and AI agents, enabling users to create personalized microcosms of thoughts and inspirations. The tool supports synchronized knowledge management through bi-directional synchronization between mind-mapping and text-based hyperlinking. Neurite also features FractalGPT for modular conversation with AI, local AI capabilities for multi-agent chat networks, and a Neural API for executing code and sequencing animations. The project is actively developed with plans for deeper fractal zoom, advanced control over node placement, and experimental features.

fast-stable-diffusion
Fast-stable-diffusion is a project that offers notebooks for RunPod, Paperspace, and Colab Pro adaptations with AUTOMATIC1111 Webui and Dreambooth. It provides tools for running and implementing Dreambooth, a stable diffusion project. The project includes implementations by XavierXiao and is sponsored by Runpod, Paperspace, and Colab Pro.

big-AGI
big-AGI is an AI suite designed for professionals seeking function, form, simplicity, and speed. It offers best-in-class Chats, Beams, and Calls with AI personas, visualizations, coding, drawing, side-by-side chatting, and more, all wrapped in a polished UX. The tool is powered by the latest models from 12 vendors and open-source servers, providing users with advanced AI capabilities and a seamless user experience. With continuous updates and enhancements, big-AGI aims to stay ahead of the curve in the AI landscape, catering to the needs of both developers and AI enthusiasts.

generative-ai
This repository contains code related to Generative AI, as presented in a YouTube video. It includes notebooks and files organized by day, covering topics such as map reduce, text to SQL, LLM parameters, tagging, and a Kaggle competition. The repository also includes resources such as PDF files and databases for the related Generative AI projects.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. It currently provides native support for OpenAI, Anthropic, Azure OpenAI, and vLLM, and aims to provide enterprise-level infrastructure that can power any LLM production use case. Example use cases: setting LLM usage limits for users on different pricing tiers; tracking LLM usage per user and per organization; blocking or redacting requests containing PII; improving LLM reliability with failovers, retries, and caching; and distributing API keys with rate limits and cost limits for internal development/production use or for students.

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.