Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, accelerating Large Language Models (LLMs) has become increasingly important. This is a list of papers on accelerating LLMs. It currently focuses mainly on inference acceleration, and related work will be added gradually. Contributions are welcome!

Stars: 184


README:


Awesome_LLM_System-PaperList

Survey

Paper	Keywords	Institute (first)	Publication	Others
Full Stack Optimization for Transformer Inference: a Survey	Hardware and software co-design	UCB	Arxiv
A survey of techniques for optimizing transformer inference	Transformer optimization	Iowa State University	Journal of Systems Architecture
A Survey on Model Compression for Large Language Models	Model Compression	UCSD	Arxiv
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems	Optimization technique: quant, pruning, continuous batching, virtual memory	CMU	Arxiv
LLM Inference Unveiled: Survey and Roofline Model Insights	Performance analysis	Infinigence-AI	Arxiv	LLMViewer
LLM Inference Serving: Survey of Recent Advances and Opportunities		Northeastern University	Arxiv
Efficient Large Language Models: A Survey		The Ohio State University	Transactions on Machine Learning Research

Framework

Paper/OpenSource Project	Keywords	Institute (first)	Publication	Others
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale	DeepSpeed; Kernel Fusion	Microsoft	SC 2022	Github repo
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference	DeepSpeed; Split fuse	Microsoft	Arxiv	Github repo
Efficient Memory Management for Large Language Model Serving with PagedAttention	vLLM; PagedAttention	UCB	SOSP 2023	Github repo
TensorRT-LLM/FasterTransformer		NVIDIA
lightLLM		Shanghai Artificial Intelligence Laboratory
MLC LLM	TVM; Multi-platforms	MLC-Team
Text-Generation-Inference (TGI)		Hugging Face
NanoFlow: Towards Optimal Large Language Model Serving Throughput	Distributed, Parallel, and Cluster Computing	University of Washington	Arxiv	Github repo
rtp-llm		Alibaba		Github repo
Efficiently Programming Large Language Models using SGLang	Agent Language	UCB	Arxiv	Github repo
HybridFlow: A Flexible and Efficient RLHF Framework	RLHF Training	ByteDance	EuroSys 2025	Github Repo
ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation	RLHF Training	THU	Arxiv	Github Repo
Enabling Parallelism Hot Switching for Efficient Training of Large Language Models		Peking University	SOSP 2024

Serving

Paper	Keywords	Institute (first)	Publication	Others
Fast Distributed Inference Serving for Large Language Models	Distributed inference serving	PKU	Arxiv
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving	Pipeline Parallel; Auto parallel	UCB	OSDI 2023	Github repo
Orca: A Distributed Serving System for Transformer-Based Generative Models	Continuous batching	Seoul National University	OSDI 2022
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads	Multiple Decoding Heads	Princeton University	Arxiv	Github repo
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU	Consumer-grade GPU	SJTU	Arxiv	Github repo
LLM in a flash: Efficient Large Language Model Inference with Limited Memory	flash; Pruning	Apple	Arxiv
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline	Length Perception	NUS	NeurIPS 2023	Github repo
S3: Increasing GPU Utilization during Generative Inference for Higher Throughput		Harvard University	Arxiv
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving	Decouple	PKU	OSDI 2024
Splitwise: Efficient generative LLM inference using phase splitting	Decouple	UW	ISCA 2024	Track issue
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU	Single GPU	Stanford University	Arxiv	Github repo
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve	Decouple	GaTech	OSDI 2024
SpotServe: Serving Generative Large Language Models on Preemptible Instances	Preemptible GPU	CMU	ASPLOS 2024	Empty Github repo
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification	Tree-based Speculative	CMU	ASPLOS 2024
AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving	Cache the multi-turn prefill KV-cache in host-DRAM and SSD	NUS	ATC 2024
MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving	Use spatial-temporal multiplexing method to serve multi-LLMs	MMLab	Arxiv
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference	KV Cache Compression	Shanghai Jiao Tong University	Arxiv
You Only Cache Once: Decoder-Decoder Architectures for Language Models	KV Cache	Microsoft Research	Arxiv
Better & Faster Large Language Models via Multi-token Prediction	Multi-token Prediction	Meta	Arxiv
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference	Decouple	Hanyang University	ASPLOS 2024
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable	LLM Applications	SJTU	OSDI 2024
Fairness in Serving Large Language Models	Fairness; LLM Serving	UC Berkeley, Stanford University	OSDI 2024
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving	KV Cache	Moonshot AI	Github
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Pre-filling for Long-Context; Dynamic Sparse Attention	Microsoft	Arxiv	Github repo
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool	Memory Pool	Huawei	Arxiv
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management	Sparsity	Seoul National University	OSDI 2024
Llumnix: Dynamic Scheduling for Large Language Model Serving	Preemptible GPU	Alibaba Group	OSDI 2024
PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch	Multi-Agent	Tsinghua University	ATC 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention	Sparsity; Long context	PKU	Arxiv
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference	Sparsity; Related token	MIT	ICML 2024
Accelerating Production LLMs with Combined Token/Embedding Speculators	Speculative decoding	IBM Research	Arxiv	Github repo
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference	KV Cache	Apple	Arxiv
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU	Attention Saddles; KV cache	Shanghai Jiao Tong University	Arxiv
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text	KV Cache for RAG	Moore Threads AI	Arxiv	Github repo
Efficient Streaming Language Models with Attention Sinks	StreamingLLM, Static sparsity	MIT	ICLR 2024	Github repo
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models	Sparse attention	UT Austin	NeurIPS 2023
SparQ Attention: Bandwidth-Efficient LLM Inference	Sparse attention	GraphCore	ICML 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Sparse attention	MSRA	Arxiv
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	Vector Retrieval	MSRA	Arxiv
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion	KV Cache cross	University of Chicago	EuroSys
Epic: Efficient Position-Independent Context Caching for Serving Large Language Models	Position independent	PKU	Arxiv
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving	KV Cache compression	University of Chicago	SIGCOMM
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation	Separate handling of prefill and decoding KV Cache	SEU	Arxiv 2024
FASTDECODE: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines	Heterogeneous pipelines	THU	Arxiv 2024

Operating System

Paper	Keywords	Institute (first)	Publication	Others
AIOS: LLM Agent Operating System	OS; LLM Agent	Rutgers University	Arxiv

Transformer Acceleration

Paper	Keywords	Institute (first)	Publication	Others
TurboTransformers: An Efficient GPU Serving System for Transformer Models		Tencent	PPoPP 2021	Github repo
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness	FlashAttention; Online Softmax	Stanford University	NeurIPS 2022	Github repo
FlashAttention2: Faster Attention with Better Parallelism and Work Partitioning		Stanford University	Arxiv	Github repo
FlashDecoding++: Faster Large Language Model Inference on GPUs	Softmax with Unified Maximum Value	Tsinghua University	MLSys 2024
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores	FFT; TensorCore; Long Sequences	Stanford University	Arxiv	Github repo
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks		Georgia Institute of Technology	ASPLOS 2023
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs	Variable-Length Inputs	UCR	PPoPP 2022	Github repo
Fast Transformer Decoding: One Write-Head is All You Need	MQA	Google	Arxiv
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints	GQA	Google Research	EMNLP 2023
LightSeq: A High Performance Inference Library for Transformers		ByteDance	NAACL 2021	Github repo
LightSeq2: Accelerated Training for Transformer-based Models on GPUs		ByteDance	SC 2022
Blockwise Parallel Transformer for Large Context Models	Blockwise transformer	UCB	NeurIPS 2023	Github repo
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention	Dynamic Memory Management	Microsoft Research India	Arxiv

Model Compression

Quant and Pruning

Paper	Keywords	Institute (first)	Publication	Others
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving		SJTU	MLSys 2024	Github repo
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference	Dynamic Compression	NVIDIA	Arxiv
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs	FP6	USYD	ATC 2024
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration	AWQ	MIT	MLSys 2024 (Best Paper)
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity		University of Sydney	VLDB 2024	Github repo
CLLMs: Consistency Large Language Models	Consistency	Shanghai Jiao Tong University	Arxiv	Github repo
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers		ETH	ICLR 2023
Optimal Brain Damage (OBD)	Groundbreaking work	AT&T Bell Laboratories	NIPS 2 (1989)
Optimal Brain Surgeon: Extensions and performance comparisons	Groundbreaking work	Stanford	NIPS 1993
WoodFisher: Efficient Second-Order Approximation for Neural Network Compression		ETH	NeurIPS 2020
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models		MIT	PMLR 2023
QuIP: 2-Bit Quantization of Large Language Models With Guarantees		Cornell University	NeurIPS 2023
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks		Cornell University	PMLR 2024
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models	VQ	MSRA	EMNLP 2024
GPTVQ: The Blessing of Dimensionality for LLM Quantization	VQ	Qualcomm AI Research	ICML 2024
PQCache: Product Quantization-based KVCache for Long Context LLM Inference	PQ	PKU	Arxiv 2024
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	ANNs	MSRA	Arxiv 2024
Transformer-VQ: Linear-Time Transformers via Vector Quantization	VQ	Independent Researcher	ICLR 2024
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache	KVCache	Rice University	ICML 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	Algorithm and system codesign	MIT	Arxiv 2024
QTIP: Quantization with Trellises and Incoherence Processing	VQ	Cornell University	NeurIPS 2024 (Spotlight)
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression	Improve STE	Yandex, HSE	NeurIPS 2024 (Oral)

Communication

Paper	Keywords	Institute (first)	Publication
Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models	Overlap	Google	ASPLOS 2023
Efficiently scaling Transformer inference	Scaling	Google	MLSys 2023
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning	Communication partition	PKU	ASPLOS 2024

Energy

Paper	Keywords	Institute (first)	Publication	Others
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training		University of Michigan	NSDI 2023	Github repo
Power-aware Deep Learning Model Serving with μ-Serve		UIUC	ATC 2024
Characterizing Power Management Opportunities for LLMs in the Cloud	LLM	Microsoft Azure	ASPLOS 2024
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency	LLM Serving Cluster	UIUC	Arxiv

Decentralized

Paper	Keywords	Institute (first)	Publication	Others
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs	Consumer-grade GPU	HKBU	Arxiv
Petals: Collaborative Inference and Fine-tuning of Large Models		Yandex	Arxiv

Serverless

Paper	Keywords	Institute (first)	Publication	Others
ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models	cold boot	The University of Edinburgh	OSDI 2024	Empty Github
StreamBox: A Lightweight GPU SandBox for Serverless Inference Workflow		HUST	ATC 2024	Github

Trace

Paper	Keywords	Institute (first)	Publication	Others
Characterization of Large Language Model Development in the Datacenter	Cluster trace (for LLM)	Shanghai AI Lab	NSDI 2024	Github
BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems	GPT users trace	HKUSTGZ	Arxiv 2024	Github
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving	Disaggregated trace	Moonshot AI	Github
Splitwise: Efficient generative LLM inference using phase splitting	Disaggregated trace	UW and Microsoft	ISCA 2024	Github Trace

For Tasks:


analyze papers, optimize inference, manage memory, accelerate serving, compress models

For Jobs:

researcher, data scientist, machine learning engineer, AI researcher, software developer

Alternative AI tools for Awesome_LLM_System-PaperList

Similar Open Source Tools

Awesome-LLM4IE-Papers

Stars: 645

Awesome-LLM-Papers-Comprehensive-Topics

Stars: 172

LLM4EC

LLM4EC is an interdisciplinary research repository focusing on the intersection of Large Language Models (LLM) and Evolutionary Computation (EC). It provides a comprehensive collection of papers and resources exploring various applications, enhancements, and synergies between LLM and EC. The repository covers topics such as LLM-assisted optimization, EA-based LLM architecture search, and applications in code generation, software engineering, neural architecture search, and other generative tasks. The goal is to facilitate research and development in leveraging LLM and EC for innovative solutions in diverse domains.

Stars: 78

Awesome-Resource-Efficient-LLM-Papers

A curated list of high-quality papers on resource-efficient Large Language Models (LLMs) with a focus on various aspects such as architecture design, pre-training, fine-tuning, inference, system design, and evaluation metrics. The repository covers topics like efficient transformer architectures, non-transformer architectures, memory efficiency, data efficiency, model compression, dynamic acceleration, deployment optimization, support infrastructure, and other related systems. It also provides detailed information on computation metrics, memory metrics, energy metrics, financial cost metrics, network communication metrics, and other metrics relevant to resource-efficient LLMs. The repository includes benchmarks for evaluating the efficiency of NLP models and references for further reading.

Stars: 105

ai-reference-models

The Intel® AI Reference Models repository contains links to pre-trained models, sample scripts, best practices, and tutorials for popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs. The purpose is to quickly replicate complete software environments showcasing the AI capabilities of Intel platforms. It includes optimizations for popular deep learning frameworks like TensorFlow and PyTorch, with additional plugins/extensions for improved performance. The repository is licensed under Apache License Version 2.0.

Stars: 676

models

The Intel® AI Reference Models repository contains links to pre-trained models, sample scripts, best practices, and tutorials for popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs. It aims to replicate the best-known performance of target model/dataset combinations in optimally-configured hardware environments. The repository will be deprecated upon the publication of v3.2.0 and will no longer be maintained or published.

Stars: 669

Cool-GenAI-Fashion-Papers

Cool-GenAI-Fashion-Papers is a curated list of resources related to GenAI-Fashion, including papers, workshops, companies, and products. It covers a wide range of topics such as fashion design synthesis, outfit recommendation, fashion knowledge extraction, trend analysis, and more. The repository provides valuable insights and resources for researchers, industry professionals, and enthusiasts interested in the intersection of AI and fashion.

Stars: 129

LLM-KG4QA

LLM-KG4QA is a repository focused on the integration of Large Language Models (LLMs) and Knowledge Graphs (KGs) for Question Answering (QA). It covers various aspects such as using KGs as background knowledge, reasoning guideline, and refiner/filter. The repository provides detailed information on pre-training, fine-tuning, and Retrieval Augmented Generation (RAG) techniques for enhancing QA performance. It also explores complex QA tasks like Explainable QA, Multi-Modal QA, Multi-Document QA, Multi-Hop QA, Multi-run and Conversational QA, Temporal QA, Multi-domain and Multilingual QA, along with advanced topics like Optimization and Data Management. Additionally, it includes benchmark datasets, industrial and scientific applications, demos, and related surveys in the field.

Stars: 80

ai-game-development-tools

Here we keep track of AI game development tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice, and Analytics. Sections: Tool (AI LLM), Game (Agent), Code, Framework, Writer, Image, Texture, Shader, 3D Model, Avatar, Animation, Video, Audio, Music, Singing Voice, Speech, Analytics, Video Tool.

Stars: 312

ai-game-devtools

Stars: 735

AudioLLM

AudioLLMs is a curated collection of research papers focusing on developing, implementing, and evaluating language models for audio data. The repository aims to provide researchers and practitioners with a comprehensive resource to explore the latest advancements in AudioLLMs. It includes models for speech interaction, speech recognition, speech translation, audio generation, and more. Additionally, it covers methodologies like multitask audioLLMs and segment-level Q-Former, as well as evaluation benchmarks like AudioBench and AIR-Bench. Adversarial attacks such as VoiceJailbreak are also discussed.

Stars: 71

Awesome-AgenticLLM-RL-Papers

This repository serves as the official source for the survey paper 'The Landscape of Agentic Reinforcement Learning for LLMs: A Survey'. It provides an extensive overview of various algorithms, methods, and frameworks related to Agentic RL, including detailed information on different families of algorithms, their key mechanisms, objectives, and links to relevant papers and resources. The repository covers a wide range of tasks such as Search & Research Agent, Code Agent, Mathematical Agent, GUI Agent, RL in Vision Agents, RL in Embodied Agents, and RL in Multi-Agent Systems. Additionally, it includes information on environments, frameworks, and methods suitable for different tasks related to Agentic RL and LLMs.

Stars: 245

Azure-AIGEN-demos

Microsoft Foundry is a unified Azure platform-as-a-service offering for enterprise AI operations, model builders, and application development. This foundation combines production-grade infrastructure with friendly interfaces, enabling developers to focus on building applications rather than managing infrastructure. Microsoft Foundry unifies agents, models, and tools under a single management grouping with built-in enterprise-readiness capabilities including tracing, monitoring, evaluations, and customizable enterprise setup configurations. The platform provides streamlined management through unified Role-based access control (RBAC), networking, and policies under one Azure resource provider namespace.

Stars: 746

LLM-for-Healthcare

The repository 'LLM-for-Healthcare' provides a comprehensive survey of large language models (LLMs) for healthcare, covering data, technology, applications, and accountability and ethics. It includes information on various LLM models, training data, evaluation methods, and computation costs. The repository also discusses tasks such as NER, text classification, question answering, dialogue systems, and generation of medical reports from images in the healthcare domain.

Stars: 96

GenAI-Learning

GenAI-Learning is a repository dedicated to providing resources and courses for individuals interested in Generative AI. It covers a wide range of topics from prompt engineering to user-centered design, offering courses on LLM Bootcamp, DeepLearning AI, Microsoft Copilot Learning, Amazon Generative AI, Google Cloud Skills, NVIDIA Learn, Oracle Cloud, and IBM AI Learn. The repository includes detailed course descriptions, partners, and topics for each course, making it a valuable resource for AI enthusiasts and professionals.

Stars: 108

For similar tasks

Awesome-LLM-RAG

This repository, Awesome-LLM-RAG, aims to record advanced papers on Retrieval Augmented Generation (RAG) in Large Language Models (LLMs). It serves as a resource hub for researchers interested in promoting their work related to LLM RAG by updating paper information through pull requests. The repository covers various topics such as workshops, tutorials, papers, surveys, benchmarks, retrieval-enhanced LLMs, RAG instruction tuning, RAG in-context learning, RAG embeddings, RAG simulators, RAG search, RAG long-text and memory, RAG evaluation, RAG optimization, and RAG applications.

Stars: 733

LLM-Tool-Survey

This repository contains a collection of papers related to tool learning with large language models (LLMs). The papers are organized according to the survey paper 'Tool Learning with Large Language Models: A Survey'. The survey focuses on the benefits and implementation of tool learning with LLMs, covering aspects such as task planning, tool selection, tool calling, response generation, benchmarks, evaluation, challenges, and future directions in the field. It aims to provide a comprehensive understanding of tool learning with LLMs and inspire further exploration in this emerging area.

Stars: 220

Awesome-CVPR2024-ECCV2024-AIGC

A collection of papers and code for CVPR 2024 and ECCV 2024 AIGC (AI-generated content). This repository compiles and organizes research papers and code related to AIGC work at both conferences. It serves as a resource for anyone interested in the latest advancements in computer vision and artificial intelligence. Users can find a curated list of papers and accompanying code repositories for further exploration and research. The repository encourages collaboration and contributions from the community through stars, forks, and pull requests.

Stars: 427

LLMs-in-science

The 'LLMs-in-science' repository is a collaborative environment for organizing papers related to large language models (LLMs) and autonomous agents in the field of chemistry. The goal is to discuss trend topics, challenges, and the potential for supporting scientific discovery in the context of artificial intelligence. The repository aims to maintain a systematic structure of the field and welcomes contributions from the community to keep the content up-to-date and relevant.

Stars: 103

Awesome-Papers-Autonomous-Agent

Awesome-Papers-Autonomous-Agent is a curated collection of recent papers focusing on autonomous agents, specifically interested in RL-based agents and LLM-based agents. The repository aims to provide a comprehensive resource for researchers and practitioners interested in intelligent agents that can achieve goals, acquire knowledge, and continually improve. The collection includes papers on various topics such as instruction following, building agents based on world models, using language as knowledge, leveraging LLMs as a tool, generalization across tasks, continual learning, combining RL and LLM, transformer-based policies, trajectory to language, trajectory prediction, multimodal agents, training LLMs for generalization and adaptation, task-specific designing, multi-agent systems, experimental analysis, benchmarking, applications, algorithm design, and combining with RL.

Stars: 521

awesome-lifelong-llm-agent

This repository is a collection of papers and resources related to Lifelong Learning of Large Language Model (LLM) based Agents. It focuses on continual learning and incremental learning of LLM agents, identifying key modules such as Perception, Memory, and Action. The repository serves as a roadmap for understanding lifelong learning in LLM agents and provides a comprehensive overview of related research and surveys.

Stars: 55

LLM-Agent-Survey

LLM-Agent-Survey is a comprehensive repository that provides a curated list of papers related to Large Language Model (LLM) agents. The repository categorizes papers based on LLM-Profiled Roles and includes high-quality publications from prestigious conferences and journals. It aims to offer a systematic understanding of LLM-based agents, covering topics such as tool use, planning, and feedback learning. The repository also includes unpublished papers with insightful analysis and novelty, marked for future updates. Users can explore a wide range of surveys, tool use cases, planning workflows, and benchmarks related to LLM agents.

Stars: 113

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

Stars: 1.1k

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

Stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

Stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

Stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Stars: 3.5k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a Cloud IDE); and it supports consumer-grade GPUs.

Stars: 32.9k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

Stars: 675