awesome-llm-understanding-mechanism
awesome papers in LLM interpretability
Stars: 291
This repository is a curated collection of papers focused on understanding the internal mechanisms of large language models (LLMs). It includes research on topics such as how LLMs handle multilingualism, learn in-context, and store factual associations. The repository aims to provide insight into the inner workings of transformer-based language models through a curated list of papers and surveys.
README:
Focusing on understanding the internal mechanisms of large language models (LLMs). (Continuously updated.)
To recommend a conference paper for inclusion, please contact me.
Recommended background reading:
- https://transformer-circuits.pub/2023/interpretability-dreams/index.html
- https://www.lesswrong.com/posts/X2i9dQQK3gETCyqh2/chris-olah-s-views-on-agi-safety
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis. [pdf] [EMNLP 2024] [2024.9]
Scaling and evaluating sparse autoencoders. [pdf] [OpenAI] [2024.6]
How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning. [pdf] [EMNLP 2024] [2024.6]
Neuron-Level Knowledge Attribution in Large Language Models. [pdf] [EMNLP 2024] [2024.6]
Locating and Editing Factual Associations in Mamba. [pdf] [COLM 2024] [2024.4]
Chain-of-Thought Reasoning Without Prompting. [pdf] [DeepMind] [2024.2]
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking. [pdf] [ICLR 2024] [2024.2]
Long-form evaluation of model editing. [pdf] [NAACL 2024] [2024.2]
What does the Knowledge Neuron Thesis Have to do with Knowledge? [pdf] [ICLR 2024] [2023.11]
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks. [pdf] [ICLR 2024] [2023.11]
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. [blog] [Anthropic] [2024.5]
Interpreting CLIP's Image Representation via Text-Based Decomposition. [pdf] [ICLR 2024] [2023.10]
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. [pdf] [ICLR 2024] [2023.10] (see the activation-patching sketch after this list)
Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level. [blog] [DeepMind] [2023.12]
Successor Heads: Recurring, Interpretable Attention Heads In The Wild. [pdf] [ICLR 2024] [2023.12]
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. [blog] [Anthropic] [2023.10]
Impact of Co-occurrence on Factual Knowledge of Large Language Models. [pdf] [EMNLP 2023] [2023.10]
Function vectors in large language models. [pdf] [ICLR 2024] [2023.10]
Can Large Language Models Explain Themselves? [pdf] [2023.10]
Neurons in Large Language Models: Dead, N-gram, Positional. [pdf] [ACL 2024] [2023.9]
Sparse Autoencoders Find Highly Interpretable Features in Language Models. [pdf] [ICLR 2024] [2023.9] (see the sparse-autoencoder sketch after this list)
Do Machine Learning Models Memorize or Generalize? [blog] [2023.8]
Overthinking the Truth: Understanding how Language Models Process False Demonstrations. [pdf] [2023.7]
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning. [pdf] [EMNLP 2023 best paper] [2023.5]
Let's Verify Step by Step. [pdf] [ICLR 2024] [2023.5]
What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. [pdf] [ACL 2023] [2023.5]
Language models can explain neurons in language models. [blog] [OpenAI] [2023.5]
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis. [pdf] [EMNLP 2023] [2023.5]
Dissecting Recall of Factual Associations in Auto-Regressive Language Models. [pdf] [EMNLP 2023] [2023.4]
Are Emergent Abilities of Large Language Models a Mirage? [pdf] [NeurIPS 2023 best paper] [2023.4]
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. [pdf] [2023.4]
Towards automated circuit discovery for mechanistic interpretability. [pdf] [NeurIPS 2023] [2023.4]
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model. [pdf] [NeurIPS 2023] [2023.4]
A Theory of Emergent In-Context Learning as Implicit Structure Induction. [pdf] [2023.3]
Larger language models do in-context learning differently. [pdf] [Google Research] [2023.3]
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models. [pdf] [NeurIPS 2023] [2023.1]
Transformers as Algorithms: Generalization and Stability in In-context Learning. [pdf] [ICML 2023] [2023.1]
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers. [pdf] [ACL 2023] [2022.12]
How does GPT obtain its ability? Tracing emergent abilities of language models to their sources. [blog] [2022.12]
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters. [pdf] [ACL 2023] [2022.12]
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. [pdf] [ICLR 2023] [2022.11]
Inverse scaling can become U-shaped. [pdf] [EMNLP 2023] [2022.11]
What learning algorithm is in-context learning? Investigations with linear models. [pdf] [ICLR 2023] [2022.11]
Mass-Editing Memory in a Transformer. [pdf] [ICLR 2023] [2022.10]
Polysemanticity and Capacity in Neural Networks. [pdf] [2022.10]
Analyzing Transformers in Embedding Space. [pdf] [ACL 2023] [2022.9]
Toy Models of Superposition. [blog] [Anthropic] [2022.9]
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango. [pdf] [2022.9]
Emergent Abilities of Large Language Models. [pdf] [Google Research] [2022.6]
Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases. [blog] [Anthropic] [2022.6]
Towards Tracing Factual Knowledge in Language Models Back to the Training Data. [pdf] [EMNLP 2022] [2022.5]
Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations. [pdf] [EMNLP 2022] [2022.5]
Large Language Models are Zero-Shot Reasoners. [pdf] [NeurIPS 2022] [2022.5]
Scaling Laws and Interpretability of Learning from Repeated Data. [pdf] [Anthropic] [2022.5]
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. [pdf] [EMNLP 2022] [2022.3]
In-context Learning and Induction Heads. [blog] [Anthropic] [2022.3]
Locating and Editing Factual Associations in GPT. [pdf] [NeurIPS 2022] [2022.2]
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? [pdf] [EMNLP 2022] [2022.2]
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. [pdf] [OpenAI & Google] [2022.1]
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. [pdf] [NeurIPS 2022] [2022.1]
A Mathematical Framework for Transformer Circuits. [blog] [Anthropic] [2021.12]
An Explanation of In-context Learning as Implicit Bayesian Inference. [pdf] [ICLR 2022] [2021.11]
Towards a Unified View of Parameter-Efficient Transfer Learning. [pdf] [ICLR 2022] [2021.10]
Do Prompt-Based Models Really Understand the Meaning of their Prompts? [pdf] [NAACL 2022] [2021.9]
Deduplicating Training Data Makes Language Models Better. [pdf] [ACL 2022] [2021.7]
LoRA: Low-Rank Adaptation of Large Language Models. [pdf] [ICLR 2022] [2021.6]
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. [pdf] [ACL 2022] [2021.4]
The Power of Scale for Parameter-Efficient Prompt Tuning. [pdf] [EMNLP 2021] [2021.4]
Calibrate Before Use: Improving Few-Shot Performance of Language Models. [pdf] [ICML 2021] [2021.2]
Prefix-Tuning: Optimizing Continuous Prompts for Generation. [pdf] [ACL 2021] [2021.1]
Transformer Feed-Forward Layers Are Key-Value Memories. [pdf] [EMNLP 2021] [2020.12]
Scaling Laws for Neural Language Models. [pdf] [OpenAI] [2020.1]
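As a rough illustration of the activation-patching (causal-tracing) idea studied in several entries above (e.g., "Locating and Editing Factual Associations in GPT" and "Towards Best Practices of Activation Patching in Language Models"), here is a minimal sketch using GPT-2 from Hugging Face transformers and PyTorch forward hooks. The prompts, the patched layer, and the patched token position are illustrative assumptions, not any paper's exact setup.

```python
# Minimal activation-patching sketch (illustrative, not any paper's exact recipe):
# cache one block's output on a "clean" prompt, then overwrite the same activation
# during a "corrupted" run and see how much of the clean prediction is restored.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
corrupt = tok("The Colosseum is located in the city of", return_tensors="pt")

LAYER, POS = 6, -1   # which block and token position to patch (assumptions)
cache = {}

def unwrap(output):
    # GPT-2 blocks usually return a tuple whose first element is the hidden states.
    return output[0] if isinstance(output, tuple) else output

def save_hook(module, inputs, output):
    cache["clean"] = unwrap(output).detach().clone()

def patch_hook(module, inputs, output):
    hidden = unwrap(output).clone()
    hidden[:, POS, :] = cache["clean"][:, POS, :]   # splice in the clean activation
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

block = model.transformer.h[LAYER]
with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    clean_logits = model(**clean).logits[0, -1]
    handle.remove()

    corrupt_logits = model(**corrupt).logits[0, -1]

    handle = block.register_forward_hook(patch_hook)
    patched_logits = model(**corrupt).logits[0, -1]
    handle.remove()

paris = tok(" Paris")["input_ids"][0]
print("clean:", clean_logits[paris].item(),
      "corrupt:", corrupt_logits[paris].item(),
      "patched:", patched_logits[paris].item())
```

In practice this patch is swept over layers and token positions to localize where the relevant information is carried.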
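Below is likewise a bare-bones sketch of the dictionary-learning setup behind the sparse-autoencoder entries above ("Towards Monosemanticity", "Sparse Autoencoders Find Highly Interpretable Features in Language Models", "Scaling and evaluating sparse autoencoders"): an overcomplete ReLU autoencoder trained on cached activations with an L1 sparsity penalty. The dictionary size, L1 coefficient, and the random stand-in for cached activations are assumptions for illustration only.

```python
# Bare-bones sparse autoencoder (SAE) training loop on stand-in activation data.
# Real setups train on very large caches of residual-stream activations and tune
# the dictionary size and sparsity penalty carefully; this only shows the shape
# of the method.
import torch
import torch.nn as nn

D_MODEL, D_DICT, L1_COEF = 768, 8 * 768, 1e-3   # assumed width, dictionary size, penalty

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))   # sparse, overcomplete feature activations
        return self.dec(feats), feats

sae = SparseAutoencoder(D_MODEL, D_DICT)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

acts = torch.randn(4096, D_MODEL)   # stand-in for cached residual-stream activations

for step in range(200):
    batch = acts[torch.randint(0, acts.shape[0], (256,))]
    recon, feats = sae(batch)
    # Reconstruction error plus L1 penalty on the feature activations.
    loss = ((recon - batch) ** 2).mean() + L1_COEF * feats.abs().sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```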
Surveys:
Mechanistic Interpretability for AI Safety - A Review. [pdf] [2024.8]
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models. [pdf] [2024.7]
Internal Consistency and Self-Feedback in Large Language Models: A Survey. [pdf] [2024.7]
A Primer on the Inner Workings of Transformer-based Language Models. [pdf] [2024.5] [interpretability]
Usable XAI: 10 strategies towards exploiting explainability in the LLM era. [pdf] [2024.3] [interpretability]
A Comprehensive Overview of Large Language Models. [pdf] [2023.12] [LLM]
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. [pdf] [2023.11] [hallucination]
A Survey of Large Language Models. [pdf] [2023.11] [LLM]
Explainability for Large Language Models: A Survey. [pdf] [2023.11] [interpretability]
A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future. [pdf] [2023.10] [chain of thought]
Instruction tuning for large language models: A survey. [pdf] [2023.10] [instruction tuning]
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning. [pdf] [2023.9] [instruction tuning]
Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. [pdf] [2023.9] [hallucination]
Reasoning with language model prompting: A survey. [pdf] [2023.9] [reasoning]
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. [pdf] [2023.8] [interpretability]
A Survey on In-context Learning. [pdf] [2023.6] [in-context learning]
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning. [pdf] [2023.3] [parameter-efficient fine-tuning]
Related repositories:
- https://github.com/ruizheliUOA/Awesome-Interpretability-in-Large-Language-Models (interpretability)
- https://github.com/cooperleong00/Awesome-LLM-Interpretability?tab=readme-ov-file (interpretability)
- https://github.com/JShollaj/awesome-llm-interpretability (interpretability)
- https://github.com/IAAR-Shanghai/Awesome-Attention-Heads (attention)
- https://github.com/zjunlp/KnowledgeEditingPapers (model editing)