
LLM-IR-Bias-Fairness-Survey
This is the repo for the survey of Bias and Fairness in IR with LLMs.

This repository collects papers related to bias and fairness in IR with LLMs, organized according to our survey paper Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era [PDF].
[2024/10/25] Our updated tutorial proposal has been accepted by WSDM 2025 as a Half-Day Tutorial; see you in Hannover, Germany!
[2024/08/25] We are presenting a Lecture-Style Tutorial at KDD 2024 on "Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era"! For more details (e.g., slides and the survey paper), please check here. The survey is published alongside this tutorial in the KDD 2024 proceedings.
Please feel free to contact us if you have any questions or suggestions!
If you find our work useful for your research, please cite:
@inproceedings{dai2024bias,
  title={Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era},
  author={Dai, Sunhao and Xu, Chen and Xu, Shicheng and Pang, Liang and Dong, Zhenhua and Xu, Jun},
  booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={6437--6447},
  year={2024}
}

@inproceedings{dai2025unifying,
  title={Unifying Bias and Unfairness in Information Retrieval: New Challenges in the LLM Era},
  author={Dai, Sunhao and Xu, Chen and Xu, Shicheng and Pang, Liang and Dong, Zhenhua and Xu, Jun},
  booktitle={Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining},
  pages={998--1001},
  numpages={4},
  year={2025}
}
In this survey, we provide a comprehensive review of emerging and pressing issues related to bias and unfairness across three key stages of integrating LLMs into IR systems.
We introduce a unified framework that casts these issues as distribution mismatch problems and systematically categorizes mitigation strategies into data sampling and distribution reconstruction approaches.
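As a toy sketch of this distribution-mismatch view (our own illustration, not code from the survey; the `kl_divergence` helper, the 70/30 exposure split, and the 50/50 target are all hypothetical), the snippet below quantifies the gap between the item distribution a system exposes and a target distribution, then applies a reweighting-style correction:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical numbers: a retriever exposes LLM-generated content 70% of
# the time, while the unbiased target exposure would be 50/50 (source bias).
system = np.array([0.7, 0.3])    # [llm_generated, human_written]
target = np.array([0.5, 0.5])
print(f"mismatch before correction: {kl_divergence(system, target):.4f}")

# Data-sampling flavor: importance weights that re-balance the data the
# model trains on toward the target distribution.
weights = target / system

# Distribution-reconstruction flavor: rescale the system's output
# distribution toward the target and renormalize.
corrected = system * weights
corrected /= corrected.sum()
print(f"mismatch after correction:  {kl_divergence(corrected, target):.4f}")
```

Real mitigation methods in the papers below operate on much richer distributions (over documents, rankings, or user groups), but they share this reweight-or-reconstruct structure.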
- Neural Retrievers are Biased Towards LLM-Generated Content, KDD 2024. [Paper]
- Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images, SIGIR 2024. [Paper]
- Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts for Open-Domain QA?, ACL 2024. [Paper]
- Perplexity-Trap: PLM-Based Retrievers Overrate Low Perplexity Documents, ICLR 2025. [Paper]
- Source Echo Chamber: Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop, SIGIR 2025. [Paper]
- Mitigating Source Bias with LLM Alignment, SIGIR 2025. [Paper]
- Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos, Preprint 2025. [Paper]
- Textbooks Are All You Need, Preprint 2023. [Paper]
- Measuring and Narrowing the Compositionality Gap in Language Models, Findings of EMNLP 2023. [Paper]
- In-Context Retrieval-Augmented Language Models, TACL 2023. [Paper]
- Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks, WWW 2024. [Paper]
- List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation, WWW 2024. [Paper]
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation, Preprint 2024. [Paper]
- Improving Language Models via Plug-and-Play Retrieval Feedback, Preprint 2024. [Paper]
- Llama 2: Open Foundation and Fine-Tuned Chat Models, Preprint 2023. [Paper]
- Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization, ICLR 2023. [Paper]
- Recitation-Augmented Language Models, ICLR 2023. [Paper]
- Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023. [Paper]
- Large Language Models are Zero-Shot Rankers for Recommender Systems, ECIR 2024. [Paper]
- Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models, Preprint 2023. [Paper]
- RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation, Preprint 2023. [Paper]
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting, Preprint 2023. [Paper]
- Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations, Preprint 2023. [Paper]
- Large Language Models are Not Stable Recommender Systems, Preprint 2023. [Paper]
- A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems, Preprint 2023. [Paper]
- Large Language Models as Zero-Shot Conversational Recommenders, CIKM 2023. [Paper]
- Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation, EMNLP 2023. [Paper]
- Understanding Biases in ChatGPT-based Recommender Systems: Provider Fairness, Temporal Stability, and Recency, Preprint 2024. [Paper]
- ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback, Preprint 2024. [Paper]
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions, ACL 2022. [Paper]
- Multitask Prompted Training Enables Zero-Shot Task Generalization, ICLR 2022. [Paper]
- Self-Instruct: Aligning Language Models with Self-Generated Instructions, ACL 2023. [Paper]
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation, TACL 2023. [Paper]
- Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following, AAAI 2024. [Paper]
- LongAlign: A Recipe for Long Context Alignment of Large Language Models, Preprint 2024. [Paper]
- Data Engineering for Scaling Language Models to 128K Context, Preprint 2024. [Paper]
- Large Language Models Are Not Robust Multiple Choice Selectors, ICLR 2024. [Paper]
- Humans or LLMs as the Judge? A Study on Judgement Biases, Preprint 2024. [Paper]
- Benchmarking Cognitive Biases in Large Language Models as Evaluators, Preprint 2023. [Paper]
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions, Preprint 2023. [Paper]
- Large Language Models are not Fair Evaluators, Preprint 2023. [Paper]
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, NeurIPS 2023. [Paper]
- Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate, Preprint 2024. [Paper]
- EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria, CHI 2024. [Paper]
- LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores, Preprint 2023. [Paper]
- Verbosity Bias in Preference Labeling by Large Language Models, Preprint 2023. [Paper]
- Style Over Substance: Evaluation Biases for Large Language Models, Preprint 2023. [Paper]
- An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers, Preprint 2024. [Paper]
- G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Preprint 2023. [Paper]
- PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations, Preprint 2023. [Paper]
- ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning, Preprint 2023. [Paper]
- Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models, Preprint 2024. [Paper]
- PRE: A Peer Review Based Large Language Model Evaluator, Preprint 2024. [Paper]
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators, Preprint 2024. [Paper]
- LLM Evaluators Recognize and Favor Their Own Generations, Preprint 2024. [Paper]
- Measuring and Mitigating Unintended Bias in Text Classification, AIES 2018. [Paper]
- Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models, ACL 2023. [Paper]
- Gender Bias in Neural Natural Language Processing, Preprint 2019. [Paper]
- MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions, ACL 2023. [Paper]
- SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures, ACL 2022. [Paper]
- Do LLMs Implicitly Exhibit User Discrimination in Recommendation? An Empirical Study, Preprint 2023. [Paper]
- Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation, RecSys 2023. [Paper]
- Mitigating harm in language models with conditional-likelihood filtration, Preprint 2021. [Paper]
- Exploring the limits of transfer learning with a unified text-to-text transformer, JMLR 2020. [Paper]
- CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System, Preprint 2024. [Paper]
- BLIND: Bias Removal With No Demographics, ACL 2023. [Paper]
- Identifying and Reducing Gender Bias in Word-Level Language Models, NAACL 2019. [Paper]
- Reducing Sentiment Bias in Language Models via Counterfactual Evaluation, Findings of EMNLP 2020. [Paper]
- Reducing Gender Bias in Word-Level Language Models with a Gender-Equalizing Loss Function, ACL Workshop 2019. [Paper]
- Bias of AI-Generated Content: An Examination of News Produced by Large Language Models, Preprint 2023. [Paper]
- Educational Multi-Question Generation for Reading Comprehension, BEA Workshop 2022. [Paper]
- Pseudo-Discrimination Parameters from Language Embeddings, Preprint 2024. [Paper]
- Item-side Fairness of Large Language Model-based Recommendation System, WWW 2024. [Paper]
- Bias of AI-generated content: an examination of news produced by large language models, Scientific Reports 2024. [Paper]
- Generating Better Items for Cognitive Assessments Using Large Language Models, BEA Workshop 2023. [Paper]
- Dynamically disentangling social bias from task-oriented representations with adversarial attack, NAACL 2021. [Paper]
- Using In-Context Learning to Improve Dialogue Safety, Findings of EMNLP 2023. [Paper]
- Large pre-trained language models contain human-like biases of what is right and wrong to do, Nature Machine Intelligence 2022. [Paper]
- BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage, Preprint 2022. [Paper]
- Balancing out Bias: Achieving Fairness Through Balanced Training, EMNLP 2022. [Paper]
- Should We Attend More or Less? Modulating Attention for Fairness, Preprint 2023. [Paper]
- Constitutional AI: Harmlessness from AI Feedback, Preprint 2022. [Paper]
- He is very intelligent, she is very beautiful? On Mitigating Social Biases in Language Modelling and Generation, Findings of ACL 2021. [Paper]
- Does Gender Matter? Towards Fairness in Dialogue Systems, COLING 2020. [Paper]
- Training language models to follow instructions with human feedback, NeurIPS 2022. [Paper]
- Never Too Late to Learn: Regularizing Gender Bias in Coreference Resolution, WSDM 2023. [Paper]
- UP5: Unbiased Foundation Model for Fairness-aware Recommendation, EACL 2024. [Paper]
- ADEPT: A DEbiasing PrompT Framework, AAAI 2023. [Paper]
- Automatic Generation of Distractors for Fill-in-the-Blank Exercises with Round-Trip Neural Machine Translation, ACL Workshop 2023. [Paper]
- Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions, ACL 2023. [Paper]
- Critic-Guided Decoding for Controlled Text Generation, Findings of ACL 2023. [Paper]
- Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness, Preprint 2023. [Paper]
- A Preliminary Study of ChatGPT on News Recommendation: Personalization, Provider Fairness, Fake News, Preprint 2023. [Paper]
- Estimating the Personality of White-Box Language Models, Preprint 2022. [Paper]
- Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons, Preprint 2022. [Paper]
- FairMonitor: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models, Preprint 2023. [Paper]
- Evaluating and Inducing Personality in Pre-trained Language Models, NeurIPS 2023. [Paper]
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models, Preprint 2023. [Paper]
- Studying Large Language Model Generalization with Influence Functions, Preprint 2023. [Paper]
- Towards Tracing Knowledge in Language Models Back to the Training Data, Findings of EMNLP 2023. [Paper]
- Detecting Pretraining Data from Large Language Models, Preprint 2023. [Paper]
- Watermarking Makes Language Models Radioactive, Preprint 2024. [Paper]
- WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data, Preprint 2023. [Paper]
- User Behavior Simulation with Large Language Model based Agents, Preprint 2023. [Paper]
- On Generative Agents in Recommendation, Preprint 2023. [Paper]
🎉👍 Please feel free to open an issue or make a pull request! 🎉👍