 
                Awesome-LVLM-Hallucination
Up-to-date curated list of state-of-the-art research work, papers and resources on hallucinations in Large Vision-Language Models
 
    
Even though the world has seen the impressive capabilities of large vision-language models, particularly in zero-shot inference, such models struggle with hallucinations, i.e., the generation of text containing information that is not present in the visual input. Much research is under way to tackle the different forms of this problem, such as hallucinated objects, inaccurate attributes and relationships, and unfaithful descriptions. Possible causes include language priors, insufficient visual context, and biases and misinformation in the training data, among others.
This repository provides an organized list of state-of-the-art research papers, relevant code, and brief descriptions related to hallucinations in Large Vision-Language Models (LVLMs), also known as Multimodal Large Language Models (MLLMs).
The main intention of this project is to provide a platform where all research work on hallucination in LVLMs can be accessed in a structured way. If you know of interesting work in this field, kindly contribute it by opening an issue. I am looking forward to fruitful discussion and learning!
- 
CHAIR: Object Hallucination in Image Captioning (EMNLP 2018) - Introduced the problem of object hallucination on the MSCOCO image captioning task
- CHAIR metrics [built upon the 80 unique MSCOCO object categories]
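
CHAIR has an instance-level variant (CHAIR_i: hallucinated object mentions divided by all object mentions) and a sentence-level variant (CHAIR_s: captions with at least one hallucinated object divided by all captions). A minimal sketch, assuming object mentions have already been extracted from each caption and mapped to MSCOCO categories (the synonym list used by the original metric is not reproduced), and counting each category at most once per caption:

```python
from typing import List, Set, Tuple


def chair(mentioned: List[Set[str]], ground_truth: List[Set[str]]) -> Tuple[float, float]:
    """Return (CHAIR_i, CHAIR_s) for a batch of captions.

    mentioned[k]: MSCOCO categories mentioned in caption k (assumed pre-extracted)
    ground_truth[k]: MSCOCO categories annotated for image k
    """
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for objs, gt in zip(mentioned, ground_truth):
        halluc = objs - gt  # mentioned but not present in the image
        total_mentions += len(objs)
        hallucinated_mentions += len(halluc)
        hallucinated_captions += bool(halluc)
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = hallucinated_captions / len(mentioned) if mentioned else 0.0
    return chair_i, chair_s


# The first caption mentions a "dog" that is not annotated for its image.
print(chair([{"cat", "dog"}, {"car"}], [{"cat"}, {"car", "person"}]))  # CHAIR_i = 1/3, CHAIR_s = 1/2
```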
 
- 
POPE: Evaluating Object Hallucination in Large Vision-Language Models (EMNLP 2023) - Object existence hallucination [Yes/No]
- Random, Popular and Adversarial settings on MSCOCO dataset
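
POPE turns object existence into Yes/No probes such as "Is there a <object> in the image?": ground-truth objects should get "yes" and sampled absent objects "no", and the three settings differ in how the absent objects are chosen (at random, most frequent in the dataset, or most frequently co-occurring with the image's objects). A rough sketch of that sampling, assuming per-image object sets are available; the function names and the frequency ranking are illustrative, and the benchmark's own object lists, counts and prompt wording may differ:

```python
import random
from collections import Counter
from typing import List, Set, Tuple


def sample_negatives(image_objects: Set[str], dataset: List[Set[str]], k: int,
                     setting: str, seed: int = 0) -> List[str]:
    """Pick k objects absent from the image, POPE-style (illustrative sketch)."""
    absent = set().union(*dataset) - image_objects
    if setting == "random":
        return random.Random(seed).sample(sorted(absent), k)
    if setting == "popular":
        # Objects that are most frequent across the whole dataset.
        freq = Counter(obj for objs in dataset for obj in objs)
    elif setting == "adversarial":
        # Objects that most often co-occur with this image's ground-truth objects.
        freq = Counter()
        for objs in dataset:
            if objs & image_objects:
                freq.update(objs - image_objects)
    else:
        raise ValueError(f"unknown setting: {setting}")
    return sorted(absent, key=lambda o: (-freq[o], o))[:k]


def pope_questions(image_objects: Set[str], negatives: List[str]) -> List[Tuple[str, str]]:
    """Yes/No probes: ground-truth objects expect 'yes', sampled negatives expect 'no'."""
    objs = [(o, "yes") for o in sorted(image_objects)] + [(o, "no") for o in negatives]
    return [(f"Is there a {o} in the image?", ans) for o, ans in objs]
```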
 
- 
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models (23 June, 2023) - MME benchmark covers the evaluation of MLLM's perception and cognition abilities
- Perception (Coarse-Grained): 4; Perception (Fine-Grained): 5; Perception (OCR): 1; Cognition (Reasoning): 4; [Total 14 subtasks]
- Answer in Yes/No format for easy evaluation & 30 advanced MLLMs are benchmarked
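
Each MME image comes with two Yes/No questions, and a subtask score is reported as accuracy (per question) plus accuracy+ (per image, both questions answered correctly), for a maximum of 200 per subtask. A sketch of that scoring under this reading of the benchmark; the exact-match answer parsing is a simplifying assumption:

```python
from collections import defaultdict
from typing import Iterable, Tuple


def mme_subtask_score(records: Iterable[Tuple[str, str, str]]) -> float:
    """records: (image_id, ground_truth, prediction), answers being 'yes' or 'no'."""
    per_image = defaultdict(list)
    for image_id, gt, pred in records:
        per_image[image_id].append(gt.strip().lower() == pred.strip().lower())
    flags = [ok for oks in per_image.values() for ok in oks]
    if not flags:
        return 0.0
    acc = 100.0 * sum(flags) / len(flags)  # accuracy over individual questions
    # accuracy+: share of images whose questions are all answered correctly
    acc_plus = 100.0 * sum(all(oks) for oks in per_image.values()) / len(per_image)
    return acc + acc_plus
```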
 
- 
M-HalDetect: Detecting and Preventing Hallucinations in Large Vision Language Models (AAAI 2024) - Hallucination detection dataset with fine-grained annotations [accurate, inaccurate and analysis]
- Fine-grained Direct Preference Optimization (FDPO) technique and reward model dataset
- High correlation of reward model score with human evaluation
 
- 
HaELM: Evaluation and Analysis of Hallucination in Large Vision-Language Models (29 August, 2023) - Discussed LVLMs' tendency to respond 'Yes' to judgement-type queries
- Use of ChatGPT to collect hallucination data via iterative prompt modification
- Open-source LLM trained on this dataset to evaluate LVLM responses
- Evaluation results across various LVLMs, generation lengths and top-K sampling values
 
- 
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning (NeurIPS 2023 Workshop) - Automatic construction of question-answer pairs [Yes/No QA pairs] from caption-annotated datasets using ChatGPT, plus an automatic evaluation pipeline
- Contrastive instruction tuning (CIT) with factual and contrastive QA pairs, each with a Chain-of-Thought (CoT) justification
 
- 
CAST: Cross-modal Alignment Similarity Test for Vision Language Models (17 September, 2024) - Proposed CAST as a way to measure the self-consistency of LVLMs across different modalities.
- Works in two stages: in the first stage, the model generates similarities/true statements comparing two inputs; in the second stage, the model judges its own output for truthfulness.
 
- 
MMHAL-BENCH: Aligning Large Multimodal Models with Factually Augmented RLHF (25 September, 2023) - Introduced novel algorithm called Factually Augmented RLHF (Fact-RLHF) to alleviate the reward hacking phenomenon in RLHF
- Developed evaluation benchmark MMHAL-BENCH with a special focus on penalizing hallucinations
- Trained an LMM with RLHF (LLaVA-RLHF) which shows improved multimodal alignment
 
- 
LRV (GAVIE): Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (29 September, 2023) - LRV-Instruction - positive and negative robust instruction tuning dataset with 400k visual instructions (16 tasks)
- Negative instruction semantics: (a) Nonexistent Object Manipulation (b) Existent Object Manipulation (c) Knowledge Manipulation
- GPT4-Assisted Visual Instruction Evaluation (GAVIE)
 
- 
NOPE: Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models (09 October, 2023) - VQA diagnostic benchmark that measures object hallucination with questions whose ground-truth answer is a negative pronoun
- LLM-based generation of a 29.5k synthetic negative pronoun (none, no one, nobody, nowhere, neither) dataset
- Finding: VLMs tend to hallucinate more on data with higher lexical diversity, more scene-relevant (co-occurring) objects and larger answer scopes
 
- 
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models (CVPR 2024) - Language Hallucination + Visual Illusion: 1129 VQA pairs over a total of 346 images
- It covers topics such as food, math, geometry, statistics, geography, sports, cartoons, famous illusions, movies, memes, etc., and formats such as logos, posters, figures, charts, tables, maps, consecutive images, etc.
 
- 
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models (02 November, 2023) - Reference-free and fine-grained evaluation metric
- Recognizer: an LLM identifies the descriptive content in the LVLM's prediction
- Decomposer: an LLM generates atomic facts from the recognizer's output
- Verifier: a visual entailment model (e.g. OFA) verifies each atomic fact against the input image
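
Once the three stages have run, the score reduces to the fraction of atomic facts that the verifier accepts. A minimal sketch in which the recognizer, decomposer and verifier are passed in as plain callables (hypothetical stand-ins for the LLM and the visual entailment model); returning 1.0 when there are no atomic facts is an assumption of this sketch:

```python
from typing import Callable, List


def faith_score(
    response: str,
    recognize: Callable[[str], List[str]],
    decompose: Callable[[str], List[str]],
    verify: Callable[[str], bool],
) -> float:
    """Fraction of atomic facts in the descriptive part of a response that the verifier accepts."""
    descriptive = recognize(response)  # Recognizer: keep descriptive content
    facts = [f for sent in descriptive for f in decompose(sent)]  # Decomposer: atomic facts
    if not facts:
        return 1.0  # assumption: no descriptive content means nothing contradicts the image
    return sum(verify(f) for f in facts) / len(facts)  # Verifier: fact-image entailment


# Toy stand-ins: the real pipeline would call an LLM and a visual entailment model here.
image_facts = {"There is a dog.", "The dog is brown."}
score = faith_score(
    "A brown dog sits next to a red ball.",
    recognize=lambda r: [r],
    decompose=lambda s: ["There is a dog.", "The dog is brown.", "There is a red ball."],
    verify=lambda fact: fact in image_facts,
)
print(score)  # 2 of 3 atomic facts are supported
```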
 
 
- 
Bingo: Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges (07 November, 2023) - Total 308 Images and 370 QA Pairs
- Bias category: Region, OCR and Factual
- Interference category: Image-to-Image and Text-to-Image
 
- 
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation (13 November, 2023) - LLM free evaluation of hallucination using AMBER benchmark
- Evaluation of hallucination for generative and discriminative task using AMBERSCORE metric (covers existence, attributes and relation types of hallucination)
- Includes hallucinatory target objects (more likely to be imagined by LVLMs)
 
- 
RAH-Bench: Mitigating Hallucination in Visual Language Models with Visual Supervision (27 November, 2023) - Introduced a fine-grained vision instruction dataset named RAI-30K (built upon the panoptic scene graph (PSG) dataset)
- RAH-BENCH vision hallucination evaluation benchmark (3 types: Categorical, Relation and Attribute Hallucination)
- False Positive Rate as the evaluation metric
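
A false positive rate, FP / (FP + TN), can be computed separately for each hallucination type over the questions whose correct answer is "no" (i.e. the model affirms something that is not in the image). A sketch assuming Yes/No answers tagged with their type; how RAH-Bench formats its questions and parses answers is not reproduced here:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def false_positive_rates(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """records: (hallucination_type, ground_truth, prediction) with 'yes'/'no' answers.

    FPR = FP / (FP + TN), computed over questions whose ground truth is 'no'.
    """
    false_pos, negatives = Counter(), Counter()
    for htype, gt, pred in records:
        if gt == "no":
            negatives[htype] += 1
            false_pos[htype] += pred == "yes"  # model affirmed a non-existent content
    return {t: false_pos[t] / n for t, n in negatives.items()}
```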
 
- 
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models (03 December, 2023) - Proposed a novel test-bed to evaluate IT-LVLMs (Instruction-Tuned Large Vision and Language Models) on core computer vision tasks
- Observed poor performance of IT-LVLMs, with multiple failure cases in visual grounding
- Identified problems of IT-LVLMs such as the generation of hallucinatory events and sensitivity to the input query
 
- 
CCEval: HallE-Switch: Controlling Object Hallucination in Large Vision Language Models (03 December, 2023) - Suggests an approach to control object existence hallucination in detailed captions of LVLMs
- Introduced CCEval, a GPT-4 assisted evaluation method for detailed captioning (Metrics: CHAIR(i&s), Coverage, Average Length, Average Objects)
- Detailed investigation of LVLM components that might influence hallucination, such as alignment of the language decoder, volume of instruction data, resolution of the input image and so on
- Introduced a control parameter over LLMs (HallE-Control) to condition the inference of objects
 
- 
FGHE: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites (04 December, 2023) - Dealing with fine-grained object hallucination with ReCaption framework
- Two-stage framework: 1) caption rewriting with the help of ChatGPT 2) fine-tuning LVLMs on the rewritten captions
- Introduced Fine-Grained Object Hallucination Evaluation (FGHE), which is similar to POPE (50 manually annotated images with 200 binary questions of types multi-object, attribute and behavior)
 
- 
OpenCHAIR: Mitigating Open-Vocabulary Caption Hallucinations (06 December, 2023) - soon
 
- 
CorrelationQA: The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs (06 February, 2024) - soon
 
- 
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (09 February, 2024) - soon
 
- 
VQAv2-IDK: Visually Dehallucinative Instruction Generation: Know What You Don’t Know (15 February, 2024) - soon
 
- 
MHaluBench: Unified Hallucination Detection for Multimodal Large Language Models (20 February, 2024) - soon
 
- 
MAD-Bench: How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts (20 February, 2024) - soon
 
- 
VHTest: Visual Hallucinations of Multi-modal Large Language Models (22 February, 2024) - soon
 
- 
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models (24 February, 2024) - soon
 
- 
Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective (03 March, 2024) - soon
 
- 
EvalDial: Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning (15 March, 2024) - soon
 
- 
IVL-Hallu: PhD: A Prompted Visual Hallucination Evaluation Dataset (17 March, 2024) - soon
 
- 
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models (29 March, 2024) - soon
 
- 
ALOHa: A New Measure for Hallucination in Captioning Models (3 April, 2024) - soon
 
- 
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (22 April, 2024) - soon
 
- 
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models (08 May, 2024) - soon
 
- 
MRHal-Bench: Automated Multi-level Preference for MLLMs (18 May, 2024) - soon
 
- 
VLind-Bench: Measuring Language Priors in Large Vision-Language Models (13 June, 2024) - soon
 
- 
MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era (13 June, 2024) - soon
 
- 
Med-HallMark: Detecting and Evaluating Medical Hallucinations in Large Vision Language Models (14 June, 2024) - Medical field hallucination benchmark
- MediHall Score - evaluation metric
 
- 
AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models (16 June, 2024) - soon
 
- 
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models (17 June, 2024) - soon
 
- 
CHAIR-MEN: Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? (20 June, 2024) - soon
 
- 
R-BENCH: Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models (24 June, 2024) (ICML 2024) - Introduced an evaluation benchmark targeting relationship hallucination
- soon
 
- 
HQH: Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models (24 June, 2024) - Propose a framework called Hallucination benchmark Quality Measurement (HQM) to assess the quality of existing hallucination benchmarks
- soon
 
- 
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models (24 June, 2024) - soon
 
- 
MMHalSnowball: Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (30 June, 2024) - soon
 
- 
MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context (03 July, 2024) - soon
 
- 
ROPE: Multi-Object Hallucination in Vision-Language Models (08 July, 2024) - Deals with multi-object hallucinations and their causes
- Introduce Recognition-based Object Probing Evaluation (ROPE) for assessing multi-object hallucination
- In-depth analysis of hallucinatory behaviors
 
- 
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models (18 July, 2024) (ECCV 2024) - Proposed a hallucination evaluation benchmark called BEfore-After (BEAF)
- New metrics introduced: True Understanding (TU), IGnorance (IG), StuBbornness (SB), and InDecision (ID)
 
- 
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning (22 July, 2024) (ECCV 2024) - Introduced a novel VQA dataset for VLM evaluation
- soon
 
- 
MMINSTRUCT: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity (22 July, 2024) - Introduced high-quality and diverse visual instruction tuning dataset
- Claims SOTA performance of LLaVA-1.5 fine-tuned on MMINSTRUCT on 10 out of 12 popular benchmarks
 
- 
Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs (02 August, 2024) - Constructed a hallucination evaluation benchmark with perturbed inputs covering 7 perturbation scenarios
- 12 SOTA MLLMs are benchmarked
 
- 
Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models (18 August, 2024) - Introduced a benchmark to evaluate relation hallucination, further categorized into perceptive and cognitive types
- 3 evaluation tasks: Yes/No, MCQ, VQA
- Code and dataset will be released after the paper's acceptance
 
- 
Pfram: Understanding Multimodal Hallucination with Parameter-Free Representation Alignment (02 September, 2024) - soon
 
- 
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models (14 September, 2024) - soon
 
- 
LLSAVisionQA: Explore the Hallucination on Low-level Perception for MLLMs (15 September, 2024) - soon
 
- 
JourneyBench: Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images (25 September, 2024) - soon
 
- 
FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs (20 September, 2024) - code available
- soon
 
- 
EventHallusion: Diagnosing Event Hallucinations in Video LLMs (25 September, 2024) - soon
 
- 
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions (05 October, 2024) - soon
 
- 
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models (15 October, 2024) - soon
 
- 
MM-SY: Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs (15 October, 2024) - soon (code and benchmark)
 
- 
Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions (15 October, 2024) - soon
 
- 
DeCo: MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation (15 October, 2024) - decoding technique
- soon
 
- 
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio (16 October, 2024) - soon
 
- 
Trust but Verify: Programmatic VLM Evaluation in the Wild (17 October, 2024) - project page available
- soon
 
- 
Tri-HE: Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models (03 November, 2024) - soon
 
- 
H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models (06 November, 2024) - soon
 
- 
VIDHAL: Benchmarking Temporal Hallucinations in Vision LLMs (25 November, 2024) - performance evaluation on videos/frames
- soon
 
- 
HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models (29 December, 2024) - soon
 
- 
CAOS: Evaluating Hallucination in Large Vision-Language Models based on Context-Aware Object Similarities (25 January, 2025) - soon
 
- 
Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink (25 January, 2025) - Hugging Face link available
- soon
 
- 
LanP: Rethinking the Impact of Language Priors in Large Vision-Language Models (17 February, 2025) - soon
 
Note: 'soon' will be replaced with brief description!
- 
FDPO - Reward Model: Detecting and Preventing Hallucinations in Large Vision Language Models (AAAI 2024) - M-HalDetect - Hallucination detection dataset with fine-grained annotations [accurate, inaccurate and analysis]
- Fine-grained Direct Preference Optimization (FDPO) technique and reward model trained on introduced dataset
- High correlation of reward model score with human evaluation
 
- 
HaELM: Evaluation and Analysis of Hallucination in Large Vision-Language Models (29 August, 2023) - Discussed LVLMs' tendency to respond 'Yes' to judgement-type queries
- Use of ChatGPT to collect hallucination data via iterative prompt modification
- Open-source LLM trained on this dataset to evaluate LVLM responses
- Evaluation results across various LVLMs, generation lengths and top-K sampling values
 
- 
HallE-Switch: Controlling Object Hallucination in Large Vision Language Models (3 October, 2023) - Suggests an approach to control object existence hallucination in detailed captions of LVLMs
- Introduced CCEval, a GPT-4 assisted evaluation method for detailed captioning (Metrics: CHAIR(i&s), Coverage, Average Length, Average Objects)
- Detailed investigation of LVLM components that might influence hallucination, such as alignment of the language decoder, volume of instruction data, resolution of the input image and so on
- Introduced a control parameter over LLMs (HallE-Control) to condition the inference of objects
 
- 
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (22 November, 2023) - Investigates hallucination toxicity in existing visual instruction datasets
- Proposed HalluciDoctor method for automatic elimination of such toxicity
- Generation of more counterfactual instruction data with help of HalluciDoctor to improve LVLMs' resistance to hallucination
 
- 
LogicCheckGPT: Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models (18 February, 2024) - Post-processes the output descriptions of LVLMs
- Five-step logical closed-loop procedure:
- Object extraction, object-to-attribute inquiring, attribute-to-object inquiring, logical closed-loop check, and hallucination detection and mitigation
- Experimental analysis on POPE and MME benchmarks
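
The closed-loop idea: ask the model about an object's attributes, then ask which object has each attribute; if the answers rarely lead back to the original object, the loop does not close and the object is suspected to be hallucinated. A minimal sketch with the two inquiry steps passed in as callables (hypothetical stand-ins for LVLM queries); the `min_closure` threshold is an assumption, not the paper's exact decision rule:

```python
from typing import Callable, Dict, List


def closed_loop_check(
    objects: List[str],
    ask_attributes: Callable[[str], List[str]],
    ask_objects: Callable[[str], List[str]],
    min_closure: float = 0.5,
) -> Dict[str, bool]:
    """Return {object: is_hallucinated}; an object is flagged when too few loops close."""
    flags = {}
    for obj in objects:
        attributes = ask_attributes(obj)  # object-to-attribute inquiring
        if not attributes:
            flags[obj] = True  # assumption: no describable attribute is treated as suspicious
            continue
        # attribute-to-object inquiring: does the model name the original object again?
        closed = sum(obj in ask_objects(attr) for attr in attributes)
        flags[obj] = closed / len(attributes) < min_closure
    return flags


# Toy answers standing in for LVLM responses to the two kinds of questions.
attrs = {"dog": ["brown", "running"], "frisbee": ["red"]}
owners = {"brown": ["dog"], "running": ["dog"], "red": ["car"]}
print(closed_loop_check(["dog", "frisbee"], lambda o: attrs.get(o, []), lambda a: owners.get(a, [])))
```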
 
- 
UNIHD: Unified Hallucination Detection for Multimodal Large Language Models (20 February, 2024) - Introduce a meta evaluation benchmark called MHALUBENCH
- Introduced a framework named UNIHD which detects modality-conflicting hallucinations at various levels, such as object, attribute and scene-text, as well as fact-conflicting hallucinations
 
- 
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback (22 April, 2024) - Use of GPT-4/GPT-4V to generate fine-grained feedback for training a hallucination detection model (by supervised fine-tuning (SFT) of an LVLM)
- Proposed an automatic pipeline for preference dataset construction
- Hallucination Severity-Aware Direct Preference Optimization (HSA-DPO) is introduced to mitigate LVLM hallucination
 
- 
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification (29 May, 2024) - Really cool approach
- Lightweight method for hallucination detection
 
- 
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions (11 June, 2024) - soon
 
- 
MediHallDetector: Detecting and Evaluating Medical Hallucinations in Large Vision Language Models (14 June, 2024) - Medical field hallucination detection
- soon
 
- 
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification (02 July, 2024) - soon
 
- 
SUQ: Reference-free Hallucination Detection for Large Vision-Language Models (11 August, 2024) - Concluded that Supervised Uncertainty Quantification (SUQ) outperforms other reference-free hallucination detection techniques such as uncertainty-based and consistency-based methods
- An example of a supervised uncertainty quantification method: the MetaToken paper
- soon
 
- 
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data (30 August, 2024) - Proposed an approach to create corrupted grounding data that can be used to pre-train MLLM hallucination detectors
- soon
 
- 
LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks (19 September, 2024) - soon
 
- 
TLDR: Token-Level Detective Reward Model for Large Vision Language Models (07 October, 2024) - soon
 
- 
RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models (01 November, 2024) - soon
 
- 
VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation (18 November, 2024) - soon
 
- 
DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models (27 November, 2024) - soon
 
- 
Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs (28 November, 2024) - soon
 
- Up to Date (28th January, 2025) and SOTA research work loading...
- 
ObjMLM: Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training (10 February 2023) - Deals with object hallucination problem of VLMs
- Discusses the influence of various Vision Language Pre-training (VLP) objectives (ITM, ITC and ICLM) and image encoding methods (region-based, grid-based and patch-based) on object hallucination
- Introduce novel VLP objective ObjMLM to mitigate object hallucination
 
- 
MMCoT: Multimodal Chain-of-Thought Reasoning in Language Models (17 February 2023) - Two-stage framework that fine-tunes language models to perform multimodal chain-of-thought (CoT) reasoning incorporating language (text) and vision (images) modalities
- Claims state-of-the-art performance on the ScienceQA benchmark with a model under 1 billion parameters
- Multimodal-CoT has the merits of mitigating hallucination and enhancing convergence speed
 
- 
LRV-GAVIE: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (26 June, 2023) - LRV-Instruction - positive and negative robust instruction tuning dataset with 400k visual instructions (16 tasks)
- Negative instruction semantics: (a) Nonexistent Object Manipulation (b) Existent Object Manipulation (c) Knowledge Manipulation
- GPT4-Assisted Visual Instruction Evaluation (GAVIE)
 
- 
LLaVA-RLHF: Aligning Large Multimodal Models with Factually Augmented RLHF (25 September, 2023) - Introduced novel algorithm called Factually Augmented RLHF (Fact-RLHF) to alleviate the reward hacking phenomenon in RLHF
- Developed evaluation benchmark MMHAL-BENCH with a special focus on penalizing hallucinations
- Trained an LMM with RLHF (LLaVA-RLHF) which shows improved multimodal alignment
 
- 
LURE: Analyzing and Mitigating Object Hallucination in Large Vision-Language Models (01 October, 2023) - Introduced LURE framework which is lightweight and compatible post-hoc approach for rectifying object hallucination in LVLMs
- Statistical analysis of object co-occurrence, object uncertainty and object position in the generated description, which may correlate with object hallucination
- Uncertain objects are replaced with placeholder tokens both when training LURE and at inference (for revision)
- Really popular method
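
The masking step can be pictured as: object words that are either low-confidence or appear late in the description are swapped for a placeholder, and the revisor then rewrites the caption. A minimal sketch, assuming token probabilities are available; the thresholds, the `[IDK]` placeholder string and the function name are illustrative assumptions, not LURE's exact settings:

```python
from typing import List, Set, Tuple


def mask_uncertain_objects(
    tokens: List[Tuple[str, float]],
    object_words: Set[str],
    prob_threshold: float = 0.5,
    position_ratio: float = 0.8,
    placeholder: str = "[IDK]",
) -> List[str]:
    """Replace object tokens that are low-confidence or appear late in the caption."""
    cutoff = int(len(tokens) * position_ratio)  # tokens at or after this index count as "late"
    out = []
    for i, (tok, prob) in enumerate(tokens):
        uncertain = prob < prob_threshold
        late = i >= cutoff
        out.append(placeholder if tok in object_words and (uncertain or late) else tok)
    return out


caption = [("a", 0.9), ("dog", 0.95), ("and", 0.9), ("a", 0.9), ("frisbee", 0.3)]
print(mask_uncertain_objects(caption, {"dog", "frisbee"}))  # ['a', 'dog', 'and', 'a', '[IDK]']
```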
 
- 
HallE-Switch: Controlling Object Hallucination in Large Vision Language Models (3 October, 2023) - Suggests an approach to control object existence hallucination in detailed captions of LVLMs
- Introduced CCEval, a GPT-4 assisted evaluation method for detailed captioning (Metrics: CHAIR(i&s), Coverage, Average Length, Average Objects)
- Detailed investigation of LVLM components that might influence hallucination, such as alignment of the language decoder, volume of instruction data, resolution of the input image and so on
- Introduced a control parameter over LLMs (HallE-Control) to condition the inference of objects
 
- 
Woodpecker: Hallucination Correction for Multimodal Large Language Models (24 October, 2023) - Really popular method
- Training free, post-hoc method to mitigate hallucination (but computationally expensive!!)
- Five-step framework:
- Key concept extraction from LVLM's output
- Formulation of questions based on key concepts
- Visual knowledge validation (use of an open-source object detector + pretrained VQA model)
- Visual claim generation (use of fixed sentence templates + a QA-to-claim model)
- Hallucination Correction (use LLM to correct LVLM's response)
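
The five steps chain into a single post-hoc pass over the LVLM's answer. A skeleton of that orchestration, assuming each stage is supplied as a callable; the class name and field names are hypothetical, and the toy lambdas below only stand in for the LLM, object detector and VQA models the method actually uses:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WoodpeckerStyleCorrector:
    """Chains the five stages; each stage is a pluggable callable (all names hypothetical)."""
    extract_concepts: Callable[[str], List[str]]           # 1. key concepts from the LVLM output
    formulate_questions: Callable[[List[str]], List[str]]  # 2. questions about those concepts
    answer_with_vision: Callable[[str], str]               # 3. detector + VQA model answers
    to_claim: Callable[[str, str], str]                    # 4. (question, answer) -> visual claim
    correct: Callable[[str, List[str]], str]               # 5. LLM rewrite grounded on the claims

    def __call__(self, response: str) -> str:
        concepts = self.extract_concepts(response)
        questions = self.formulate_questions(concepts)
        claims = [self.to_claim(q, self.answer_with_vision(q)) for q in questions]
        return self.correct(response, claims)


# Toy stages: the "detector" counts one dog, so the corrector fixes the count.
fixer = WoodpeckerStyleCorrector(
    extract_concepts=lambda r: ["dog"],
    formulate_questions=lambda cs: [f"How many {c}s are there?" for c in cs],
    answer_with_vision=lambda q: "1",
    to_claim=lambda q, a: f"There is {a} dog.",
    correct=lambda r, claims: r.replace("two dogs", "one dog") if "There is 1 dog." in claims else r,
)
print(fixer("I see two dogs."))  # I see one dog.
```

Because the method is training-free, all cost sits in these per-response model calls, which is why it is noted above as computationally expensive.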
 
 
- 
VOLCANO: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision (14 November, 2023) - soon
 
- 
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (22 November, 2023) - Investigates hallucination toxicity in already existing visual instruction dataset
- Proposed HalluciDoctor method for automatic elimination of such toxicity
- Generation of more counterfactual instruction data with help of HalluciDoctor to improve LVLMs' resistance to hallucination
 
- 
RAH-Bench: Mitigating Hallucination in Visual Language Models with Visual Supervision (27 November, 2023) - soon
 
- 
HA-DPO: Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (28 November, 2023) - soon
 
- 
VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding (28 November, 2023) - Decoding strategy
- soon
 
- 
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation (CVPR 2024) - soon
 
- 
FGHE: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites (04 December, 2023) - soon
 
- 
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback (01 December, 2023) - fine-grained refined DPO!
- soon
 
- 
MOCHa: Mitigating Open-Vocabulary Caption Hallucinations (06 December 2023) - soon
 
- 
HACL: Hallucination Augmented Contrastive Learning for Multimodal Large Language Model (12 December 2023) - soon
 
- 
SILKIE: Preference Distillation for Large Visual Language Models (17 December, 2023) - propose VLFeedback dataset for DPO
- soon
 
- 
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (23 January, 2024) - soon
 
- 
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study (31 January, 2024) - soon
 
- 
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (09 February, 2024) - soon
 
- 
SKIP \N: A Simple Method to Reduce Hallucination in Large Vision-Language Models (12 February, 2024) - soon
 
- 
MARINE: Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance (13 February, 2024) - soon
 
- 
IDK-Instructions: Visually Dehallucinative Instruction Generation: Know What You Don’t Know (15 February, 2024) - soon
 
- 
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models (15 February, 2024) - soon
 
- 
LogicCheckGPT: Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models (18 February, 2024) - soon
 
- 
POVID: Aligning Modalities in Vision Large Language Models via Preference Fine-tuning (18 February, 2024) - soon
 
- 
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (22 February, 2024) - decoding strategy
- soon
 
- 
CGD: Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding (23 February, 2024) - decoding strategy
- soon
 
- 
IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding (28 February, 2024) - decoding strategy
- soon
 
- 
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding (01 March, 2024) - Decoding strategy to tackle object hallucination
- soon
 
- 
Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective (03 March, 2024) - number hallucination
- soon
 
- 
AIT: Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning (15 March, 2024) - soon
 
- 
DVP: What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models (20 March, 2024) - soon
 
- 
M3ID: Multi-Modal Hallucination Control by Visual Information Grounding (20 March, 2024) - decoding strategy
- soon
 
- 
PENSIEVE: Retrospect-then-Compare Mitigates Visual Hallucination (21 March, 2024) - decoding strategy
- soon
 
- 
ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models (26 March, 2024) - soon
 
- 
ICD: Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding (27 March, 2024) - decoding strategy
- soon
 
- 
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback (07 April, 2024) - soon
 
- 
Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning (16 April, 2024) - soon
 
- 
FACT: Teaching MLLMs with Faithful, Concise and Transferable Rationales (17 April, 2024) - soon
 
- 
TVP: Exploring the Transferability of Visual Prompting for Multimodal Large Language Models (17 April, 2024) - soon
 
- 
TextSquare: Scaling up Text-Centric Visual Instruction Tuning (19 April, 2024) - soon
 
- 
HSA-DPO: Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback (22 April, 2024) - Use of GPT-4/GPT-4V to generate fine-grained feedback for training a hallucination detection model (by supervised fine-tuning (SFT) of an LVLM)
- Proposed an automatic pipeline for preference dataset construction
- Hallucination Severity-Aware Direct Preference Optimization (HSA-DPO) is introduced to mitigate LVLM hallucination
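
HSA-DPO builds on the standard DPO objective, -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]), where y_w is the preferred (less hallucinated) response and y_l the rejected one. A plain-Python sketch of that loss; the optional per-pair `weights` argument is a hypothetical way to illustrate "severity-aware" weighting and is not HSA-DPO's exact formulation:

```python
import math
from typing import Optional, Sequence


def _neg_log_sigmoid(m: float) -> float:
    # Numerically stable -log(sigmoid(m)), i.e. softplus(-m).
    return math.log1p(math.exp(-m)) if m >= 0 else -m + math.log1p(math.exp(m))


def dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Mean (optionally weighted) DPO loss over preference pairs.

    Inputs are summed log-probabilities of whole responses under the trained policy
    and the frozen reference model; `weights` is a hypothetical per-pair severity weight.
    """
    n = len(policy_chosen)
    w = list(weights) if weights is not None else [1.0] * n
    total = 0.0
    for pc, pr, rc, rr, wi in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected, w):
        margin = beta * ((pc - rc) - (pr - rr))  # implicit reward gap: chosen minus rejected
        total += wi * _neg_log_sigmoid(margin)
    return total / n
```

With identical policy and reference log-probabilities the margin is zero and the loss equals log 2; raising the policy's relative likelihood of the chosen response pushes it below that.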
 
- 
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation (30 April, 2024) (CVPR 2024) - soon
 
- 
CSR: Calibrated Self-Rewarding Vision Language Models (23 May, 2024) - soon
 
- 
HIO: Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization (24 May, 2024) - soon
 
- 
VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap (24 May, 2024) - soon
 
- 
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness (27 May, 2024) - soon
 
- 
AvisC: Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models (28 May, 2024) - decoding strategy
 
- 
RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs (28 May, 2024) - soon
 
- 
HALVA: Mitigating Object Hallucination via Data Augmented Contrastive Tuning (28 May, 2024) - contrastive tuning strategy
- will publish code soon
 
- 
NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models (30 May, 2024) - soon
 
- 
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Model (04 June, 2024) - soon
 
- 
mDPO: Conditional Preference Optimization for Multimodal Large Language Models (17 June, 2024) - soon
 
- 
DBD: Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? (18 June, 2024) - Introduce novel decoding technique called Differentiated Beam Decoding (DBD)
- soon
 
- 
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention (18 June, 2024) - Introduce AGLA, a training-free and plug-and-play decoding framework
- soon
 
- 
Residual Visual Decoding: Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (30 June, 2024) - decoding method
- soon
 
- 
BDHS: Understanding Alignment in Multimodal LLMs: A Comprehensive Study (02 July, 2024) - soon
 
- 
REVERIE: Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models (16 July, 2024) (ECCV 2024) - Introduces reflective instruction tuning, which incorporates rationales into visual instruction tuning
- Proposes a large-scale instruction-tuning dataset called REVERIE
 
- 
VACoDe: Visual Augmented Contrastive Decoding (26 July, 2024) - decoding strategy using various visual augmentations
- Analyzes the effect of various visual augmentations on LVLM performance and introduces an algorithm that selects the most suitable augmentation of the input image for contrastive decoding
- soon
 
- 
PAI: Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs (31 July, 2024) (ECCV 2024) - soon
 
- 
MHR: Mitigating Multilingual Hallucination in Large Vision-Language Models (01 August, 2024) - soon
 
- 
ARA: Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation (01 August, 2024) - retrieval-augmented generation (RAG) for LVLMs to mitigate hallucination
- soon
 
- 
SID: Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models (04 August, 2024) - decoding strategy
- Rethinks contrastive decoding (CD) methods in LVLMs for hallucination mitigation
- soon
 
- 
LCD: Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (06 August, 2024) - decoding strategy to mitigate object hallucination
- soon
 
- 
Detect-then-Calibrate: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models (18 August, 2024) - Proposes a novel detect-then-calibrate method to detect and mitigate hallucination
- Threshold-based hallucination identification
- Uses the hallucination rate to calculate a final metric called R_score
 
- 
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs (19 August, 2024) - Does not require additional training, external datasets, or an ensemble of external LVLMs such as GPT-4
- Uses a CLIP model to prepare positive-negative pairs for DPO
- Claims much better performance than similar work (HA-DPO) with very few training samples
 
- 
LQCD: Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models (21 August, 2024) - Addresses sycophancy in LVLMs, which is triggered by negative prompting
- Introduces a decoding strategy to improve LVLM robustness to sycophancy
 
- 
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data (22 August, 2024) - Introduces the Robust Visual Reward Model (RoVRM) to improve human-preference alignment in LVLMs
- Uses three-stage progressive training and optimal-transport-based preference data selection to train RoVRM
- Integrates seamlessly with arbitrary ranking-based alignment techniques, such as direct preference optimization (DPO)
 
- 
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models (25 August, 2024) - contrastive decoding method
- Uses a text-to-image (T2I) model in contrastive decoding to mitigate hallucination
- Reports superior performance over existing hallucination mitigation techniques in experiments on 5 benchmarks
 
- 
See or Guess: Counterfactually Regularized Image Captioning (29 August, 2024) - soon
 
- 
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning (30 August, 2024) - multi-path certainty-based decoding
- soon
 
- 
FaithD2T: Generating Faithful and Salient Text from Multimodal Data (06 September, 2024) - soon
 
- 
RBD: Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding (10 September, 2024) - decoding strategy
- soon
 
- 
PACU: Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization (22 September, 2024) - soon
 
- 
Dentist: A Unified Hallucination Mitigation Framework for Large Vision-Language Models (24 September, 2024) - soon
 
- 
TCD: Diagnosing Event Hallucinations in Video LLMs (25 September, 2024) - soon
 
- 
HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding (30 September, 2024) - Extends the OPERA paper with vision-enhanced penalty decoding
- soon
 
- 
PROJECTAWAY: Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations (03 October, 2024) - soon
 
- 
OHD-Caps: Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models (04 October, 2024) - soon
 
- 
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models (04 October, 2024) - soon
 
- 
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination (06 October, 2024) - decoding strategy
- soon
 
- 
CAUSALMM: Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality (07 October, 2024) - soon
 
- 
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models (09 October, 2024) - soon
 
- 
VHExpansion: Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models (15 October, 2024) - soon
 
- 
SGD: Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding (17 October, 2024) - decoding technique
- soon
 
- 
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment (18 October, 2024) - soon
 
- 
MFPO: Modality-Fair Preference Optimization for Trustworthy MLLM Alignment (20 October, 2024) - soon (code)
 
- 
CCA: Mitigating Object Hallucination via Concentric Causal Attention (21 October, 2024) - soon
 
- 
VTI: Reducing Hallucinations in Vision-Language Models via Latent Space Steering (22 October, 2024) - soon
 
- 
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization (05 November, 2024) - soon
 
- 
EAH: Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs (15 November, 2024) - soon
 
- 
HDPO: Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization (15 November, 2024) - soon
 
- 
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination (15 November, 2024) - soon
 
- 
CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs (19 November, 2024) - soon
 
- 
Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance (21 November, 2024) - project page
- soon
 
- 
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models (22 November, 2024) - code will be released soon
- soon
 
- 
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding (24 November, 2024) - soon
 
- 
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens (23 November, 2024) - soon
 
- 
TPO: A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs (26 November, 2024) - soon
 
- 
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis (03 December, 2024) - soon
 
- 
VisVM: Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension (06 December, 2024) - soon
 
- 
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models (06 December, 2024) - code will be published soon
- soon
 
- 
From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding (09 December, 2024) - soon
 
- 
VCD Analysis: Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models (09 December, 2024) - soon
 
- 
DEHALL: Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning (15 December, 2024) - soon
 
- 
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection (18 December, 2024) - soon
 
- 
VHD: Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence (18 December, 2024) - soon
 
- 
TPO: Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation (19 December, 2024) - soon
 
- 
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage (20 December, 2024) - soon
 
- 
VORD: Visual Ordinal Calibration for Mitigating Object Hallucinations in Large Vision-Language Models (20 December, 2024) - soon
 
- 
SENA: Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution (20 December, 2024) - soon
 
- 
IMCCD: Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding (03 January, 2025) - soon
 
- 
EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models (06 January, 2025) - code will be released soon
 
- 
Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild (07 January, 2025) - soon
 
- 
VASparse: Towards Efficient Visual Hallucination Mitigation for Large Vision-Language Model via Visual-Aware Sparsification (11 January, 2025) - soon
 
- 
OPA-DPO: Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key (16 January, 2025) - soon
 
- 
MIAVLM: Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions (17 January, 2025) - soon
 
- 
llava-fix-hallucination: Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model (21 January, 2025) - soon
 
- 
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs (28 January, 2025) - soon
 
- 
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs (31 January, 2025) - soon
 
- 
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction (02 February, 2025) - soon
 
- 
IFCD: Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding (03 February, 2025) - soon
 
- 
UAC/DAC: Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration (04 February, 2025) - soon
 
- 
VISTA: The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering (05 February, 2025) - soon
 
- 
DeGF: Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models (10 February, 2025) - soon
 
- 
CAP: Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting (12 February, 2025) - soon
 
- 
Up to date (28 January, 2025); more SOTA research work loading...
Note: 'soon' will be replaced with a brief description!
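Many entries above (e.g., VACoDe, SID, LCD, ConVis, RBD, IFCD) are described as contrastive decoding (CD) strategies. As a rough illustration only, a common CD formulation (the Visual Contrastive Decoding style) contrasts next-token logits conditioned on the original image with logits conditioned on a distorted or otherwise hallucination-inducing input, and keeps only tokens that are already plausible under the original input. The sketch below assumes raw logits are available as Python lists; the function names, `alpha`, and `beta` are hypothetical, and each listed paper changes what is contrasted and how it is weighted.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; -inf logits receive probability 0."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_decode_step(
    logits_orig: List[float],
    logits_contrast: List[float],
    alpha: float = 1.0,
    beta: float = 0.1,
) -> List[float]:
    """Next-token distribution from a generic contrastive decoding step.

    logits_orig:     logits given the original image and prompt.
    logits_contrast: logits given a distorted/augmented image (or another
                     hallucination-inducing input) and the same prompt.
    alpha:           contrast strength (illustrative default).
    beta:            adaptive plausibility cutoff, relative to the top
                     token's probability under the original input.
    """
    if len(logits_orig) != len(logits_contrast):
        raise ValueError("logit vectors must cover the same vocabulary")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    probs_orig = softmax(logits_orig)
    cutoff = beta * max(probs_orig)
    # Mask implausible tokens, then boost evidence that the original
    # input supports more strongly than the contrast input does.
    contrasted = [
        (1 + alpha) * lo - alpha * lc if p >= cutoff else float("-inf")
        for lo, lc, p in zip(logits_orig, logits_contrast, probs_orig)
    ]
    return softmax(contrasted)


# Toy vocabulary: the original logits slightly favor "frisbee", but the
# distorted image favors it even more, hinting at a language prior.
vocab = ["cat", "dog", "frisbee"]
probs = contrastive_decode_step([2.0, 0.5, 2.2], [0.5, 0.5, 2.4])
print(vocab[max(range(len(vocab)), key=probs.__getitem__)])  # prints: cat
```

Greedy decoding on the original logits alone would pick "frisbee"; after contrasting, the token whose support drops when the visual evidence is degraded wins. The plausibility cutoff keeps the contrast from promoting tokens the original input considers very unlikely.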
- Deep Learning Approaches on Image Captioning: A Review (22 August, 2023)
- A Survey on Hallucination in Large Vision-Language Models (1 February, 2024)
- Visual Hallucination: Definition, Quantification, and Prescriptive Remediations (26 March, 2024)
- 
Hallucination of Multimodal Large Language Models: A Survey (29 April, 2024)  
- Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Survey (20 May, 2024)
- 
Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey (04 January, 2025)  
- Up to date (28 January, 2025); more SOTA research work loading...
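Several papers listed earlier (e.g., HSA-DPO, CLIP-DPO, mDPO, V-DPO, HDPO, OPA-DPO, CHiP) build on Direct Preference Optimization (DPO), which compares a preferred (non-hallucinated) response with a dispreferred (hallucinated) response for the same image and prompt, under both the trainable model and a frozen reference copy. The following is a minimal sketch of the standard per-pair DPO loss, assuming summed response log-probabilities have already been computed; the function names, argument names, and `beta=0.1` are illustrative and not taken from any listed paper, each of which modifies this objective in its own way (for example, severity-aware weighting or CLIP-scored pairs).

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    (chosen = non-hallucinated, rejected = hallucinated) given the same
    image and prompt, under the trainable policy or the frozen reference.
    """
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


# The policy raises the chosen response and lowers the rejected one
# relative to the reference, so the loss falls below log(2).
print(dpo_loss(-5.0, -12.0, -8.0, -8.0) < math.log(2))  # prints: True
```

When the policy and reference agree exactly, the margin is zero and the loss equals log(2); training pushes the margin up so the model prefers the non-hallucinated response more strongly than the reference does.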