
Awesome-LLM-Psychometrics
Awesome papers in LLM psychometrics and LLM psychology.

Project Website: https://llm-psychometrics.com
This repository accompanies the paper Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement. It contains a curated list of resources on Large Language Model (LLM) psychometrics. We will continue to update this repository as we find new resources, and we would greatly appreciate contributions via pull requests or issues.
Category Entries
- Personality
- Values
- Morality
- Attitudes & Opinions
- Heuristics & Biases
- Social Intelligence & Theory of Mind
- Psychology of Language
- Learning and Cognitive Capabilities
If you find this repository useful, we would greatly appreciate it if you could give us a star and cite the paper as follows:
```bibtex
@article{ye2025large,
  title={Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement},
  author={Ye, Haoran and Jin, Jing and Xie, Yuhang and Zhang, Xin and Song, Guojie},
  journal={arXiv preprint arXiv:2505.08245},
  year={2025},
  note={Project website: \url{https://llm-psychometrics.com}, GitHub: \url{https://github.com/ValueByte-AI/Awesome-LLM-Psychometrics}}
}
```
- [ ] Add tags to each entry
- Big Five / HEXACO / Myers-Briggs Type Indicator (MBTI) / Dark Triad / Others & custom
- Personality is the enduring configuration of characteristics and behavior that comprises an individual's unique adjustment to life.
- Schwartz's Theory / World Values Survey (WVS) / Global Leadership and Organizational Behavior Effectiveness (GLOBE) / Social Value Orientation (SVO) / Others & custom
- Values are enduring beliefs that guide behavior and decision-making, reflecting what is important and desirable to an individual or group.
- Moral Foundations Theory (MFT) / Defining Issues Test (DIT) / ETHICS / Others & custom
- Morality is the categorization of intentions, decisions, and actions into those that are proper, or right, and those that are improper, or wrong.
- American National Election Studies (ANES) / American Trends Panel (ATP) / German Longitudinal Election Study (GLES) / Political Compass Test (PCT)
- Attitudes are always attitudes about something, which implies three necessary elements: an object of thought that is constructed and evaluated, the acts of construction and evaluation themselves, and the agent who does the constructing and evaluating. At its most general, then, an attitude is the cognitive construction and affective evaluation of an attitude object by an agent.
- Heuristics are mental shortcuts or rules of thumb that simplify decision-making and problem-solving; biases are the systematic deviations from normative judgment that can result from applying them.
- Theory of Mind (ToM) / Emotional Intelligence / Social Intelligence
- Theory of Mind is the ability to attribute mental states such as beliefs, intentions, and knowledge to others.
- Emotional Intelligence is the subset of social intelligence that involves the ability to monitor one's own and others' feelings and emotions, to discriminate among them, and to use this information to guide one's thinking and actions.
- Social Intelligence is the ability to understand and manage people.
- Language comprehension / Language generation / Language acquisition
- Test Format: Structured test · Open-ended conversation · Agentic simulation
- Data and Task Sources: Established inventories (e.g., MFT, SVS, MBTI) · Custom-curated items · Synthetic items
- Prompting Strategies: Prompt perturbation · Performance-enhancing prompts (e.g., CoT) · Role-playing prompts
- Model Output & Scoring: Logit-based analysis · Direct scoring · Rule-based scoring · Human scoring · Model-based scoring (a minimal logit-scoring sketch follows this list)
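To make the scoring taxonomy concrete, here is a minimal sketch of logit-based analysis for one Likert-style inventory item. It is an illustration under stated assumptions, not code from this repository or any cited paper: the model name, item wording, and 1-5 answer mapping are placeholders, and real studies aggregate over many items, reverse-keyed variants, and prompt perturbations.

```python
# A minimal logit-based scoring sketch using Hugging Face transformers.
# The model, prompt, and answer mapping are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM exposes next-token logits this way
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

prompt = (
    "Statement: I see myself as someone who is talkative.\n"
    "On a scale from 1 (disagree strongly) to 5 (agree strongly), my answer is"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]

# Logit-based analysis: compare the logits of the candidate answer tokens
# directly instead of sampling and parsing free-form text.
options = ["1", "2", "3", "4", "5"]
option_ids = [tokenizer.encode(f" {o}", add_special_tokens=False)[0] for o in options]
probs = torch.softmax(next_token_logits[option_ids], dim=-1)

expected_score = sum(p.item() * int(o) for o, p in zip(options, probs))
print({o: round(p.item(), 3) for o, p in zip(options, probs)})
print(f"expected Likert score: {expected_score:.2f}")
```

Direct scoring would instead sample a completion and parse the answer out of the generated text; rule-based, human, and model-based scoring apply analogous judgments to open-ended responses.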
- Reliability: Test-retest · Parallel forms · Inter-rater agreement (a minimal reliability check is sketched after this list)
- Content Validity: Data contamination · Novel items
- Construct Validity: Unique abstraction · Response set · Social desirability bias · Cross-lingual tests
- Criterion / Ecological Validity: External correlation · Real-world relevance
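As a companion to the checklist above, below is a small self-contained sketch of a test-retest reliability check. The two score vectors are fabricated placeholders standing in for per-item scores from two administrations of the same inventory (for example, identical items asked in separate sessions or under paraphrased prompts); they are not real measurements.

```python
# A minimal test-retest reliability sketch. run_1 and run_2 are placeholder
# per-item scores from two administrations of a 10-item Likert inventory.
import numpy as np
from scipy.stats import pearsonr, spearmanr

run_1 = np.array([4, 5, 3, 4, 2, 5, 4, 3, 4, 5])
run_2 = np.array([4, 4, 3, 5, 2, 5, 3, 3, 4, 4])

r, p = pearsonr(run_1, run_2)        # linear test-retest stability
rho, _ = spearmanr(run_1, run_2)     # rank-order stability
agreement = (run_1 == run_2).mean()  # crude exact-match rate across items

print(f"Pearson r = {r:.2f} (p = {p:.3f})")
print(f"Spearman rho = {rho:.2f}")
print(f"exact agreement = {agreement:.0%}")
```

The same correlational machinery extends to parallel forms (correlate scores across paraphrased item sets) and, with multiple scorers, to inter-rater agreement statistics such as Cohen's kappa.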
- Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications, 2025.04, [paper]
- The Mind in the Machine: A Survey of Incorporating Psychological Theories in LLMs, 2025.05, [paper]
- A review of automatic item generation techniques leveraging large language models, 2025.06, [paper]
- (Big Five) Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality, ICML 2024, [paper]
- (Big Five) Can LLM Agents Maintain a Persona in Discourse?, 2025.02, [paper]
- (Big Five) Personality testing of large language models: limited temporal stability, but highlighted prosociality, 2024.01, Royal Society Open Science, [paper]
- (Big Five) Identifying and Manipulating the Personality Traits of Language Models, EMNLP 2023, [paper]
- (Big Five) Do Personality Tests Generalize to Large Language Models?, 2023.11, [paper]
- (Big Five) LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models, EACL 2024, [paper]
- (Big Five) PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits, NAACL 2024 Findings, [paper]
- (Big Five) Eliciting Personality Traits in Large Language Models, 2024.02, [paper]
- (Big Five) Revisiting the Reliability of Psychological Scales on Large Language Models, EMNLP 2024, [paper]
- (Big Five) Evaluating and Inducing Personality in Pre-trained Language Models, NeurIPS 2023, [paper]
- (Big Five) Estimating the Personality of White-Box Language Models, 2022.04, [paper]
- (Big Five) Driving Generative Agents With Their Personality, 2024.02, [paper]
- (Big Five) Large Language Models as Superpositions of Cultural Perspectives, 2023.07, [paper] [code]
- (Big Five) Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models, AAAI 2025, [paper]
- (Big Five) Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics, NAACL 2025 Findings, [paper]
- (Big Five) Evaluating Psychological Safety of Large Language Models, EMNLP 2024, [paper]
- (Big Five) Dynamic Generation of Personalities with Large Language Models, 2024.04, [paper]
- (Big Five) Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models, 2023.12, [paper]
- (Big Five) AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, 2023.01, Perspectives on Psychological Science, [paper]
- (Big Five) Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis, 2024.05, [paper]
- (Big Five) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper] [code]
- (Big Five) Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics, 2024.08, [paper]
- (Big Five) Personality Traits in Large Language Models, 2023.08, [paper]
- (Big Five) You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments, NAACL 2024, [paper]
- (Big Five) Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs, 2023.05, [paper]
- (Big Five) Challenging the Validity of Personality Tests for Large Language Models, Workshop at NeurIPS 2023, [paper]
- (Big Five) LMLPA: Language Model Linguistic Personality Assessment, 2025.01, Computational Linguistics, [paper]
- (Big Five) Dynamic Evaluation of Large Language Models by Meta Probing Agents, ICML 2024, [paper] [code]
- (Big Five) Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items, ACL 2025, [paper]
- (Big Five) Toward Accurate Psychological Simulations: Investigating LLMs' Responses to Personality and Cultural Variables, Computers in Human Behavior 2025, [paper]
- (Big Five) Personality-Driven Decision-Making in LLM-Based Autonomous Agents, AAMAS 2025, [paper]
- (Big Five) Large Language Models Demonstrate Distinct Personality Profiles, Cureus 2025, [paper]
- (Big Five) Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models, 2025.04, [paper]
- (Big Five) Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games, 2025.04, [paper]
- (Big Five) Improving Language Model Personas via Rationalization with Psychological Scaffolds, 2025.04, [paper]
- (Big Five) The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs, 2025.09, [paper][code]
- (HEXACO) On the Psychology of GPT-4: Moderately anxious, slightly masculine, honest, and humble, 2024.02, [paper]
- (HEXACO) Personality testing of large language models: limited temporal stability, but highlighted prosociality, 2024.01, Royal Society Open Science, [paper]
- (HEXACO) Who is GPT-3? An Exploration of Personality, Values and Demographics, EMNLP 2022 NLP+CSS workshop, [paper]
- (HEXACO) Cognitive phantoms in LLMs through the lens of latent variables, 2024.09, [paper]
- (HEXACO) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (HEXACO) Exploring the Impact of Personality Traits on LLM Bias and Toxicity, 2025.02, [paper]
- (MBTI) Machine Mindset: An MBTI Exploration of Large Language Models, 2023.12, [paper][code]
- (MBTI) Revisiting the Reliability of Psychological Scales on Large Language Models, EMNLP 2024, [paper]
- (MBTI) Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models, AAAI 2025, [paper]
- (MBTI) Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models, 2023.12, [paper]
- (MBTI) Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models, 2023.07, [paper][code]
- (MBTI) Can ChatGPT Assess Human Personalities? A General Evaluation Framework, 2023.03, [paper][code]
- (MBTI) Identifying Multiple Personalities in Large Language Models with External Evaluation, 2024.02, [paper]
- (MBTI) The Better Angels of Machine Personality: How Personality Relates to LLM Safety, 2024.07, [paper]
- (MBTI) Do Large Language Models Have a Personality? A Psychometric Evaluation with Implications for Clinical Medicine and Mental Health AI, 2025.03, [paper]
- (DarkTriad) On the Psychology of GPT-4: Moderately anxious, slightly masculine, honest, and humble, 2024.02, [paper]
- (DarkTriad) Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench, ICLR 2024 Oral, [paper][code]
- (DarkTriad) Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics, NAACL 2025 Findings, [paper]
- (DarkTriad) Evaluating Psychological Safety of Large Language Models, 2022.12, [paper]
- (DarkTriad) Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models, 2023.12, [paper]
- (DarkTriad) Cognitive phantoms in LLMs through the lens of latent variables, 2024.09, [paper]
- (DarkTriad) Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics, 2024.08, [paper]
- (DarkTriad) I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk, 2025.05, [paper]
- (DarkTriad) Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games, 2025.04, [paper]
- (Others & custom) Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models, 2024.06, [paper]
- (Others & custom) Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality, ICML 2024, [paper]
- (Others & custom) Evaluating and Inducing Personality in Pre-trained Language Models, NeurIPS 2023, [paper]
- (Others & custom) Editing Personality For Large Language Models, NLPCC 2024, [paper]
- (Others & custom) Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play, CogSci 2025, [paper]
- (Others & custom) PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data, 2025.02, [paper]
- (Schwartz) High-Dimension Human Value Representation in Large Language Models, 2024.04, [paper]
- (Schwartz) What does ChatGPT return about human values? Exploring value bias in ChatGPT using a descriptive value theory, 2023.04, [paper]
- (Schwartz) Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values, 2024.01, JMIR Mental Health, [paper]
- (Schwartz) Large Language Models as Superpositions of Cultural Perspectives, 2023.07, [paper]
- (Schwartz) When Prompting Fails to Sway: Inertia in Moral and Value Judgments of Large Language Models, NeurIPS 2022, [paper]
- (Schwartz) Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts, 2024.11, [paper]
- (Schwartz) Who is GPT-3? An Exploration of Personality, Values and Demographics, EMNLP 2022 NLP+CSS workshop, [paper]
- (Schwartz) AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, 2023.01, Perspectives on Psychological Science, [paper]
- (Schwartz) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (Schwartz) Do LLMs have Consistent Values?, 2024.07, [paper]
- (Schwartz) ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs, 2024.09, [paper]
- (Schwartz) Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values, ACL 2024, [paper]
- (Schwartz) Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models, AAAI 2025, [paper]
- (Schwartz) ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models, 2023.10, [paper]
- (Schwartz) Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items, ACL 2025, [paper]
- (Schwartz) Cultural Value Alignment in Large Language Models: A Prompt-based Analysis of Schwartz Values in Gemini, ChatGPT, and DeepSeek, 2025.05, [paper]
- (Schwartz) The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas, 2025.05, [paper]
- (Schwartz) Improving Language Model Personas via Rationalization with Psychological Scaffolds, 2025.04, [paper]
- (WVS) ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models, 2023.10, [paper]
- (WVS) Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models, 2025.03, [paper]
- (WVS) Exploring Large Language Models on Cross-Cultural Values in Connection with Training Methodology, 2024.12, [paper]
- (WVS) Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values, 2025.01, [paper]
- (VSM) How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions, 2024.06, [paper]
- (VSM) Large Language Models as Superpositions of Cultural Perspectives, 2023.07, [paper][code]
- (VSM) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (VSM) Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models, AAAI 2025, [paper]
- (VSM) Cultural Value Differences of LLMs: Prompt, Language, and Model Size, 2024.07, [paper]
- (GLOBE) LLM-GLOBE: A Benchmark Evaluating the Cultural Values Embedded in LLM Output, 2024.11, [paper]
- (GLOBE) Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models, 2024.06, [paper]
- (GLOBE) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (SVO) Heterogeneous Value Alignment Evaluation for Large Language Models, AAAI 2024 Workshop, [paper][code]
- (Others & custom) Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?, 2025.01, [paper]
- (Others & custom) Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches, 2024.04, [paper]
- (Others & custom) Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing, 2024.06, [paper]
- (Others & custom) Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models, 2024.06, [paper]
- (Others & custom) Measuring Spiritual Values and Bias of Large Language Models, 2024.10, [paper]
- (Others & custom) LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models, 2024.08, [paper]
- (Others & custom) Are Large Language Models Consistent over Value-laden Questions?, EMNLP 2024, [paper]
- (Others & custom) CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility, 2023.07, [paper]
- (Others & custom) Do Mindfulness Activities Improve Handgrip Strength Among Older Adults: A Propensity Score Matching Approach, 2024.12, Innovation in Aging, [paper]
- (Others & custom) Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions, 2025.04, [paper]
- (Others & custom) Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas, 2025.05, [paper]
- (Others & custom) EAVIT: Efficient and Accurate Human Value Identification from Text data via LLMs, 2025.05, [paper]
- (Others & custom) Do Language Models Think Consistently? A Study of Value Preferences Across Varying Response Lengths, 2025.06, [paper]
- (Others & custom) Measurement of LLM's Philosophies of Human Nature, 2025.04, [paper] [code]
- (MFT) Moral Foundations of Large Language Models, EMNLP 2024, [paper]
- (MFT) Whose Morality Do They Speak? Unraveling Cultural Bias in Multilingual Language Models, 2024.12, [paper]
- (MFT) Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy, NAACL 2022 Workshop, [paper]
- (MFT) MoralBench: Moral Evaluation of LLMs, 2024.06, [paper][code]
- (MFT) Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory, 2024.08, [paper]
- (MFT) Analyzing the Ethical Logic of Six Large Language Models, 2025.01, [paper]
- (MFT) Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations, AIES 2024, [paper]
- (MFT) AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, 2023.01, Perspectives on Psychological Science, [paper][code]
- (MFT) Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity, ACL 2023 Workshop, [paper]
- (MFT) Exploring and steering the moral compass of Large Language Models, ICPR 2024, [paper]
- (MFT) M3oralBench: A MultiModal Moral Benchmark for LVLMs, 2024.12, [paper]
- (MFT) CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses, NeurIPS 2024, [paper]
- (MFT) Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?, NAACL 2024 Findings, [paper]
- (MFT) The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas, 2025.05, [paper]
- (ETHICS) Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety, NeurIPS 2022 Workshop, [paper]
- (ETHICS) Inducing Human-like Biases in Moral Reasoning Language Models, 2024.11, [paper]
- (ETHICS) An Evaluation of GPT-4 on the ETHICS Dataset, 2023.09, [paper]
- (ETHICS) EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval, SIGIR-AP 2023, [paper][code]
- (DIT) Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test, 2024.02, [paper]
- (DIT) Probing the Moral Development of Large Language Models through Defining Issues Test, 2023.09, [paper]
- (Others & Custom) Large-scale moral machine experiment on large language models, 2024.11, [paper]
- (Others & Custom) SaGE: Evaluating Moral Consistency in Large Language Models, LREC-COLING 2024, [paper]
- (Others & Custom) DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life, 2024.10, [paper]
- (Others & Custom) The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making, 2024.10, [paper]
- (Others & Custom) Potential benefits of employing large language models in research in moral education and development, 2023.01, Journal of Moral Education, [paper]
- (Others & Custom) Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment, 2024.11, [paper]
- (Others & Custom) Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing, 2024.06, [paper]
- (Others & Custom) When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment, NeurIPS 2022, [paper][code]
- (Others & Custom) Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?, C3NLP 2024, [paper]
- (Others & Custom) Western, Religious or Spiritual: An Evaluation of Moral Justification in Large Language Models, 2023.11, [paper]
- (Others & Custom) Evaluating Moral Beliefs across LLMs through a Pluralistic Framework, 2024.11, [paper]
- (Others & Custom) LLMs as mirrors of societal moral standards: reflection of cultural divergence and agreement across ethical topics, 2024.12, [paper]
- (Others & Custom) Analyzing the Ethical Logic of Six Large Language Models, 2025.01, [paper]
- (Others & Custom) Extended Japanese Commonsense Morality Dataset with Masked Token and Label Enhancement, CIKM '24 (Short Paper), [paper]
- (Others & Custom) What does AI consider praiseworthy?, 2025.02, AI and Ethics, [paper]
- (Others & Custom) Knowledge of cultural moral norms in large language models, ACL 2023, [paper]
- (Others & Custom) Normative Evaluation of Large Language Models with Everyday Moral Dilemmas, 2025.01, [paper]
- (Others & Custom) Evaluating the Moral Beliefs Encoded in LLMs, NeurIPS 2023, [paper]
- (Others & Custom) The Moral Mind(s) of Large Language Models, 2024.12, [paper]
- (Others & Custom) The moral machine experiment on large language models, 2024.02, Royal Society Open Science, [paper]
- (Others & Custom) Probing the Moral Development of Large Language Models through Defining Issues Test, 2023.09, [paper]
- (Others & Custom) Decoding Multilingual Moral Preferences: Unveiling LLM's Biases through the Moral Machine Experiment, AIES 2024, [paper]
- (Others & Custom) Right vs. Right: Can LLMs Make Tough Choices?, 2024.12, [paper]
- (Culture) Cultural tendencies in generative AI, 2025.06, Nature Human Behaviour, [paper]
- (ANES) Out of One, Many: Using Language Models to Simulate Human Samples, 2023.02, Political Analysis, [paper]
- (ANES) Synthetic Replacements for Human Survey Data? The Perils of Large Language Models, 2024.05, Political Analysis, [paper]
- (ANES) CommunityLM: Probing Partisan Worldviews from Language Models, COLING 2022, [paper]
- (ANES) Representation Bias in Political Sample Simulations with Large Language Models, 2024.07, [paper]
- (ANES) Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic Information, 2024.02, [paper]
- (ANES) Unpacking Political Bias in Large Language Models: A Cross-Model Comparison on U.S. Politics, 2024.12, [paper]
- (ATP) Out of One, Many: Using Language Models to Simulate Human Samples, 2023.02, Political Analysis, [paper]
- (ATP) Whose Opinions Do Language Models Reflect?, ICML 2023, [paper]
- (ATP) Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design, 2024.09, Transactions of the Association for Computational Linguistics (TACL), [paper]
- (GLES) Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction, 2025.02, [paper]
- (GLES) Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study, 2024.12, [paper]
- (GLES) Representation Bias in Political Sample Simulations with Large Language Models, 2024.07, [paper]
- (GLES) Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion, 2024.07, [paper]
- (PCT) PRISM: A Methodology for Auditing Biases in Large Language Models, 2024.10, [paper]
- (PCT) Mapping and Influencing the Political Ideology of Large Language Models using Synthetic Personas, 2024.12, [paper]
- (PCT) The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation, 2023.01, [paper]
- (PCT) Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models, ACL 2024, [paper]
- (PCT) The Political Biases of ChatGPT, 2023.01, Social Sciences, [paper]
- (PCT) The Self-Perception and Political Biases of ChatGPT, 2024.07, [paper]
- (PCT) Revealing Fine-Grained Values and Opinions in Large Language Models, EMNLP 2024 Findings, [paper]
- (Others & custom) The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models, EMNLP 2024 Findings, [paper]
- (Others & custom) Beyond Prompt Brittleness: Evaluating the Reliability and Consistency of Political Worldviews in LLMs, 2024.11, Transactions of the Association for Computational Linguistics (TACL), [paper]
- (Others & custom) Llama meets EU: Investigating the European Political Spectrum through the Lens of LLMs, NAACL 2024 (Short Paper), [paper]
- (Others & custom) Questioning the Survey Responses of Large Language Models, NeurIPS 2024, [paper]
- (Others & custom) Towards Measuring the Representation of Subjective Global Opinions in Language Models, 2023.06, [paper][code]
- (Others & custom) Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models, 2025.03, [paper]
- (Others & custom) From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models, ACL 2023, [paper]
- (Others & custom) Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys, 2024.05, [paper]
- (Others & custom) Improving GPT Generated Synthetic Samples with Sampling-Permutation Algorithm, 2023.08, [paper]
- (Others & custom) AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction, 2023.05, [paper]
- (Others & custom) Linear Representations of Political Perspective Emerge in Large Language Models, 2025.03, [paper]
- (Others & custom) Can large language models estimate public opinion about global warming? An empirical assessment of algorithmic fidelity and bias, 2024.08, PLOS Climate, [paper]
- (Others & custom) How Accurate are GPT-3's Hypotheses About Social Science Phenomena?, 2023.07, Digital Society, [paper]
- (Others & custom) IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance, 2025.02, [paper]
- (Others & custom) The Political Biases of ChatGPT, 2023.01, Social Sciences, [paper]
- (Others & custom) Demonstrations of the Potential of AI-based Political Issue Polling, 2023.07, Harvard Data Science Review (HDSR), [paper]
- (Others & custom) Large Language Models Can Be Used to Estimate the Latent Positions of Politicians, 2023.03, [paper]
- (Others & custom) Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases, 2025.02, [paper]
- (Others & custom) Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs, 2025.05, [paper]
- Cognitive Network Science Reveals Bias in GPT-3, GPT-3.5 Turbo, and GPT-4 Mirroring Math Anxiety in High-School Students, 2025.04, Big Data and Cognitive Computing, [paper]
- Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases, NALOMA IV 2023, [paper]
- FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models, 2024.05, [paper]
- Using cognitive psychology to understand GPT-3, 2023.02, Proceedings of the National Academy of Sciences (PNAS), [paper][code]
- Examining Cognitive Biases in ChatGPT 3.5 and 4 through Human Evaluation and Linguistic Comparison, AMTA 2024, [paper]
- Do Emotions Really Affect Argument Convincingness? A Dynamic Approach with LLM-based Manipulation Checks, 2025.03, [paper]
- CogBench: a large language model walks into a psychology lab, ICML 2024, [paper]
- Cognitive Bias in Decision-Making with LLMs, EMNLP 2024 Findings, [paper]
- Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT, 2023.10, Nature Computational Science, [paper]
- Relative Value Biases in Large Language Models, CogSci 2024, [paper]
- Evaluating Nuanced Bias in Large Language Model Free Response Answers, NLDB 2024, [paper]
- Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs, 2024.10, [paper]
- (Ir)rationality and cognitive biases in large language models, 2024.06, Royal Society Open Science, [paper]
- A Comprehensive Evaluation of Cognitive Biases in LLMs, 2024.10, [paper][code]
- Evaluating Cognitive Maps and Planning in Large Language Models with CogEval, NeurIPS 2023, [paper]
- HANS, are you clever? Clever Hans Effect Analysis of Neural Systems, *SEM 2024, [paper]
- Metacognitive Myopia in Large Language Models, 2024.08, [paper]
- Visual cognition in multimodal large language models, 2025.01, Nature Machine Intelligence, [paper]
- Development of Cognitive Intelligence in Pre-trained Language Models, EMNLP 2023, [paper]
- CBEval: A framework for evaluating and interpreting cognitive biases in LLMs, 2024.12, [paper]
- Can a Hallucinating Model help in Reducing Human "Hallucination"?, 2024.05, [paper]
- Challenging the appearance of machine intelligence: Cognitive bias in LLMs and Best Practices for Adoption, 2023.04, [paper]
- Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models, 2024.12, [paper]
- Cognitive bias in large language models: Cautious optimism meets anti-Panglossian meliorism, 2023.11, [paper]
- Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration, 2024.10, [paper]
- Studying and improving reasoning in humans and machines, 2024.06, Communications Psychology, [paper]
- (Theory of Mind) Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models, EMNLP 2023 Findings, [paper][code]
- (Theory of Mind) A Review on Machine Theory of Mind, 2024.12, IEEE Transactions on Computational Social Systems, [paper]
- (Theory of Mind) A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks, 2025.02, [paper]
- (Theory of Mind) Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses, 2024.06, [paper]
- (Theory of Mind) NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding, EMNLP 2024 Findings, [paper][code]
- (Theory of Mind) Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models, 2024.06, [paper]
- (Theory of Mind) Understanding Social Reasoning in Language Models with Language Models, NeurIPS 2023, [paper]
- (Theory of Mind) HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models, EMNLP 2023 Findings, [paper]
- (Theory of Mind) Does ChatGPT have Theory of Mind?, 2023.05, [paper]
- (Theory of Mind) TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models' Theory-of-Mind, 2024.07, [paper]
- (Theory of Mind) Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain, 2023.09, [paper]
- (Theory of Mind) MMToM-QA: Multimodal Theory of Mind Question Answering, ACL 2024, [paper]
- (Theory of Mind) Comparing Humans and Large Language Models on an Experimental Protocol Inventory for Theory of Mind Evaluation (EPITOME), 2024.06, Transactions of the Association for Computational Linguistics (TACL), [paper]
- (Theory of Mind) Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models, 2025.02, [paper]
- (Theory of Mind) Theory of Mind May Have Spontaneously Emerged in Large Language Models, 2023.02, [paper][code]
- (Theory of Mind) Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models, 2023.10, [paper]
- (Theory of Mind) Theory of Mind for Multi-Agent Collaboration via Large Language Models, EMNLP 2023, [paper][code]
- (Theory of Mind) Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models, PRICAI 2024, [paper]
- (Theory of Mind) Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models, 2024.08, [paper]
- (Theory of Mind) Boosting Theory-of-Mind Performance in Large Language Models via Prompting, 2023.04, [paper]
- (Theory of Mind) Probing the Robustness of Theory of Mind in Large Language Models, 2024.10, [paper]
- (Theory of Mind) Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?, 2024.06, [paper]
- (Theory of Mind) Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective, CHI 2025 Workshop, [paper]
- (Theory of Mind) Multi-ToM: Evaluating Multilingual Theory of Mind Capabilities in Large Language Models, 2024.11, [paper]
- (Theory of Mind) Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs, EMNLP 2022, [paper]
- (Theory of Mind) Decompose-ToM: Enhancing Theory of Mind Reasoning in Large Language Models through Simulation and Task Decomposition, 2025.01, [paper]
- (Theory of Mind) Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker, ACL 2023, [paper]
- (Theory of Mind) Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models, EACL 2024, [paper]
- (Theory of Mind) ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind, 2025.01, [paper]
- (Theory of Mind) Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground, ACL 2024 Findings, [paper]
- (Theory of Mind) Testing theory of mind in large language models and humans, 2024.05, Nature Human Behaviour, [paper]
- (Theory of Mind) LLMs achieve adult human performance on higher-order theory of mind tasks, 2024.05, [paper]
- (Theory of Mind) PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models, 2024.03, [paper]
- (Theory of Mind) ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models, NeSy 2024, [paper]
- (Theory of Mind) Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks, 2023.02, [paper]
- (Theory of Mind) Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests, CoNLL 2023, [paper]
- (Theory of Mind) Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities, ACL 2024, [paper]
- (Theory of Mind) OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models, ACL 2024, [paper]
- (Theory of Mind) Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection, 2025.01, [paper]
- (Theory of Mind) PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues, 2025.02, [paper][code]
- (Theory of Mind) AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind, 2025.02, [paper]
- (Theory of Mind) How FaR Are Large Language Models From Agents with Theory-of-Mind?, 2023.10, [paper]
- (Theory of Mind) Dynamic Evaluation of Large Language Models by Meta Probing Agents, ICML 2024, [paper][code]
- (Emotional Intelligence) A Literature Review on Emotional Intelligence of Large Language Models (LLMs), 2024, International Journal of Advanced Research in Computer Science, [paper]
- (Emotional Intelligence) Large Language Models and Empathy: Systematic Review, 2024.01, Journal of Medical Internet Research, [paper]
- (Emotional Intelligence) EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models, ACL 2024 Findings, [paper]
- (Emotional Intelligence) ChatGPT outperforms humans in emotional awareness evaluations, 2023.05, Frontiers in Psychology, Emotion Science, [paper]
- (Emotional Intelligence) EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models, 2025.02, [paper][code]
- (Emotional Intelligence) Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench, NeurIPS 2024, [paper][code]
- (Emotional Intelligence) Large Language Models Produce Responses Perceived to be Empathic, 2024.03, [paper]
- (Emotional Intelligence) Large Language Models Understand and Can be Enhanced by Emotional Stimuli, LLM@IJCAI'23, [paper][code]
- (Emotional Intelligence) EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models, 2023.12, [paper][code]
- (Emotional Intelligence) Identification and Description of Emotions by Current Large Language Models, 2023.07, [paper]
- (Emotional Intelligence) EmoBench: Evaluating the Emotional Intelligence of Large Language Models, 2024.02, [paper][code]
- (Emotional Intelligence) Exploring ChatGPT's Empathic Abilities, ACII 2023, [paper]
- (Emotional Intelligence) The Emotional Intelligence of the GPT-4 Large Language Model, 2024.06, Psychology in Russia: State of the Art, [paper]
- (Emotional Intelligence) Are Large Language Models More Empathetic than Humans?, 2024.06, [paper]
- (Emotional Intelligence) Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence, ACL 2024 Findings, [paper]
- (Emotional Intelligence) Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models, 2025.05, [paper]
- (Social Intelligence) DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding, EMNLP 2023, [paper]
- (Social Intelligence) SocialAI 0.1: Towards a Benchmark to Stimulate Research on Socio-Cognitive Abilities in Deep Reinforcement Learning Agents, NAACL 2021 Workshop, [paper][code]
- (Social Intelligence) Do LLM Agents Exhibit Social Behavior?, 2023.12, [paper]
- (Social Intelligence) AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents, 2024.01, [paper]
- (Social Intelligence) Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View, 2024.05, [paper]
- (Social Intelligence) Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions, EMNLP 2024, [paper]
- (Social Intelligence) Large language models can outperform humans in social situational judgments, 2024.11, Scientific Reports, [paper]
- (Social Intelligence) AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios, 2024.10, [paper][code]
- (Social Intelligence) How Well Do Large Language Models Perform on Faux Pas Tests?, ACL 2023 Findings, [paper]
- (Social Intelligence) Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level, ACL 2024 Findings, [paper]
- (Social Intelligence) Emotional intelligence of Large Language Models, 2023.11, Journal of Pacific Rim Psychology, [paper][code]
- (Social Intelligence) Academically intelligent LLMs are not necessarily socially intelligent, 2024.03, [paper]
- (Social Intelligence) SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents, 2023.10, [paper]
- (Language comprehension) Language Model Behavior: A Comprehensive Survey, 2023.05, Computational Linguistics (CL), [paper]
- (Language comprehension) Large Language Models for Psycholinguistic Plausibility Pretesting, EACL 2024 Findings, [paper]
- (Language comprehension) Syntactic Surprisal From Neural Models Predicts, But Underestimates, Human Processing Difficulty From Syntactic Ambiguities, CoNLL 2022, [paper]
- (Language comprehension) GPT-4 Surpassing Human Performance in Linguistic Pragmatics, 2023.12, [paper]
- (Language comprehension) HLB: Benchmarking LLMs' Humanlikeness in Language Use, 2024.09, [paper]
- (Language comprehension) Large Language Models as Neurolinguistic Subjects: Discrepancy in Performance and Competence for Form and Meaning, 2024.11, [paper]
- (Language comprehension) Do large language models and humans have similar behaviors in causal inference with script knowledge?, *SEM 2024, [paper][code]
- (Language comprehension) Prompt-based methods may underestimate large language models' linguistic generalizations, 2023.07, [paper]
- (Language comprehension) Towards a Psychology of Machines: Large Language Models Predict Human Memory, 2024.03, [paper]
- (Language comprehension) How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments, 2024.08, [paper]
- (Language comprehension) A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles, 2024.10, [paper]
- (Language comprehension) Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention, 2024.05, [paper]
- (Language comprehension) Evaluating Grammatical Well-Formedness in Large Language Models: A Comparative Study with Human Judgments, CMCL 2024 Workshop, [paper]
- (Language comprehension) The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs, NeurIPS 2023, [paper]
- (Language comprehension) Long-form analogies generated by chatGPT lack human-like psycholinguistic properties, CogSci 2023, [paper]
- (Language comprehension) Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures, CoNLL 2023, [paper]
- (Language comprehension) Computational Sentence-level Metrics Predicting Human Sentence Comprehension, 2024.03, [paper]
- (Language comprehension) Are Large Language Models Capable of Generating Human-Level Narratives?, EMNLP 2024, [paper]
- (Language comprehension) How can large language models become more human?, CMCL 2024, [paper]
- (Language comprehension) A Targeted Assessment of Incremental Processing in Neural Language Models and Humans, ACL 2021, [paper]
- (Language comprehension) Divergences between Language Models and Human Brains, NeurIPS 2024, [paper]
- (Language generation) Divergent Creativity in Humans and Large Language Models, 2024.05, [paper]
- (Language generation) The Crowdless Future? Generative AI and Creative Problem-Solving, 2024.08, Organization Science, [paper]
- (Language generation) Do large language models resemble humans in language use?, CMCL 2024 Workshop, [paper]
- (Language generation) Art or Artifice? Large Language Models and the False Promise of Creativity, CHI 2024, [paper]
- (Language generation) Artificial Intelligence is More Creative Than Humans: A Cognitive Science Perspective on the Current State of Generative Language Models, 2023.09, [paper]
- (Language generation) An empirical investigation of the impact of ChatGPT on creativity, 2024.08, Nature Human Behaviour, [paper]
- (Language generation) Evaluating Large Language Models via Linguistic Profiling, EMNLP 2024, [paper]
- (Language generation) The Language of Creativity: Evidence from Humans and Large Language Models, 2024.01, The Journal of Creative Behavior, [paper]
- (Language generation) Long-form analogies generated by chatGPT lack human-like psycholinguistic properties, CogSci 2023, [paper]
- (Language generation) Putting GPT-3's Creativity to the (Alternative Uses) Test, ICCC 2022 (Short Paper), [paper]
- (Language generation) Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models, 2024.12, [paper]
- (Language generation) Are Large Language Models Capable of Generating Human-Level Narratives?, 2024.07, [paper]
- (Language acquisition) Bridging the data gap between children and large language models, 2023.11, Trends in Cognitive Sciences (TICS), [paper]
- (Language acquisition) Psychomatics - A Multidisciplinary Framework for Understanding Artificial Minds, 2024.04, Cyberpsychology, Behavior, and Social Networking, [paper]
- (Language acquisition) Development of Cognitive Intelligence in Pre-trained Language Models, 2024.07, [paper]
- (Language acquisition) Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures, CoNLL 2023, [paper]
- Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges, 2024.09, [paper]
- CogBench: a large language model walks into a psychology lab, ICML 2024, [paper]
- Age against the machine - susceptibility of large language models to cognitive impairment: cross sectional analysis, 2024.12, The BMJ (British Medical Journal), [paper]
- The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks, 2024.10, [paper]
- CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models, EMNLP 2024 Findings, [paper]
- Language models and psychological sciences, 2023.10, Frontiers in Psychology, [paper]
- M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark, 2024.06, [paper]
- CogLM: Tracking Cognitive Development of Large Language Models, 2024.08, [paper]
- Emergent analogical reasoning in large language models, 2023.07, Nature Human Behaviour, [paper]
- Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task, 2025.02, [paper][code]
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs, 2024.06, [paper]
- Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach, EMNLP 2023 (Short Paper), [paper]