Awesome-LLM-in-Social-Science
Awesome papers involving LLMs in Social Science.
Awesome-LLM-in-Social-Science is a repository that compiles papers examining Large Language Models (LLMs) from a social science perspective. It includes papers on evaluating, aligning, and simulating LLMs, as well as on using LLMs to enhance research tools in social science. The repository categorizes papers by their focus on attitudes, opinions, values, personality, morality, and more, and aims to contribute to discussions on the potential and challenges of using LLMs in social science research.
Below we compile awesome papers that
- evaluate Large Language Models (LLMs) from a Social Science perspective,
- align LLMs with insights from Social Science,
- employ LLMs to facilitate research, address issues, and enhance tools in Social Science, and
- contribute surveys or perspectives on the above topics.
Evaluation, alignment, and simulation are by no means orthogonal; evaluation, for example, often relies on simulation. We categorize papers according to our understanding of each paper's primary focus. This collection pays special attention to Psychology and Human Values.
Contributions and discussion are welcome!
🤩 Papers marked with a ⭐️ are contributed by the maintainers of this repository. We would appreciate it if you could star the repo or cite our paper if you find it useful.
Table of Contents:
- 1. Survey
- 2. Evaluation
  - 2.1. ❤️ Value
  - 2.2. 🩷 Personality
  - 2.3. 🔞 Morality
  - 2.4. 🎤 Opinion
  - 2.5. 🧠 Ability
- 3. Alignment
- 4. Simulation
- 5. Perspective

## 1. Survey
- Automated Mining of Structured Knowledge from Text in the Era of Large Language Models, 2024.08, KDD 2024, [paper].
- Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective, 2024.07, [paper].
- Perils and opportunities in using large language models in psychological research, 2024.07, [paper].
- The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models, 2024.06, [paper].
- Can Generative AI improve social science?, 2024.05, PNAS, [paper].
- Foundational Challenges in Assuring Alignment and Safety of Large Language Models, 2024.04, [paper].
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges, 2024.01, [paper], [repo].
- The Rise and Potential of Large Language Model Based Agents: A Survey, 2023, [paper], [repo].
- A Survey on Large Language Model based Autonomous Agents, 2023, [paper], [repo].
- AI Alignment: A Comprehensive Survey, 2023.11, [paper], [website].
- Aligning Large Language Models with Human: A Survey, 2023, [paper], [repo].
- Large Language Model Alignment: A Survey, 2023, [paper].
- Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives, 2023.12, [paper].
- A Survey on Evaluation of Large Language Models, 2023.07, [paper], [repo].
- From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models, 2023.08, [paper], [repo].
## 2. Evaluation
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models, 2024.07, [paper].

### 2.1. ❤️ Value
- ⭐️ Measuring Human and AI Values based on Generative Psychometrics with Large Language Models, 2024.09, [paper], [code].
- ⭐️ ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper], [code].
- Stick to your role! Stability of personal values expressed in large language models, 2024.08, [paper].
- Do LLMs have Consistent Values?, 2024.07, [paper].
- CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses, 2024.07, [paper].
- Are Large Language Models Consistent over Value-laden Questions?, 2024.07, [paper].
- Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches, 2024.04, [paper].
- Heterogeneous Value Evaluation for Large Language Models, 2023.03, [paper], [code].
  TL;DR: This paper introduces the A2EHV method to assess how well LLMs align with a range of human values categorized under the Social Value Orientation (SVO) framework.
- Measuring Value Understanding in Language Models through Discriminator-Critique Gap, 2023.10, [paper].
  TL;DR: This paper introduces the Value Understanding Measurement (VUM) framework to quantitatively assess an LLM's understanding of values. It measures the discriminator-critique gap (DCG), which covers both the model's knowledge of values ("know what") and the reasoning behind that knowledge ("know why"). A toy sketch of the idea follows.
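To make the DCG idea concrete, here is a minimal sketch under loose assumptions about the paper's exact protocol: the model first picks which value a statement expresses ("know what"), then justifies the choice, and a second judge call scores the rationale ("know why"); the gap between the two scores is the DCG-style quantity. `ask_model`, the item, and the scoring are hypothetical placeholders, not the paper's implementation.

```python
# Toy sketch of a discriminator-critique gap (DCG) measurement.
# `ask_model` stands in for any chat-LLM API call; here it is a stub.

def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return "Benevolence"  # stubbed answer for illustration

def discriminator_score(statement: str, options: list, gold: str) -> float:
    """Know-what: can the model pick the value the statement expresses?"""
    prompt = (f"Which value does this statement express?\n"
              f"Statement: {statement}\nOptions: {', '.join(options)}\n"
              "Answer with exactly one option.")
    return 1.0 if ask_model(prompt).strip() == gold else 0.0

def critique_score(statement: str, gold: str) -> float:
    """Know-why: is the model's explanation judged sound? A second
    (stubbed) LLM call rates the rationale from 0 to 1."""
    rationale = ask_model(f"Explain why the statement '{statement}' "
                          f"expresses the value '{gold}'.")
    verdict = ask_model(f"Rate this rationale from 0 to 1:\n{rationale}")
    try:
        return max(0.0, min(1.0, float(verdict)))
    except ValueError:
        return 0.0  # unparsable judge output counts as no credit

statement = "I always help my neighbours carry their groceries."
know_what = discriminator_score(
    statement, ["Benevolence", "Power", "Hedonism"], gold="Benevolence")
know_why = critique_score(statement, gold="Benevolence")
print(f"know-what: {know_what}, know-why: {know_why}, "
      f"DCG-style gap: {know_what - know_why:.2f}")
```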
- Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values, 2023.11, [paper].
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties, AAAI 2024, [paper], [code].
- High-Dimension Human Value Representation in Large Language Models, 2024.04, [paper], [code].
### 2.2. 🩷 Personality
Most of the [BFI]-tagged entries below share the questionnaire recipe sketched at the end of this subsection.
- Incharacter: Evaluating personality fidelity in role-playing agents through psychological interviews, ACL 2024, [paper], [code]
- [MBTI] Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models, 2024.01, [paper]
- Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench, ICLR 2024, [paper], [code]
- [BFI] AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, 2024.01, [paper]
- Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots, 2023.10, [paper]
- [MBTI] Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models, 2023.07, [paper]
- [MBTI] Can ChatGPT Assess Human Personalities? A General Evaluation Framework, EMNLP 2023, [paper], [code].
- [BFI] Personality Traits in Large Language Models, 2023.07, [paper]
- [BFI] Revisiting the Reliability of Psychological Scales on Large Language Models, 2023.05, [paper]
- [BFI] Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation, ACL 2023 workshop, [paper]
- [BFI] Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs, 2023.05, [paper]
- [BFI] Evaluating and Inducing Personality in Pre-trained Language Models, NeurIPS 2023 (spotlight), [paper]
- [BFI] Identifying and Manipulating the Personality Traits of Language Models, 2022.12, [paper]
- Who is GPT-3? An Exploration of Personality, Values and Demographics, 2022.09, [paper]
- Does GPT-3 Demonstrate Psychopathy? Evaluating Large Language Models from a Psychological Perspective, 2022.12, [paper]
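Most of the [BFI]-tagged studies above share one psychometric recipe: administer standard inventory items to the model on a Likert scale, reverse-key the negatively worded ones, and average per trait. A minimal illustration of that recipe, with made-up items (not the copyrighted inventory text) and a stubbed `ask_model` in place of a real API:

```python
# Administering Likert-scale inventory items to an LLM and scoring a
# trait the way the BFI-style studies above do. Items here are
# illustrative stand-ins for real inventory items.

ITEMS = [  # (statement, reverse_keyed)
    ("I see myself as someone who is talkative.", False),
    ("I see myself as someone who tends to be quiet.", True),
]

def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; returns a stubbed Likert answer."""
    return "4"

def score_trait(items) -> float:
    scores = []
    for statement, reverse in items:
        prompt = (f"{statement}\nAnswer on a 1-5 scale "
                  "(1 = disagree strongly, 5 = agree strongly). "
                  "Reply with a single number.")
        raw = int(ask_model(prompt).strip())
        scores.append(6 - raw if reverse else raw)  # reverse-key if needed
    return sum(scores) / len(scores)

print(f"Extraversion (toy items): {score_trait(ITEMS):.2f}")
```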
### 2.3. 🔞 Morality
- Aligning AI With Shared Human Values, 2020, [paper].
- Exploring the psychology of GPT-4's Moral and Legal Reasoning, 2023.08, [paper].
  TL;DR: The paper investigates GPT-4's moral and legal reasoning compared with humans across several domains, using vignette-based studies. It reveals significant parallels and differences in GPT-4's responses, offering insights into its alignment with human moral judgments.
- Probing the Moral Development of Large Language Models through Defining Issues Test.
  TL;DR: The Defining Issues Test (DIT), based on Kohlberg's model of moral development, is used to evaluate the ethical reasoning abilities of LLMs. GPT-3 performs at the random baseline, while GPT-4 achieves the highest moral development score, comparable to that of graduate students.
- Moral Foundations of Large Language Models, 2023.10, [paper].
- Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity, 2023.06, [paper]
- Evaluating the Moral Beliefs Encoded in LLMs, 2023.07, [paper]
### 2.4. 🎤 Opinion
- More human than human: measuring ChatGPT political bias, 2023, [paper].
  TL;DR: This paper proposes empirical designs to measure political bias in ChatGPT, showing that ChatGPT exhibits a significant and systematic political bias toward the Democrats in the US, Lula in Brazil, and the Labour Party in the UK.
- Towards Measuring the Representation of Subjective Global Opinions in Language Models, 2023.07, [paper], [website].
  TL;DR: This study quantitatively assesses how subjective global opinions are represented in LLMs. It introduces a dataset built from cross-national surveys to capture diverse global perspectives, and develops a metric to measure the similarity between LLM-generated responses and human responses conditioned on nationality, revealing biases and stereotypes in the model's responses. A toy version of such a similarity metric follows below.
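One simple construction of a similarity metric of this kind (an illustrative choice, not necessarily the paper's exact formula) compares the model's answer distribution over a survey question's options with a country's human response distribution via Jensen-Shannon similarity; the numbers below are made up:

```python
# Comparing an LLM's answer distribution on a multiple-choice survey
# question with a country's human response distribution. 1 - JSD gives
# a similarity in [0, 1]; the paper's metric may differ.
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-mass terms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence of two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 1 - (kl(p, m) + kl(q, m)) / 2

# Distributions over options A/B/C for one survey question.
human_us  = [0.60, 0.30, 0.10]   # illustrative human respondent shares
llm_probs = [0.80, 0.15, 0.05]   # model's sampled answer frequencies

print(f"US-conditioned similarity: {js_similarity(human_us, llm_probs):.3f}")
```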
- Can Language Models Reason about Individualistic Human Values and Preferences?, 2024.10, [paper].
### 2.5. 🧠 Ability
- Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity, 2021, [paper].
- Can Large Language Models Transform Computational Social Science?, 2023, [paper], [code].
  TL;DR: This paper provides a roadmap for using LLMs as computational social science (CSS) tools, including prompting best practices and an evaluation pipeline. Evaluations show that LLMs can serve as zero-shot data annotators and assist with challenging creative generation tasks. A toy annotation loop is sketched below.
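As a concrete instance of the zero-shot annotation workflow described in the entry above, a CSS pipeline can carry the codebook in the prompt and ask for one label per document. The label set and the `ask_model` stub are hypothetical:

```python
# Zero-shot text annotation in the CSS style described above: the
# prompt carries the codebook; the model returns one label per text.

LABELS = ["polite", "neutral", "impolite"]  # toy codebook

def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; stubbed for illustration."""
    return "polite"

def annotate(text: str) -> str:
    prompt = (f"Label the following utterance as one of "
              f"{', '.join(LABELS)}. Reply with the label only.\n"
              f"Utterance: {text}")
    label = ask_model(prompt).strip().lower()
    return label if label in LABELS else "neutral"  # fallback on parse failure

docs = ["Would you mind closing the door?", "Close the door."]
print([annotate(d) for d in docs])
```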
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents, 2023, [paper], [code].
  TL;DR: The paper introduces SOTOPIA, an interactive environment for evaluating social intelligence in language agents through goal-driven social interactions. Experiments with SOTOPIA reveal gaps between state-of-the-art models and human social intelligence, despite the models showing some promising capabilities.
- Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View, 2023, [paper], [code].
  TL;DR: This paper explores collaboration mechanisms among LLMs in a multi-agent system by drawing insights from social psychology. Multi-agent collaboration strategies can matter more than scaling up single LLMs; fostering effective collaboration is key to more socially aware AI.
- Playing repeated games with Large Language Models, 2023.05, [paper].
  TL;DR: This paper studies the cooperative and coordinated behavior of LLMs by letting them play repeated 2-player games. The key finding is that LLMs such as GPT-4 perform well in competitive games but struggle to coordinate and alternate strategies in games requiring more cooperation. A minimal game loop is sketched below.
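The repeated-game setup is easy to reproduce in miniature: two model-backed players exchange moves in an iterated prisoner's dilemma, each seeing the full history in its prompt. The payoff matrix is the standard one; `ask_model` is a hypothetical stub (this one always cooperates), not the paper's protocol:

```python
# Two LLM-backed players in an iterated prisoner's dilemma, in the
# spirit of the repeated-games study above. `ask_model` is a stub.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; this stub always cooperates."""
    return "C"

def move(history: list, me: int) -> str:
    """Prompt one player with the game history and parse its move."""
    lines = [f"Round {i + 1}: you played {h[me]}, opponent played {h[1 - me]}"
             for i, h in enumerate(history)]
    prompt = ("You are playing an iterated prisoner's dilemma.\n"
              + "\n".join(lines)
              + "\nReply with exactly C (cooperate) or D (defect).")
    reply = ask_model(prompt).strip().upper()
    return reply if reply in ("C", "D") else "C"  # default on parse failure

history, totals = [], [0, 0]
for _ in range(5):
    a, b = move(history, 0), move(history, 1)
    history.append((a, b))
    pa, pb = PAYOFF[(a, b)]
    totals[0] += pa
    totals[1] += pb
print("Scores after 5 rounds:", totals)
```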
- Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods, 2023, [paper].
- Using cognitive psychology to understand GPT-3, 2023.02, PNAS, [paper].
- Large language models as a substitute for human experts in annotating political text, 2024.02, [paper].
- PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements, 2024, [paper], [code].
- ChatFive: Enhancing User Experience in Likert Scale Personality Test through Interactive Conversation with LLM Agents, CUI 2024, [paper]
- LLM Agents for Psychology: A Study on Gamified Assessments, 2024.02, [paper].
- Generative Social Choice, 2023.09, [paper]
## 3. Alignment
- PAD: Personalized Alignment at Decoding-Time, 2024.10, [paper].
- Moral Alignment for LLM Agents, 2024.10, [paper].
- ProgressGym: Alignment with a Millennium of Moral Progress, NeurIPS 2024 D&B Track Spotlight, [paper], [code].
- Policy Prototyping for LLMs: Pluralistic Alignment via Interactive and Collaborative Policymaking, 2024.09, [paper].
- Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration, 2024.06, [paper].
- [Value] What are human values, and how do we align AI to them?, 2024.04, [paper].
- A Roadmap to Pluralistic Alignment, ICML 2024, [paper], [code].
- Agent Alignment in Evolving Social Norms, 2024.01, [paper].
- [Norm] Align on the Fly: Adapting Chatbot Behavior to Established Norms, 2023.12, [paper], [code].
  TL;DR: Uses retrieval-augmented generation (RAG) to align LLMs with dynamic, diverse human values such as social norms. A minimal sketch of the pattern follows.
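A minimal sketch of the retrieval pattern that TL;DR describes: norms live in an editable store, the most relevant ones are retrieved per query and prepended to the prompt, so behavior can track norm changes without retraining. Keyword overlap stands in for embedding retrieval here, and `ask_model` and the norm texts are illustrative assumptions, not the paper's data:

```python
# RAG-style norm grounding in the spirit of "Align on the Fly":
# norms live in an editable store; the top-matching ones are injected
# into the prompt. Keyword overlap stands in for embedding retrieval.

NORM_STORE = [
    "Do not give medical diagnoses; suggest seeing a professional.",
    "Avoid sharing personal data of third parties.",
    "In formal contexts, address users politely.",
]

def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; stubbed."""
    return "(model response)"

def retrieve(query: str, k: int = 2) -> list:
    """Rank stored norms by naive keyword overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(NORM_STORE,
                    key=lambda n: len(words & set(n.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    """Prepend the retrieved norms so the reply can follow them."""
    norms = "\n".join(f"- {n}" for n in retrieve(query))
    prompt = f"Follow these norms:\n{norms}\n\nUser: {query}\nAssistant:"
    return ask_model(prompt)

print(answer("Can you diagnose my headache?"))
```

Because the store is just data, updating a norm is an edit to `NORM_STORE` rather than a retraining run, which is the point of the approach.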
- [MBTI] Machine Mindset: An MBTI Exploration of Large Language Models, 2023.12, [paper], [code].
  TL;DR: Trains LLMs toward particular MBTI types via instruction tuning and direct preference optimization (DPO).
- Training Socially Aligned Language Models in Simulated Human Society, 2023, [paper], [code].
  Keywords: Stable Alignment, social alignment, societal norms and values, simulated social interactions, contrastive supervised learning
  TL;DR: This paper presents a training paradigm that lets LMs learn from simulated social interactions for social alignment. Models trained under this paradigm handle "jailbreaking prompts" better.
- Fine-tuning language models to find agreement among humans with diverse preferences, 2022, [paper].
  Keywords: consensus, fine-tuning, diverse preferences, alignment
  TL;DR: This work fine-tunes an LLM to generate statements that maximize expected approval from a group of people with potentially diverse opinions, especially on moral and political issues.
- ValueNet: A New Dataset for Human Value Driven Dialogue System, AAAI 2022, [paper], [dataset].
## 4. Simulation
- Large Language Models can Achieve Social Balance, 2024.10, [paper].
- On the limits of agency in agent-based models, 2024.09, [paper], [code].
- United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections, 2024.09, [paper].
- Out of One, Many: Using Language Models to Simulate Human Samples, 2022, [paper].
  TL;DR: This work introduces "algorithmic fidelity": the degree to which the relationships between ideas, attitudes, and contexts in a model mirror those in human groups. It proposes four criteria for assessing algorithmic fidelity and demonstrates that GPT-3 exhibits a high degree of fidelity for modeling public opinion and political attitudes in the U.S. A toy "silicon sampling" loop is sketched below.
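The paper's "silicon sampling" procedure can be sketched in a few lines: condition the model on a first-person demographic backstory, sample the survey answer repeatedly, and compare the resulting distribution with the human subgroup's. The backstory fields, question, and `ask_model` stub are illustrative, not the paper's materials:

```python
# Silicon sampling in the spirit of "Out of One, Many": condition on a
# demographic backstory, sample answers, tally the distribution.
from collections import Counter

def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; the stub alternates for illustration."""
    ask_model.n = getattr(ask_model, "n", 0) + 1
    return "Democrat" if ask_model.n % 3 else "Republican"

def backstory(age: int, state: str, ideology: str) -> str:
    """First-person persona text prepended to the survey question."""
    return (f"I am {age} years old, live in {state}, and consider "
            f"myself {ideology}.")

def silicon_sample(persona: str, question: str, n: int = 30) -> Counter:
    prompt = f"{persona}\nQ: {question}\nA:"
    return Counter(ask_model(prompt).strip() for _ in range(n))

persona = backstory(34, "Ohio", "moderate")
dist = silicon_sample(persona, "Which party do you lean toward?")
total = sum(dist.values())
print({k: round(v / total, 2) for k, v in dist.items()})
```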
- Social Simulacra: Creating Populated Prototypes for Social Computing Systems, 2022, [paper].
  Keywords: social computing prototypes, social simulacra, LLMs, system design refinement
  TL;DR: This paper proposes Social Simulacra, a social computing prototype that mimics authentic social interactions in a system populated by diverse community members, each with distinct behaviors such as posts, replies, and anti-social tendencies.
- Generative Agents: Interactive Simulacra of Human Behavior, 2023, [paper], [code].
  Keywords: generative agents, sandbox environment, natural language communication, emergent social behaviors, Smallville
  TL;DR: This paper introduces generative agents and their architecture for memory storage, reflection, and retrieval. The agents produce believable individual and emergent social behaviors in an interactive sandbox environment.
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies, 2023, [paper], [code].
  TL;DR: This paper presents a methodology for simulating Turing Experiments (TEs) and applies it to replicate well-established findings from economics, psycholinguistics, and social psychology. Larger language models provide more faithful simulations, except for a "hyper-accuracy distortion" (being more accurate than humans) present in some recent models.
- Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?, 2023, [paper], [code].
  TL;DR: LLMs can be used the way economists use homo economicus. Experiments with LLMs reproduce qualitatively similar results to the original economic research, so it is promising to use LLMs to search for novel social science insights to test in the real world.
- $S^3$: Social-network Simulation System with Large Language Model-Empowered Agents, 2023, [paper].
  Keywords: social network simulation, agent-based simulation, information/attitude/emotion propagation, user behavior modeling
  TL;DR: This paper introduces the Social-network Simulation System (S3), which simulates social networks via LLM-based agents. Evaluations on two real-world scenarios, gender discrimination and nuclear energy, show high accuracy in replicating individual attitudes, emotions, and behaviors, as well as in modeling the propagation of information, attitudes, and emotions at the population level.
- Rethinking the Buyer's Inspection Paradox in Information Markets with Language Agents, 2023, [paper].
  Keywords: buyer's inspection paradox, information economics, information market, language model, agent
  TL;DR: This work explores the buyer's inspection paradox in a simulated information marketplace, highlighting enhanced decision-making and answer quality when agents temporarily access information before purchase.
- SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series, 2023, [paper].
  Keywords: lifelong learning, human society analysis, hyperportfolio, time-series investment, Analyst-Assistant-Actuator architecture, Hypothesis and Proof prompting
  TL;DR: The paper introduces SocioDojo, an environment and hyperportfolio task for training lifelong agents to analyze and make decisions about human society, along with an Analyst-Assistant-Actuator architecture and a Hypothesis & Proof prompting technique. The proposed method achieves over 30% higher returns than state-of-the-art methods on the hyperportfolio task, which requires societal understanding.
- Humanoid Agents: Platform for Simulating Human-like Generative Agents, 2023, [paper], [code].
  Keywords: humanoid agents, generative agents, basic needs, emotions, relationships
  TL;DR: This paper proposes Humanoid Agents, a system that guides generative agents to behave more like humans by introducing dynamic elements that affect behavior: basic needs such as hunger and rest, emotions, and relationship closeness.
- When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm, 2023, [paper], [code].
  Keywords: user behavior analysis, user simulation, recommender system, profiling/memory/action module
  TL;DR: This work employs LLMs for user simulation in recommender systems. Experiments demonstrate the superiority of RecAgent over baseline simulation systems and its ability to generate reliable user behaviors.
- Large Language Model-Empowered Agents for Simulating Macroeconomic Activities, 2023, [paper].
  Keywords: macroeconomic simulation, agent-based modeling, prompt engineering, perception/reflection/decision-making abilities
  TL;DR: This work leverages LLM-based agents for macroeconomic simulation. Experiments show that LLM-based agents make realistic decisions and reproduce classic macroeconomic phenomena better than rule-based or other AI agents.
- Generative Agent-Based Modeling: Unveiling Social System Dynamics through Coupling Mechanistic Models with Generative Artificial Intelligence, 2023, [paper].
  Keywords: Generative Agent-Based Modeling, norm diffusion, social dynamics
  TL;DR: The authors demonstrate Generative Agent-Based Modeling (GABM) through a simple model of norm diffusion, in which agents decide between wearing green or blue shirts based on peer influence. The results show the emergence of group norms, sensitivity to agent personas, and conformity to asymmetric adoption forces.
- Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models, NeurIPS 2023, [paper].
  TL;DR: This paper presents an algorithm for using LLM outputs in downstream statistical analyses while guaranteeing statistical properties, such as asymptotic unbiasedness and proper uncertainty quantification, that are fundamental to CSS research. A toy bias-corrected estimator is sketched below.
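The design-based idea in the entry above can be illustrated with the simplest member of this family of estimators: label everything with the LLM, hand-code a small random subsample, and correct the LLM-labeled mean by the average surrogate error measured on that subsample. This toy version (my construction; the paper's estimator adds proper uncertainty quantification) shows the bias correction at work on synthetic data:

```python
# Bias-corrected mean from LLM surrogate labels plus a small
# hand-coded subsample, in the spirit of design-based supervised
# learning. Illustrative synthetic data; variance estimation omitted.
import random

random.seed(0)
n = 1000
truth = [random.random() < 0.4 for _ in range(n)]              # unknown in practice
llm = [t if random.random() < 0.85 else not t for t in truth]  # noisy LLM labels

# Hand-code a small random subsample to measure the surrogate's error.
gold_idx = random.sample(range(n), 100)

naive = sum(llm) / n   # pulled toward 0.5 by the label-flip noise
correction = sum(truth[i] - llm[i] for i in gold_idx) / len(gold_idx)
corrected = naive + correction

print(f"naive: {naive:.3f}  corrected: {corrected:.3f}  "
      f"true: {sum(truth) / n:.3f}")
```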
- Epidemic Modeling with Generative Agents, 2023.07, [paper], [code].
  Keywords: epidemic modeling, generative AI, agent-based model, human behavior, COVID-19
  TL;DR: The paper presents an epidemic modeling approach that uses generative AI to give individual agents reasoning ability. The generative agent-based model collectively flattens the epidemic curve and mimics patterns such as multiple waves through AI-powered decision-making, without imposed rules.
- Emergent analogical reasoning in large language models, 2023.08, Nature Human Behaviour, [paper].
  Keywords: GPT-3, analogical reasoning, zero-shot learning, cognitive processes, human comparison
  TL;DR: This paper investigates the emergent analogical reasoning capabilities of GPT-3, demonstrating its proficiency on various analogy tasks compared with college students. The research highlights GPT-3's potential in zero-shot learning and its similarity to human cognitive processes in problem-solving.
- MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents, 2023.10, [paper].
  Keywords: agent simulation, job fair environment, task-oriented coordination
  TL;DR: The paper introduces MetaAgents to enhance coordination in LLM-based agents through a collaborative and reasoning approach, tested in a simulated job fair environment. The study reveals both the potential and the limitations of LLM-based agents in complex social coordination tasks.
- War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars, 2023.11, [paper], [code].
  TL;DR: This paper presents WarAgent, an AI system that simulates historical conflicts and reveals how historical and policy factors critically drive the inevitability and nature of wars.
- Emergence of Social Norms in Large Language Model-based Agent Societies, 2024.03, [paper], [code].
- Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior, ICLR 2024, [paper].
  TL;DR: LLMs are not conventionally trained to predict or optimize human behavior. This paper introduces receivers' "behavior tokens," such as shares, likes, clicks, purchases, and retweets, into the LLM's training corpora to optimize content for receivers and predict their behaviors. Besides matching LLMs on content-understanding tasks, the trained models generalize along the behavior dimension to behavior simulation, content simulation, behavior understanding, and behavior domain adaptation.
## 5. Perspective
- The benefits, risks and bounds of personalizing the alignment of large language models to individuals, 2024.04, Nature Machine Intelligence, [paper].
- A social path to human-like artificial intelligence, 2023.11, Nature Machine Intelligence, [paper].
  TL;DR: This paper explores the social pathways to human intelligence, highlighting the roles of collective living, social relationships, and key evolutionary transformations in the development of intelligence.
- Using large language models in psychology, 2023.10, Nature Reviews Psychology, [paper].