TrustLLM

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models

Stars: 535

Visit

TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

README:

Toolkit for "TrustLLM: Trustworthiness in Large Language Models"

Updates & News

[02/20/2024] Our new work TrustGen and TrustEval toolkit has been released! TrustGen provides a comprehensive guidelines, wssessment, and perspective for trustworthiness across multiple generative models, and TrustEval offers a dynamic evaluation platform.
[01/09/2024] TrustLLM toolkit has been downloaded for 4000+ times!
[15/07/2024] TrustLLM now supports UniGen for dynamic evaluation.
[02/05/2024] 🥂 TrustLLM has been accepted by ICML 2024! See you in Vienna!
[23/04/2024] ⭐ Version 0.3.0: Major updates including bug fixes, enhanced evaluation, and new models added (including ChatGLM3, Llama3-8b, Llama3-70b, GLM4, Mixtral). (See details)
[20/03/2024] ⭐ Version 0.2.4: Fixed many bugs & Support Gemini Pro API
[01/02/2024] 📄 Version 0.2.2: See our new paper about the awareness in LLMs! (link)
[29/01/2024] ⭐ Version 0.2.1: trustllm toolkit now supports (1) Easy evaluation pipeline (2) LLMs in replicate and deepinfra (3) Azure OpenAI API
[20/01/2024] ⭐ Version 0.2.0 of trustllm toolkit is released! See the new features.
[12/01/2024] 🏄 The dataset, leaderboard, and evaluation toolkit are released!

👂TL;DR

TrustLLM (ICML 2024) is a comprehensive framework for studying trustworthiness of large language models, which includes principles, surveys, and benchmarks.
This code repository is designed to provide an easy toolkit for evaluating the trustworthiness of LLMs (See our docs).

Table of Content

Toolkit for "TrustLLM: Trustworthiness in Large Language Models"

🙋 About TrustLLM

We introduce TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

🧹 Before Evaluation

Installation

Create a new environment:

conda create --name trustllm python=3.9

Installation via Github (recommended):

git clone [email protected]:HowieHwong/TrustLLM.git
cd TrustLLM/trustllm_pkg
pip install .

Installation via pip (deprecated):

pip install trustllm

Installation via conda (deprecated):

conda install -c conda-forge trustllm

Dataset Download

Download TrustLLM dataset:

from trustllm.dataset_download import download_dataset

download_dataset(save_path='save_path')

Generation

We have added generation section from version 0.2.0. Start your generation from this page. Here is an example:

from trustllm.generation.generation import LLMGeneration

llm_gen = LLMGeneration(
    model_path="your model name", 
    test_type="test section", 
    data_path="your dataset file path",
    model_name="", 
    online_model=False, 
    use_deepinfra=False,
    use_replicate=False,
    repetition_penalty=1.0,
    num_gpus=1, 
    max_new_tokens=512, 
    debug=False,
    device='cuda:0'
)

llm_gen.generation_results()

🙌 Evaluation

We have provided a toolkit that allows you to more conveniently assess the trustworthiness of large language models. Please refer to the document for more details. Here is an example:

from trustllm.task.pipeline import run_truthfulness

truthfulness_results = run_truthfulness(  
    internal_path="path_to_internal_consistency_data.json",  
    external_path="path_to_external_consistency_data.json",  
    hallucination_path="path_to_hallucination_data.json",  
    sycophancy_path="path_to_sycophancy_data.json",
    advfact_path="path_to_advfact_data.json"
)

🛎️ Dataset & Task

Dataset overview:

✓ the dataset is from prior work, and ✗ means the dataset is first proposed in our benchmark.

Dataset	Description	Num.	Exist?	Section
SQuAD2.0	It combines questions in SQuAD1.1 with over 50,000 unanswerable questions.	100	✓	Misinformation
CODAH	It contains 28,000 commonsense questions.	100	✓	Misinformation
HotpotQA	It contains 113k Wikipedia-based question-answer pairs for complex multi-hop reasoning.	100	✓	Misinformation
AdversarialQA	It contains 30,000 adversarial reading comprehension question-answer pairs.	100	✓	Misinformation
Climate-FEVER	It contains 7,675 climate change-related claims manually curated by human fact-checkers.	100	✓	Misinformation
SciFact	It contains 1,400 expert-written scientific claims pairs with evidence abstracts.	100	✓	Misinformation
COVID-Fact	It contains 4,086 real-world COVID claims.	100	✓	Misinformation
HealthVer	It contains 14,330 health-related claims against scientific articles.	100	✓	Misinformation
TruthfulQA	The multiple-choice questions to evaluate whether a language model is truthful in generating answers to questions.	352	✓	Hallucination
HaluEval	It contains 35,000 generated and human-annotated hallucinated samples.	300	✓	Hallucination
LM-exp-sycophancy	A dataset consists of human questions with one sycophancy response example and one non-sycophancy response example.	179	✓	Sycophancy
Opinion pairs	It contains 120 pairs of opposite opinions.	240, 120	✗	Sycophancy, Preference
WinoBias	It contains 3,160 sentences, split for development and testing, created by researchers familiar with the project.	734	✓	Stereotype
StereoSet	It contains the sentences that measure model preferences across gender, race, religion, and profession.	734	✓	Stereotype
Adult	The dataset, containing attributes like sex, race, age, education, work hours, and work type, is utilized to predict salary levels for individuals.	810	✓	Disparagement
Jailbreak Trigger	The dataset contains the prompts based on 13 jailbreak attacks.	1300	✗	Jailbreak, Toxicity
Misuse (additional)	This dataset contains prompts crafted to assess how LLMs react when confronted by attackers or malicious users seeking to exploit the model for harmful purposes.	261	✗	Misuse
Do-Not-Answer	It is curated and filtered to consist only of prompts to which responsible LLMs do not answer.	344 + 95	✓	Misuse, Stereotype
AdvGLUE	A multi-task dataset with different adversarial attacks.	912	✓	Natural Noise
AdvInstruction	600 instructions generated by 11 perturbation methods.	600	✗	Natural Noise
ToolE	A dataset with the users' queries which may trigger LLMs to use external tools.	241	✓	Out of Domain (OOD)
Flipkart	A product review dataset, collected starting from December 2022.	400	✓	Out of Domain (OOD)
DDXPlus	A 2022 medical diagnosis dataset comprising synthetic data representing about 1.3 million patient cases.	100	✓	Out of Domain (OOD)
ETHICS	It contains numerous morally relevant scenarios descriptions and their moral correctness.	500	✓	Implicit Ethics
Social Chemistry 101	It contains various social norms, each consisting of an action and its label.	500	✓	Implicit Ethics
MoralChoice	It consists of different contexts with morally correct and wrong actions.	668	✓	Explicit Ethics
ConfAIde	It contains the description of how information is used.	196	✓	Privacy Awareness
Privacy Awareness	It includes different privacy information queries about various scenarios.	280	✗	Privacy Awareness
Enron Email	It contains approximately 500,000 emails generated by employees of the Enron Corporation.	400	✓	Privacy Leakage
Xstest	It's a test suite for identifying exaggerated safety behaviors in LLMs.	200	✓	Exaggerated Safety

Task overview:

○ means evaluation through the automatic scripts (e.g., keywords matching), ● means the automatic evaluation by ChatGPT, GPT-4 or longformer, and ◐ means the mixture evaluation.

More trustworthy LLMs are expected to have a higher value of the metrics with ↑ and a lower value with ↓.

Task Name	Metrics	Type	Eval	Section
Closed-book QA	Accuracy (↑)	Generation	○	Misinformation(Internal)
Fact-Checking	Macro F-1 (↑)	Classification	●	Misinformation(External)
Multiple Choice QA	Accuracy (↑)	Classification	●	Hallucination
Hallucination Classification	Accuracy (↑)	Classification	●	Hallucination
Persona Sycophancy	Embedding similarity (↑)	Generation	◐	Sycophancy
Opinion Sycophancy	Percentage change (↓)	Generation	○	Sycophancy
Factuality Correction	Percentage change (↑)	Generation	○	Adversarial Factuality
Jailbreak Attack Evaluation	RtA (↑)	Generation	○	Jailbreak
Toxicity Measurement	Toxicity Value (↓)	Generation	●	Toxicity
Misuse Evaluation	RtA (↑)	Generation	○	Misuse
Exaggerated Safety Evaluation	RtA (↓)	Generation	○	Exaggerated Safety
Agreement on Stereotypes	Accuracy (↑)	Generation	◐	Stereotype
Recognition of Stereotypes	Agreement Percentage (↓)	Classification	◐	Stereotype
Stereotype Query Test	RtA (↑)	Generation	○	Stereotype
Preference Selection	RtA (↑)	Generation	○	Preference
Salary Prediction	p-value (↑)	Generation	●	Disparagement
Adversarial Perturbation in Downstream Tasks	ASR (↓), RS (↑)	Generation	◐	Natural Noise
Adversarial Perturbation in Open-Ended Tasks	Embedding similarity (↑)	Generation	◐	Natural Noise
OOD Detection	RtA (↑)	Generation	○	Out of Domain (OOD)
OOD Generalization	Micro F1 (↑)	Classification	○	Out of Domain (OOD)
Agreement on Privacy Information	Pearson's correlation (↑)	Classification	●	Privacy Awareness
Privacy Scenario Test	RtA (↑)	Generation	○	Privacy Awareness
Probing Privacy Information Usage	RtA (↑), Accuracy (↓)	Generation	◐	Privacy Leakage
Moral Action Judgement	Accuracy (↑)	Classification	◐	Implicit Ethics
Moral Reaction Selection (Low-Ambiguity)	Accuracy (↑)	Classification	◐	Explicit Ethics
Moral Reaction Selection (High-Ambiguity)	RtA (↑)	Generation	○	Explicit Ethics
Emotion Classification	Accuracy (↑)	Classification	●	Emotional Awareness

🏆 Leaderboard

If you want to view the performance of all models or upload the performance of your LLM, please refer to this link.

📣 Contribution

We welcome your contributions, including but not limited to the following:

New evaluation datasets
Research on trustworthy issues
Improvements to the toolkit

If you intend to make improvements to the toolkit, please fork the repository first, make the relevant modifications to the code, and finally initiate a pull request.

⏰ TODO in Coming Versions

[x] Faster and simpler evaluation pipeline (Version 0.2.1)
[x] Dynamic dataset (UniGen)
[ ] More fine-grained datasets
[ ] Chinese output evaluation
[ ] Downstream application evaluation

Citation

@inproceedings{huang2024trustllm,
  title={TrustLLM: Trustworthiness in Large Language Models},
  author={Yue Huang and Lichao Sun and Haoran Wang and Siyuan Wu and Qihui Zhang and Yuan Li and Chujie Gao and Yixin Huang and Wenhan Lyu and Yixuan Zhang and Xiner Li and Hanchi Sun and Zhengliang Liu and Yixin Liu and Yijue Wang and Zhikun Zhang and Bertie Vidgen and Bhavya Kailkhura and Caiming Xiong and Chaowei Xiao and Chunyuan Li and Eric P. Xing and Furong Huang and Hao Liu and Heng Ji and Hongyi Wang and Huan Zhang and Huaxiu Yao and Manolis Kellis and Marinka Zitnik and Meng Jiang and Mohit Bansal and James Zou and Jian Pei and Jian Liu and Jianfeng Gao and Jiawei Han and Jieyu Zhao and Jiliang Tang and Jindong Wang and Joaquin Vanschoren and John Mitchell and Kai Shu and Kaidi Xu and Kai-Wei Chang and Lifang He and Lifu Huang and Michael Backes and Neil Zhenqiang Gong and Philip S. Yu and Pin-Yu Chen and Quanquan Gu and Ran Xu and Rex Ying and Shuiwang Ji and Suman Jana and Tianlong Chen and Tianming Liu and Tianyi Zhou and William Yang Wang and Xiang Li and Xiangliang Zhang and Xiao Wang and Xing Xie and Xun Chen and Xuyu Wang and Yan Liu and Yanfang Ye and Yinzhi Cao and Yong Chen and Yue Zhao},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=bWUU0LwwMp}
}

License

The code in this repository is open source under the MIT license.

For Tasks:

Click tags to check more tools for each tasks

evaluate trustworthiness of llms benchmark llms analyze llms research llms develop llms

For Jobs:

researcher data scientist machine learning engineer ai developer software engineer

Alternative AI tools for TrustLLM

Similar Open Source Tools

TrustLLM

github

: 535

COLD-Attack

COLD-Attack is a framework designed for controllable jailbreaks on large language models (LLMs). It formulates the controllable attack generation problem and utilizes the Energy-based Constrained Decoding with Langevin Dynamics (COLD) algorithm to automate the search of adversarial LLM attacks with control over fluency, stealthiness, sentiment, and left-right-coherence. The framework includes steps for energy function formulation, Langevin dynamics sampling, and decoding process to generate discrete text attacks. It offers diverse jailbreak scenarios such as fluent suffix attacks, paraphrase attacks, and attacks with left-right-coherence.

github

: 84

unoplat-code-confluence

github

: 55

IDvs.MoRec

This repository contains the source code for the SIGIR 2023 paper 'Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited'. It provides resources for evaluating foundation, transferable, multi-modal, and LLM recommendation models, along with datasets, pre-trained models, and training strategies for IDRec and MoRec using in-batch debiased cross-entropy loss. The repository also offers large-scale datasets, code for SASRec with in-batch debias cross-entropy loss, and information on joining the lab for research opportunities.

github

: 119

EasyEdit

EasyEdit is a Python package for edit Large Language Models (LLM) like `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, `T5`(support models from **1B** to **65B**), the objective of which is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. It is designed to be easy to use and easy to extend.

github

: 2.2k

HuatuoGPT-II

HuatuoGPT2 is an innovative domain-adapted medical large language model that excels in medical knowledge and dialogue proficiency. It showcases state-of-the-art performance in various medical benchmarks, surpassing GPT-4 in expert evaluations and fresh medical licensing exams. The open-source release includes HuatuoGPT2 models in 7B, 13B, and 34B versions, training code for one-stage adaptation, partial pre-training and fine-tuning instructions, and evaluation methods for medical response capabilities and professional pharmacist exams. The tool aims to enhance LLM capabilities in the Chinese medical field through open-source principles.

github

: 308

PredictorLLM

PredictorLLM is an advanced trading agent framework that utilizes large language models to automate trading in financial markets. It includes a profiling module to establish agent characteristics, a layered memory module for retaining and prioritizing financial data, and a decision-making module to convert insights into trading strategies. The framework mimics professional traders' behavior, surpassing human limitations in data processing and continuously evolving to adapt to market conditions for superior investment outcomes.

github

: 57

EVE

EVE is an official PyTorch implementation of Unveiling Encoder-Free Vision-Language Models. The project aims to explore the removal of vision encoders from Vision-Language Models (VLMs) and transfer LLMs to encoder-free VLMs efficiently. It also focuses on bridging the performance gap between encoder-free and encoder-based VLMs. EVE offers a superior capability with arbitrary image aspect ratio, data efficiency by utilizing publicly available data for pre-training, and training efficiency with a transparent and practical strategy for developing a pure decoder-only architecture across modalities.

github

: 155

amber-train

Amber is the first model in the LLM360 family, an initiative for comprehensive and fully open-sourced LLMs. It is a 7B English language model with the LLaMA architecture. The model type is a language model with the same architecture as LLaMA-7B. It is licensed under Apache 2.0. The resources available include training code, data preparation, metrics, and fully processed Amber pretraining data. The model has been trained on various datasets like Arxiv, Book, C4, Refined-Web, StarCoder, StackExchange, and Wikipedia. The hyperparameters include a total of 6.7B parameters, hidden size of 4096, intermediate size of 11008, 32 attention heads, 32 hidden layers, RMSNorm ε of 1e^-6, max sequence length of 2048, and a vocabulary size of 32000.

github

: 136

LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing

LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.

github

: 648

langfuse

Langfuse is a powerful tool that helps you develop, monitor, and test your LLM applications. With Langfuse, you can: * **Develop:** Instrument your app and start ingesting traces to Langfuse, inspect and debug complex logs, and manage, version, and deploy prompts from within Langfuse. * **Monitor:** Track metrics (cost, latency, quality) and gain insights from dashboards & data exports, collect and calculate scores for your LLM completions, run model-based evaluations, collect user feedback, and manually score observations in Langfuse. * **Test:** Track and test app behaviour before deploying a new version, test expected in and output pairs and benchmark performance before deploying, and track versions and releases in your application. Langfuse is easy to get started with and offers a generous free tier. You can sign up for Langfuse Cloud or deploy Langfuse locally or on your own infrastructure. Langfuse also offers a variety of integrations to make it easy to connect to your LLM applications.

github

: 10.1k

DeepRetrieval

DeepRetrieval is a tool designed to enhance search engines and retrievers using Large Language Models (LLMs) and Reinforcement Learning (RL). It allows LLMs to learn how to search effectively by integrating with search engine APIs and customizing reward functions. The tool provides functionalities for data preparation, training, evaluation, and monitoring search performance. DeepRetrieval aims to improve information retrieval tasks by leveraging advanced AI techniques.

github

: 198

llm4ad

LLM4AD is an open-source Python-based platform leveraging Large Language Models (LLMs) for Automatic Algorithm Design (AD). It provides unified interfaces for methods, tasks, and LLMs, along with features like evaluation acceleration, secure evaluation, logs, GUI support, and more. The platform was originally developed for optimization tasks but is versatile enough to be used in other areas such as machine learning, science discovery, game theory, and engineering design. It offers various search methods and algorithm design tasks across different domains. LLM4AD supports remote LLM API, local HuggingFace LLM deployment, and custom LLM interfaces. The project is licensed under the MIT License and welcomes contributions, collaborations, and issue reports.

github

: 294

Botright

Botright is a tool designed for browser automation that focuses on stealth and captcha solving. It uses a real Chromium-based browser for enhanced stealth and offers features like browser fingerprinting and AI-powered captcha solving. The tool is suitable for developers looking to automate browser tasks while maintaining anonymity and bypassing captchas. Botright is available in async mode and can be easily integrated with existing Playwright code. It provides solutions for various captchas such as hCaptcha, reCaptcha, and GeeTest, with high success rates. Additionally, Botright offers browser stealth techniques and supports different browser functionalities for seamless automation.

github

: 396

awesome-mobile-llm

Awesome Mobile LLMs is a curated list of Large Language Models (LLMs) and related studies focused on mobile and embedded hardware. The repository includes information on various LLM models, deployment frameworks, benchmarking efforts, applications, multimodal LLMs, surveys on efficient LLMs, training LLMs on device, mobile-related use-cases, industry announcements, and related repositories. It aims to be a valuable resource for researchers, engineers, and practitioners interested in mobile LLMs.

github

: 154

vlmrun-cookbook

VLM Run Cookbook is a repository containing practical examples and tutorials for extracting structured data from images, videos, and documents using Vision Language Models (VLMs). It offers comprehensive Colab notebooks demonstrating real-world applications of VLM Run, with complete code and documentation for easy adaptation. The examples cover various domains such as financial documents and TV news analysis.

github

: 259

For similar tasks

TrustLLM

github

: 535

deepeval

DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

github

: 5.8k

bench

Bench is a tool for evaluating LLMs for production use cases. It provides a standardized workflow for LLM evaluation with a common interface across tasks and use cases. Bench can be used to test whether open source LLMs can do as well as the top closed-source LLM API providers on specific data, and to translate the rankings on LLM leaderboards and benchmarks into scores that are relevant for actual use cases.

github

: 321

llm-autoeval

LLM AutoEval is a tool that simplifies the process of evaluating Large Language Models (LLMs) using a convenient Colab notebook. It automates the setup and execution of evaluations using RunPod, allowing users to customize evaluation parameters and generate summaries that can be uploaded to GitHub Gist for easy sharing and reference. LLM AutoEval supports various benchmark suites, including Nous, Lighteval, and Open LLM, enabling users to compare their results with existing models and leaderboards.

github

: 398

moonshot

Moonshot is a simple and modular tool developed by the AI Verify Foundation to evaluate Language Model Models (LLMs) and LLM applications. It brings Benchmarking and Red-Teaming together to assist AI developers, compliance teams, and AI system owners in assessing LLM performance. Moonshot can be accessed through various interfaces including User-friendly Web UI, Interactive Command Line Interface, and seamless integration into MLOps workflows via Library APIs or Web APIs. It offers features like benchmarking LLMs from popular model providers, running relevant tests, creating custom cookbooks and recipes, and automating Red Teaming to identify vulnerabilities in AI systems.

github

: 196

llm_client

llm_client is a Rust interface designed for Local Large Language Models (LLMs) that offers automated build support for CPU, CUDA, MacOS, easy model presets, and a novel cascading prompt workflow for controlled generation. It provides a breadth of configuration options and API support for various OpenAI compatible APIs. The tool is primarily focused on deterministic signals from probabilistic LLM vibes, enabling specialized workflows for specific tasks and reproducible outcomes.

github

: 123

LLM-Synthetic-Data

LLM-Synthetic-Data is a repository focused on real-time, fine-grained LLM-Synthetic-Data generation. It includes methods, surveys, and application areas related to synthetic data for language models. The repository covers topics like pre-training, instruction tuning, model collapse, LLM benchmarking, evaluation, and distillation. It also explores application areas such as mathematical reasoning, code generation, text-to-SQL, alignment, reward modeling, long context, weak-to-strong generalization, agent and tool use, vision and language, factuality, federated learning, generative design, and safety.

github

: 101

llm-random

This repository contains code for research conducted by the LLM-Random research group at IDEAS NCBR in Warsaw, Poland. The group focuses on developing and using this repository to conduct research. For more information about the group and its research, refer to their blog, llm-random.github.io.

github

: 181

For similar jobs

promptflow

**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

github

: 9.2k

deepeval

github

: 5.8k

MegaDetector

MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our of the field, affectionately titled "Everything I know about machine learning and camera traps".

github

: 106

leapfrogai

LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

github

: 255

llava-docker

This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

github

: 59

carrot

The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

github

: 17.1k

TrustLLM

github

: 535

AI-YinMei

AI-YinMei is an AI virtual anchor Vtuber development tool (N card version). It supports fastgpt knowledge base chat dialogue, a complete set of solutions for LLM large language models: [fastgpt] + [one-api] + [Xinference], supports docking bilibili live broadcast barrage reply and entering live broadcast welcome speech, supports Microsoft edge-tts speech synthesis, supports Bert-VITS2 speech synthesis, supports GPT-SoVITS speech synthesis, supports expression control Vtuber Studio, supports painting stable-diffusion-webui output OBS live broadcast room, supports painting picture pornography public-NSFW-y-distinguish, supports search and image search service duckduckgo (requires magic Internet access), supports image search service Baidu image search (no magic Internet access), supports AI reply chat box [html plug-in], supports AI singing Auto-Convert-Music, supports playlist [html plug-in], supports dancing function, supports expression video playback, supports head touching action, supports gift smashing action, supports singing automatic start dancing function, chat and singing automatic cycle swing action, supports multi scene switching, background music switching, day and night automatic switching scene, supports open singing and painting, let AI automatically judge the content.

github

: 529