EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
Stars: 2012
EasyEdit is a Python package for editing Large Language Models (LLMs) like `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, `T5` (supporting models from **1B** to **65B**). Its objective is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. It is designed to be easy to use and easy to extend.
README:
An Easy-to-use Knowledge Editing Framework for Large Language Models.
Installation • QuickStart • Doc • Paper • Demo • Benchmark • Contributors • Slides • Video • Featured By AK
- Table of Contents
- 🔔News
- Editing Demo
- Knowledge Editing
- 🌟Overview
- Requirements
- 📌Use EasyEdit
- Use EasyEdit with KnowEdit
- Editing Performance
- Citation
- 🎉Contributors
- 2024-11-19, we update the Table 4 results in the paper "A Comprehensive Study of Knowledge Editing for Large Language Models" after optimizing certain methods (related to AdaLoRA) and fixing computational bugs (related to ROME and MEMIT) in the EasyEdit (More details in https://github.com/zjunlp/EasyEdit/issues/427). These improvements have led to better results than before. We will continue updating this paper and welcome everyone to discuss and exchange ideas.
- 2024-11-11, 🎉🎉the paper on model editing for LLMs4Code, "Model Editing for LLMs4Code: How Far are We?", has been accepted by ICSE 2025! This work proposes a benchmark for LLMs4Code editing, CLMEEval, which is built upon EasyEdit!
- 2024-11-09, we fixed a bug regarding the KnowEdit results reported in https://github.com/zjunlp/EasyEdit/issues/390. Thanks to @StarLooo for helping us with it.
- 2024-10-24, EasyEdit has added two new knowledge editing methods, AlphaEdit. In addition, we have fixed several bugs.
Previous News
- 2024-10-23, EasyEdit integrates constrained decoding methods from steering editing to mitigate hallucination in LLM and MLLM, with detailed information available in DoLa and DeCo.
- 2024-09-26, 🎉🎉 our paper "WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models" has been accepted by NeurIPS 2024.
- 2024-09-20, 🎉🎉 our papers "Knowledge Mechanisms in Large Language Models: A Survey and Perspective" and "Editing Conceptual Knowledge for Large Language Models" have been accepted by EMNLP 2024 Findings.
- 2024-07-29, EasyEdit has added a new model editing algorithm, EMMET, which generalizes ROME to the batch setting. This essentially allows making batched edits using the ROME loss function.
- 2024-07-23, we release a new paper: "Knowledge Mechanisms in Large Language Models: A Survey and Perspective", which reviews how knowledge is acquired, utilized, and evolves in large language models. This survey may provide the fundamental mechanisms for precisely and efficiently manipulating (editing) knowledge in LLMs.
- 2024-06-04, 🎉🎉 the EasyEdit paper has been accepted by the ACL 2024 System Demonstration Track.
- 2024-06-03, we released a paper titled "WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models", which introduces a new editing task, Continuous Knowledge Editing, and a corresponding lifelong editing method called WISE.
- 2024-04-24, EasyEdit announced support for the ROME method for Llama3-8B. Users are advised to update their transformers package to version 4.40.0.
- 2024-03-29, EasyEdit introduced rollback support for GRACE. For a detailed introduction, refer to the EasyEdit documentation. Future updates will gradually include rollback support for other methods.
- 2024-03-22, a new paper titled "Detoxifying Large Language Models via Knowledge Editing" was released, along with a new dataset named SafeEdit and a new detoxification method called DINM.
- 2024-03-12, another paper titled "Editing Conceptual Knowledge for Large Language Models" was released, introducing a new dataset named ConceptEdit.
- 2024-03-01, EasyEdit added support for a new method called FT-M. This method trains a specific MLP layer using cross-entropy loss on the target answer while masking the original text. It outperforms the FT-L implementation in ROME. Thanks to the author of issue https://github.com/zjunlp/EasyEdit/issues/173 for the advice.
- 2024-02-27, EasyEdit added support for a new method called InstructEdit, with technical details provided in the paper "InstructEdit: Instruction-based Knowledge Editing for Large Language Models".
- 2024-02-09, the EasyEdit has added the support for the Dynamic LoRA model editing method MELO'AAAI24.
- 2024-02-06, we release a new paper: "EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models" with an HF demo EasyInstruct.
- 2024-02-06, we release a preliminary tool EasyDetect for LLM hallucination detection,with a demo.
- 2024-01-24, the EasyEdit has added the support for editing Mistral-7B (manually update transformers==4.34.0), we have also fixed some bugs in evaluating MEND (slightly influence the performance).
- 2024-01-16, the EasyEdit has added the support for the precise model editing method PMET'AAAI24.
- 2024-01-03, we release a new paper: "A Comprehensive Study of Knowledge Editing for Large Language Models" with a new benchmark, KnowEdit! KnowEdit is constructed by re-organizing and cleaning existing datasets including WikiBio, ZsRE, WikiData Counterfact, WikiData Recent, convsent, and Sanitation, with new train/val/test splits. Special thanks to the builders and maintainers of those datasets. We are looking forward to any comments or discussions on this topic :)
- 2023-12-06, the EasyEdit has added the support for the lifelong model editing method GRACE'NeurIPS24.
- 2023-11-18, our tutorial "Knowledge Editing for Large Language Models" has been accepted by COLING 2024.
- 2023-10-25, our tutorial "Knowledge Editing for Large Language Models" has been accepted by AAAI 2024.
- 2023-10-24, the EasyEdit has added the support for efficient editing of Baichuan2, ChatGLM2, InternLM, QWen and fixed several bugs for a better user experience.
- 2023-10-14, we release the MultimodalEditor based on the paper "Can We Edit Multimodal Large Language Models?".
- 2023-10-13, we release the paper "Can We Edit Multimodal Large Language Models?" accepted by EMNLP 2023.
- 2023-10-08, our paper "Editing Large Language Models: Problems, Methods, and Opportunities" has been accepted by EMNLP 2023.
- 2023-10-07, the EasyEdit has added the support for editing models with multiple GPUs, using huggingface `Accelerate`.
- 2023-9-21, the EasyEdit has added the support for Parameter-Efficient Fine-Tuning through AdaLoRA to inject knowledge into the LLM.
- 2023-8-31, the EasyEdit has added the support for official fine-tuning API for gpt-3.5-turbo to customize ChatGPT for your editing cases.
- 2023-8-15, we release the paper "EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models."
- 2023-7-12, we release version 0.0.1, supporting several knowledge editing techniques for LLMs. EasyEdit helps to better align LLMs with changing needs and values of users.
- 2023-5-22, we release the paper "Editing Large Language Models: Problems, Methods, and Opportunities" and provide a paper list at PaperList.
- 2023-3-25, the EasyEdit project has been launched and is under development.
A Comprehensive Study of Knowledge Editing for Large Language Models [paper][benchmark][code]
IJCAI 2024 Tutorial Google Drive
COLING 2024 Tutorial Google Drive
AAAI 2024 Tutorial Google Drive
AACL 2023 Tutorial [Google Drive] [Baidu Pan]
There is a demonstration of editing. The GIF file is created by Terminalizer.
We provide a handy Jupyter Notebook! It allows you to edit a LLM's knowledge of the US president, switching from Biden to Trump and even back to Biden. This includes methods like WISE, AlphaEdit, AdaLoRA, and Prompt-based editing.
Deployed models may still make unpredictable errors. For example, LLMs notoriously hallucinate, perpetuate bias, and factually decay, so we should be able to adjust specific behaviors of pre-trained models.
Knowledge editing aims to adjust base model's $(f_\theta)$ behavior on the particular edit descriptor $[x_e, y_e]$ efficiently.
Single-instance editing evaluates the performance of the model after a single edit. The model reloads its original weights after each edit (e.g. LoRA discards the adapter weights). For this setting, set `sequential_edit=False`:
$$\theta' \leftarrow \arg\min_{\theta} \left( \Vert f_\theta(x_e) - y_e \Vert \right)$$
Sequential editing applies the edits one after another, and evaluation is performed after all knowledge updates have been applied:
$$\theta' \leftarrow \arg\min_{\theta} \sum_{e=1}^{\vert X_e \vert} \left( \Vert f_\theta(x_e) - y_e \Vert \right)$$
It makes parameter adjustments for $(x_e, y_e)$, where $x_e \in X_e$ and $f_{\theta'}(x_e) = y_e$. Here, $X_e$ represents the whole edit set. To enable continuous editing, set `sequential_edit=True`; see the README for more details.
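The difference between the two settings can be sketched with a toy model. This is only an illustration under strong assumptions: the "model" is a plain lookup table, and `apply_edit` and `run_edits` are hypothetical helpers, not part of the EasyEdit API.

```python
import copy


def apply_edit(weights, prompt, target):
    """Toy 'edit': overwrite the stored answer for one prompt."""
    weights[prompt] = target


def run_edits(weights, edits, sequential_edit):
    """Apply edits to a copy of `weights`; return (success flags, final weights)."""
    original = copy.deepcopy(weights)
    current = copy.deepcopy(weights)
    successes = []
    for prompt, target in edits:
        apply_edit(current, prompt, target)
        if not sequential_edit:
            # Single-instance setting: evaluate right after the edit,
            # then restore the original weights before the next edit.
            successes.append(current[prompt] == target)
            current = copy.deepcopy(original)
    if sequential_edit:
        # Sequential setting: evaluate only after all edits are applied.
        successes = [current[p] == t for p, t in edits]
    return successes, current


base = {"president": "A", "capital": "X"}
edits = [("president", "B"), ("capital", "Y")]

single_flags, single_weights = run_edits(base, edits, sequential_edit=False)
seq_flags, seq_weights = run_edits(base, edits, sequential_edit=True)
```

In the single-instance run the returned weights equal the original ones, while in the sequential run both edits are still present at the end.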
Factual Knowledge Editing
- Inject knowledge that LLMs have not seen before, such as:
  - How many times has Messi won the World Cup? 0 $\rightarrow$ 1
- Update outdated knowledge, such as:
  - The president of USA: Donald Trump $\rightarrow$ Joe Biden
- Erase sensitive information, such as:
  - The phone number of someone is XXXX $\rightarrow$ __
The ultimate goal is to create an edited model $(f_{\theta'})$ without influencing the model behavior on unrelated samples.
Safety Editing
**Detoxifying LLM** strives to build a safe and trustworthy large language model (LLM). Knowledge editing focuses on specific areas for permanent adjustment without compromising overall performance. Then, detoxifying LLM via knowledge editing leverages a small amount of data, usually an instance, to correct the toxic behaviors of the LLM. The edited LLM can defend against various malicious inputs. [README](https://github.com/zjunlp/EasyEdit/blob/main/examples/SafeEdit.md)
MultiModal Model Editing
Editing Task for Image Captioning and Visual Question Answering. README
Personality Editing
The proposed task takes the preliminary attempt to edit LLMs' personalities by editing their opinions on specific topics, given that an individual's opinions can reflect aspects of their personality traits. We draw upon the established BIG FIVE theory as a basis for constructing our dataset and assessing the LLMs' personality expressions. README
Evaluation
Logits-based
- ES: evaluating the editing success rate based on the logits of pre-generated text.
- DD: evaluating whether the model changes opinions on other topics based on the logits of pre-generated text.
Generation-based
- Acc: the accuracy of the generated text after editing the model on target personality.
- TPEI: measuring whether generated opinion text from the edited model leans more towards the target personality.
- PAE: utilizing GPT-4 to evaluate the personality traits in generated text.
To assess Acc and TPEI, you can download the trained classifier from here.
The knowledge editing process generally impacts the predictions for a broad set of inputs that are closely associated with the edit example, called the editing scope.
A successful edit should adjust the model's behavior within the editing scope while leaving its behavior on unrelated inputs unchanged:
$$ f_{\theta_{e}}(x) = \begin{cases} y_e & \text{if } x \in I(x_e,y_e) \\ f_{\theta}(x) & \text{if } x \in O(x_e, y_e) \end{cases} $$
Here, $I(x_e, y_e)$ denotes the in-scope inputs and $O(x_e, y_e)$ the out-of-scope inputs.
- `Reliability`: the success rate of editing with a given editing descriptor
- `Generalization`: the success rate of editing within the editing scope
- `Locality`: whether the model's output changes after editing for unrelated inputs
- `Portability`: the success rate of editing for reasoning/application (one hop, synonym, logical generalization)
- `Efficiency`: time and memory consumption
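As a rough illustration of how the first three metrics relate to the editing scope, the sketch below scores a toy pre-edit and post-edit "model" with exact-match comparisons. The lookup-table models, the prompts, and the helper names `accuracy` and `locality_score` are assumptions made for this example; the scoring inside EasyEdit may differ (for instance, it may be token-level rather than exact match).

```python
def accuracy(predict, prompts, targets):
    """Fraction of prompts whose prediction exactly matches the target."""
    hits = sum(predict(p) == t for p, t in zip(prompts, targets))
    return hits / len(prompts)


def locality_score(pre_predict, post_predict, prompts):
    """Fraction of unrelated prompts whose prediction is unchanged by the edit."""
    same = sum(pre_predict(p) == post_predict(p) for p in prompts)
    return same / len(prompts)


# Toy models: lookup tables standing in for f_theta (pre) and f_theta_e (post).
pre_model = {
    "Who is the president of the USA?": "Donald Trump",
    "Who leads the USA?": "Donald Trump",
    "What is the capital of France?": "Paris",
}
post_model = dict(pre_model)
post_model["Who is the president of the USA?"] = "Joe Biden"  # the edit itself

# Reliability: the edit descriptor itself.
reliability = accuracy(
    post_model.get, ["Who is the president of the USA?"], ["Joe Biden"]
)
# Generalization: an in-scope rephrase (this toy edit does not generalize).
generalization = accuracy(post_model.get, ["Who leads the USA?"], ["Joe Biden"])
# Locality: an out-of-scope prompt should keep its pre-edit answer.
locality = locality_score(
    pre_model.get, post_model.get, ["What is the capital of France?"]
)
```

The toy edit is reliable and local but does not generalize, which is exactly the failure mode the Generalization metric is meant to expose.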
EasyEdit is a Python package for editing Large Language Models (LLMs) like `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, `T5` (supporting models from **1B** to **65B**). Its objective is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. It is designed to be easy to use and easy to extend.
- EasyEdit contains a unified framework for **Editor**, **Method** and **Evaluate**, respectively representing the editing scenario, editing technique, and evaluation method.
- Each Knowledge Editing scenario comprises three components:
  - `Editor`: such as BaseEditor (Factual Knowledge and Generation Editor) for LM, MultiModalEditor (MultiModal Knowledge).
  - `Method`: the specific knowledge editing technique used (such as ROME, MEND, ...).
  - `Evaluate`: metrics for evaluating knowledge editing performance.
    - `Reliability`, `Generalization`, `Locality`, `Portability`
The currently supported knowledge editing techniques are as follows:
- Memory-based: SERAC, IKE, GRACE, MELO, WISE
- Meta-learning: MEND, InstructEdit, MALMEN
- Locate-then-edit: KN, ROME, MEMIT, PMET, DINM, R-ROME, EMMET
- FT-L
Note 1: Due to the limited compatibility of this toolkit, some knowledge editing methods including T-Patcher, KE, CaliNet are not supported.
Note 2: Similarly, the MALMEN method is only partially supported due to the same reasons and will continue to be improved.
You can choose different editing methods according to your specific needs.
Method | T5 | GPT-2 | GPT-J | GPT-NEO | LlaMA | Baichuan | ChatGLM | InternLM | Qwen | Mistral |
---|---|---|---|---|---|---|---|---|---|---|
FT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
AdaLoRA | ✅ | ✅ | ||||||||
SERAC | ✅ | ✅ | ✅ | ✅ | ||||||
IKE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
MEND | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
KN | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
ROME | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
r-ROME | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
MEMIT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
EMMET | ✅ | ✅ | ✅ | |||||||
GRACE | ✅ | ✅ | ✅ | |||||||
MELO | ✅ | |||||||||
PMET | ✅ | ✅ | ||||||||
InstructEdit | ✅ | ✅ | ||||||||
DINM | ✅ | ✅ | ✅ | |||||||
WISE | ✅ | ✅ | ✅ | ✅ | ✅ | |||||
Defer | ✅ | ✅ | ✅ | |||||||
AlphaEdit | ✅ | ✅ | ✅ |
❗️❗️ If you intend to use Mistral, please update the `transformers` library to version 4.34.0 manually. You can use the following command: `pip install transformers==4.34.0`.
Work | Description | Path |
---|---|---|
InstructEdit | InstructEdit: Instruction-based Knowledge Editing for Large Language Models | Quick Start |
DINM | Detoxifying Large Language Models via Knowledge Editing | Quick Start |
WISE | WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | Quick Start |
ConceptEdit | Editing Conceptual Knowledge for Large Language Models | Quick Start |
MMEdit | Can We Edit Multimodal Large Language Models? | Quick Start |
PersonalityEdit | Editing Personality For Large Language Models | Quick Start |
PROMPT | PROMPT-based knowledge editing methods | Quick Start |
Benchmark: KnowEdit [Hugging Face][WiseModel][ModelScope]
❗️❗️ Note that KnowEdit is constructed by re-organizing and extending existing datasets including WikiBio, ZsRE, WikiDataCounterfact, WikiDataRecent, convsent, and Sanitation to make a comprehensive evaluation for knowledge editing. Special thanks to the builders and maintainers of those datasets.
Please note that Counterfact and WikiDataCounterfact are not the same dataset.
Task | Knowledge Insertion | Knowledge Modification | Knowledge Modification | Knowledge Modification | Knowledge Modification | Knowledge Erasure |
---|---|---|---|---|---|---|
Datasets | Wikirecent | ZsRE | WikiBio | WikiDatacounterfact | Convsent | Sanitation |
Type | Fact | Question Answering | Hallucination | Counterfact | Sentiment | Unwanted Info |
# Train | 570 | 10,000 | 592 | 1,455 | 14,390 | 80 |
# Test | 1,266 | 1,301 | 1,392 | 885 | 800 | 80 |
We provide detailed scripts so that users can easily use KnowEdit; please refer to examples.
dataset description
- ZsRE: is a context-free question-answering task. Given a question based on the subject and relation, the model is expected to provide the correct object as the answer.
- Wikirecent: This dataset specifically focuses on triplets that have been recently inserted into WikiData after July 2022.
- WikiBio: The original dataset was created by prompting GPT-3 to generate 238 Wikipedia-style biographies using subjects from the WikiBio.
- WikiDatacounterfact: Since tail entities are often not captured by models, and therefore are not suitable for testing modification edits, RippleEdit collects triplets about popular entities, where the subject corresponds to one of the top-viewed pages in Wikipedia.
- Convsent: This is a sentiment editing task that assesses the model's ability to modify a dialog agent's sentiment on a specific topic without affecting its responses to other topics.
- Sanitation: This dataset specifically addresses privacy concerns associated with learned language models.
dataset structure
knowedit
├── WikiBio
│ ├── wikibio-test-all.json
│ └── wikibio-train-all.json
├── ZsRE
│ └── ZsRE-test-all.json
├── wiki_counterfact
│ ├── test_cf.json
│ └── train_cf.json
├── convsent
│ ├── blender_test.json
│ ├── blender_train.json
│ └── blender_val.json
├── Sanitation
│ ├── trivia_qa_test.json
│ └── trivia_qa_train.json
└── wiki_recent
  ├── recent_test.json
  └── recent_train.json
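Given the layout above, a single split can be read with the standard library. This is a minimal sketch that assumes each file is one JSON document and that the benchmark has been downloaded into a local folder; `load_split` is a hypothetical helper, not an EasyEdit function.

```python
import json
from pathlib import Path


def load_split(root, relative_path):
    """Load one KnowEdit split, e.g. 'ZsRE/ZsRE-test-all.json', from `root`."""
    path = Path(root) / relative_path
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_split("knowedit", "wiki_recent/recent_test.json")` would return the parsed contents of that file, assuming `knowedit` is the local download directory.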
dataset | HuggingFace | WiseModel | ModelScope | Description |
---|---|---|---|---|
CKnowEdit | [HuggingFace] | [WiseModel] | [ModelScope] | dataset for editing Chinese Knowledge |
- Here, you can follow CKnowEdit.md to find more details about CKnowEdit and run Chinese knowledge editing experiments.
dataset description
CKnowEdit is a high-quality Chinese-language dataset for knowledge editing which is highly characterized by the Chinese language, with all data sourced from Chinese knowledge bases. It is meticulously designed to more deeply discern the nuances and challenges inherent in the comprehension of the Chinese language by current LLMs, providing a robust resource for refining Chinese-specific knowledge within LLMs.
The field descriptions for the data in CKnowEdit are as follows:
"prompt": query input to the model (str)
"target_old": the incorrect response previously generated by the model (str)
"target_new": the accurate answer of the prompt (str)
"portability_prompt": new prompts related to the target knowledge (list or None)
"portability_answer": accurate answers corresponding to the portability_prompt (list or None)
"locality_prompt": new prompts unrelated to the target knowledge (list or None)
"locality_answer": accurate answers corresponding to the locality_prompt (list or None)
"rephrase": alternative ways to phrase the original prompt (list)
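The field list above can be turned into a small sanity check before running experiments. The sketch below is an assumption-laden illustration: `check_record` is a hypothetical helper, and the sample record is made-up placeholder text rather than real CKnowEdit data.

```python
# Field names and types follow the description above.
REQUIRED_FIELDS = {
    "prompt": str,
    "target_old": str,
    "target_new": str,
    "rephrase": list,
}
# These fields may be a list or None.
OPTIONAL_LIST_FIELDS = [
    "portability_prompt",
    "portability_answer",
    "locality_prompt",
    "locality_answer",
]


def check_record(record):
    """Return a list of human-readable problems; an empty list means the record looks valid."""
    problems = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(key), expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
    for key in OPTIONAL_LIST_FIELDS:
        value = record.get(key)
        if value is not None and not isinstance(value, list):
            problems.append(f"{key}: expected list or None")
    return problems


# Placeholder record for illustration only.
sample = {
    "prompt": "example question",
    "target_old": "previous wrong answer",
    "target_new": "correct answer",
    "portability_prompt": ["related question"],
    "portability_answer": ["related answer"],
    "locality_prompt": None,
    "locality_answer": None,
    "rephrase": ["example question, reworded"],
}
sample_problems = check_record(sample)
```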
dataset structure
CknowEdit
├── Chinese Literary Knowledge
│ ├── Ancient Poetry
│ ├── Proverbs
│ └── Idioms
├── Chinese Linguistic Knowledge
│ ├── Phonetic Notation
│ └── Classical Chinese
├── Chinese Geographical Knowledge
└── Ruozhiba
dataset | Google Drive | BaiduNetDisk | Description |
---|---|---|---|
ZsRE plus | [Google Drive] | [BaiduNetDisk] | Question Answering dataset using question rephrasings |
Counterfact plus | [Google Drive] | [BaiduNetDisk] | Counterfact dataset using Entity replacement |
We provide zsre and counterfact datasets to verify the effectiveness of knowledge editing. You can download them here. [Google Drive], [BaiduNetDisk].
- For locality, in addition to testing unrelated instances, we also provide tests on distracting (reference: Detecting Edit Failures...), other attribution, and other downstream tasks (such as commonsense reasoning).
- For portability, it tests whether the model can apply edited instances for inference. We provide evaluations for one-hop reasoning, subject alias, and inverse relation (eg, a one-to-one relationship between spouses should be bidirectionally edited).
dataset description
editing-data
├── counterfact
│ ├── counterfact-edit.json
│ ├── counterfact-train.json
│ └── counterfact-val.json
├── locality
│ ├── Commonsense Task
│ │ ├── piqa_valid-labels.lst
│ │ └── piqa_valid.jsonl
│ ├── Distracting Neighbor
│ │ └── counterfact_distracting_neighbor.json
│ └── Other Attribution
│ └── counterfact_other_attribution.json
├── portability
│ ├── Inverse Relation
│ │ └── zsre_inverse_relation.json
│ ├── One Hop
│ │ ├── counterfact_portability_gpt4.json
│ │ └── zsre_mend_eval_portability_gpt4.json
│ └── Subject Replace
│ ├── counterfact_subject_replace.json
│ └── zsre_subject_replace.json
└── zsre
├── zsre_mend_eval.json
├── zsre_mend_train_10000.json
└── zsre_mend_train.json
- counterfact: original counterfact dataset using Entity replacement
- zsre: original question answering dataset using question rephrasings
- locality (evaluation for locality, see details in this paper)
  - Commonsense Task: evaluation for other downstream tasks such as commonsense task
  - Distracting Neighbor: test on distracting neighborhood (reference: Detecting Edit Failures...)
  - Other Attribution
- portability
  - Inverse Relation: evaluation for one-to-one relationship such as `spouse`
  - One Hop: evaluation for one-hop reasoning
  - Subject Replace: evaluation for synonym replacement
dataset | Google Drive | HuggingFace Dataset | Description |
---|---|---|---|
ConceptEdit | [Google Drive] | [HuggingFace Dataset] | dataset for editing conceptual knowledge |
- Here, you can follow ConceptEdit.md to run concept editing experiments.
dataset description
data
└──concept_data.json
    ├──final_gpt2_inter.json
    ├──final_gpt2_intra.json
    ├──final_gptj_inter.json
    ├──final_gptj_intra.json
    ├──final_llama2chat_inter.json
    ├──final_llama2chat_intra.json
    ├──final_mistral_inter.json
    └──final_mistral_intra.json
Concept Specific Evaluation Metrics
- `Instance Change`: capturing the intricacies of these instance-level changes
- `Concept Consistency`: the semantic similarity of generated concept definition
dataset | Google Drive | BaiduNetDisk | Description |
---|---|---|---|
E-IC | [Google Drive] | [BaiduNetDisk] | dataset for editing Image Captioning |
E-VQA | [Google Drive] | [BaiduNetDisk] | dataset for editing Visual Question Answering |
- All images used in E-IC and E-VQA are available for download at Google Drive
- For locality, it is the same as factual editing in order to measure whether unrelated facts retain their outputs.
- For multimodal locality, it assesses the impact of editing on the visual module, which is similar to regular locality.
dataset description
editing-data
├── caption
│ ├── caption_train_edit.json
│ └── caption_eval_edit.json
├── locality
│ ├── NQ dataset
│ │ ├── train.json
│ │ └── validation.json
├── multimodal_locality
│ ├── OK-VQA dataset
│ │ ├── okvqa_loc.json
└── vqa
  ├── vqa_train.json
  └── vqa_eval.json
- Multimodal locality (evaluation for multimodal locality, see dataset's details in this paper)
dataset | HuggingFace Dataset | Description |
---|---|---|
SafeEdit | [HuggingFace Dataset] | dataset for detoxifying LLMs |
- Here, you can follow SafeEdit.md to run detoxification editing experiments.
dataset description
data
└──SafeEdit_train.json
└──SafeEdit_val.json
└──SafeEdit_test.json
Detoxifying Specific Evaluation Metrics
- `Defense Success (DS)`: the detoxification success rate of the edited LLM on the adversarial input (attack prompt + harmful question) that is used to modify the LLM.
- `Defense Generalization (DG)`: the detoxification success rate of the edited LLM on out-of-domain malicious inputs.
- `General Performance`: the side effects on unrelated task performance.
Method | Description | GPT-2 | LlaMA |
---|---|---|---|
IKE | In-Context Learning (ICL) Edit | [Colab-gpt2] | [Colab-llama] |
ROME | Locate-Then-Edit Neurons | [Colab-gpt2] | [Colab-llama] |
MEMIT | Locate-Then-Edit Neurons | [Colab-gpt2] | [Colab-llama] |
Note: Please use Python 3.9+ for EasyEdit. To get started, simply install conda and run:
git clone https://github.com/zjunlp/EasyEdit.git
conda create -n EasyEdit python=3.9.7
...
pip install -r requirements.txt
Our results are all based on the default configuration
Method | llama-2-7B | chatglm2 | gpt-j-6b | gpt-xl |
---|---|---|---|---|
FT | 60GB | 58GB | 55GB | 7GB |
SERAC | 42GB | 32GB | 31GB | 10GB |
IKE | 52GB | 38GB | 38GB | 10GB |
MEND | 46GB | 37GB | 37GB | 13GB |
KN | 42GB | 39GB | 40GB | 12GB |
ROME | 31GB | 29GB | 27GB | 10GB |
MEMIT | 33GB | 31GB | 31GB | 11GB |
AdaLoRA | 29GB | 24GB | 25GB | 8GB |
GRACE | 27GB | 23GB | 6GB | |
WISE | 34GB | 27GB | 7GB |
- Edit large language models (LLMs) in around 5 seconds.
- The following example shows you how to perform editing with EasyEdit. More examples and tutorials can be found at examples.
`BaseEditor` is the class for Language Modality Knowledge Editing. You can choose the appropriate editing method based on your specific needs.
- Due to different transformer versions and different GPU models, the editing results may fluctuate slightly.
With the modularity and flexibility of `EasyEdit`, you can easily use it to edit a model.
Step1: Define a PLM as the object to be edited.
Choose the PLM to be edited. `EasyEdit` supports a subset of the models (`T5`, `GPTJ`, `GPT-NEO`, `LlaMA` so far) retrievable on HuggingFace. The corresponding configuration file directory is `hparams/YOUR_METHOD/YOUR_MODEL.YAML`, such as `hparams/MEND/gpt2-xl.yaml`. Set the corresponding `model_name` to select the object for knowledge editing.
model_name: gpt2-xl
model_class: GPT2LMHeadModel
tokenizer_class: GPT2Tokenizer
tokenizer_name: gpt2-xl
model_parallel: false # true for multi-GPU editing
Step2: Choose the appropriate Knowledge Editing Method
## In this case, we use the MEND method, so you should import `MENDHyperParams`
## (`BaseEditor` is imported here as well because Step4 uses it)
from easyeditor import BaseEditor, MENDHyperParams
## Loading config from hparams/MEND/gpt2-xl.yaml
hparams = MENDHyperParams.from_hparams('./hparams/MEND/gpt2-xl')
Step3: Provide the edit descriptor and edit target
## edit descriptor: prompt that you want to edit
prompts = [
'What university did Watts Humphrey attend?',
'Which family does Ramalinaceae belong to',
'What role does Denny Herzig play in football?'
]
## You can set `ground_truth` to None !!!(or set to original output)
ground_truth = ['Illinois Institute of Technology', 'Lecanorales', 'defender']
## edit target: expected output
target_new = ['University of Michigan', 'Lamiinae', 'winger']
Step4: Combine them into a BaseEditor
`EasyEdit` provides a simple and unified way to init `Editor`, like huggingface: **from_hparams**.
## Construct Language Model Editor
editor = BaseEditor.from_hparams(hparams)
Step5: Provide the data for evaluation
Note that the data for portability and locality are both optional (set to None for basic editing success rate evaluation only). The data format for both is a dict: for each measurement dimension, you need to provide the corresponding prompts and their ground truths. Here is an example of the data:
locality_inputs = {
    'neighborhood': {
        'prompt': ['Joseph Fischhof, the', 'Larry Bird is a professional', 'In Forssa, they understand'],
        'ground_truth': ['piano', 'basketball', 'Finnish']
    },
    'distracting': {
        'prompt': ['Ray Charles, the violin Hauschka plays the instrument', 'Grant Hill is a professional soccer Magic Johnson is a professional', 'The law in Ikaalinen declares the language Swedish In Loviisa, the language spoken is'],
        'ground_truth': ['piano', 'basketball', 'Finnish']
    }
}
In the above example, we evaluate the performance of the editing methods about "neighborhood" and "distracting".
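Because every dimension must carry one prompt and one ground truth per edit, it can help to validate the dict before starting an edit run. The check below is a sketch; `validate_eval_inputs` is a hypothetical helper and not part of EasyEdit, and the sample data is placeholder text.

```python
def validate_eval_inputs(inputs, n_edits):
    """Raise ValueError if any dimension lacks a key or has the wrong length."""
    for name, group in inputs.items():
        missing = {"prompt", "ground_truth"} - set(group)
        if missing:
            raise ValueError(f"{name}: missing keys {sorted(missing)}")
        for key in ("prompt", "ground_truth"):
            if len(group[key]) != n_edits:
                raise ValueError(
                    f"{name}.{key}: expected {n_edits} items, got {len(group[key])}"
                )


# Placeholder inputs for two edits.
example_inputs = {
    "neighborhood": {
        "prompt": ["prompt one", "prompt two"],
        "ground_truth": ["answer one", "answer two"],
    }
}
validate_eval_inputs(example_inputs, n_edits=2)  # passes silently
```

The same check applies to portability inputs, since both use the same dict format.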
Step6: Edit and Evaluation
Done! We can now conduct Edit and Evaluation for your model to be edited. The `edit` function will return a series of metrics related to the editing process as well as the modified model weights. [`sequential_edit=True` for continuous editing]
metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    ground_truth=ground_truth,
    target_new=target_new,
    locality_inputs=locality_inputs,
    sequential_edit=False  # True: start continuous editing ✈️
)
## metrics: edit success, rephrase success, locality, etc.
## edited_model: post-edit model
The maximum input length for EasyEdit is 512. If this length is exceeded, you will encounter the error "CUDA error: device-side assert triggered." You can modify the maximum length in the following file:LINK
Step7: RollBack
In sequential editing, if you are not satisfied with the outcome of one of your edits and you do not wish to lose your previous edits, you can use the rollback feature to undo your previous edit. Currently, only the GRACE method is supported. All you need is a single line of code, using the edit_key to revert your edit.
editor.rolllback('edit_key')
In EasyEdit, we default to using target_new as the edit_key
We specify the return metrics in `dict` format, including model prediction evaluations before and after editing. For each edit, it will include the following metrics:
- `rewrite_acc` $\rightarrow$ Reliability
- `rephrase_acc` $\rightarrow$ Generalization
- `locality` $\rightarrow$ Locality
- `portablility` $\rightarrow$ Portability
{
"post": {
"rewrite_acc": ,
"rephrase_acc": ,
"locality": {
"YOUR_LOCALITY_KEY": ,
//...
},
"portablility": {
"YOUR_PORTABILITY_KEY": ,
//...
},
},
"pre": {
"rewrite_acc": ,
"rephrase_acc": ,
"portablility": {
"YOUR_PORTABILITY_KEY": ,
//...
},
}
}
- For evaluation for Reliability, you only need to provide the corresponding editing `prompts` and editing `target_new`.
- For evaluation for Generalization, `rephrase_prompts` are required.
- For evaluation for Locality and Portability, you need to define the name of the corresponding metric, as well as `prompts` and `ground_truth`.
  - Note: the length needs to be equal to the number of edit prompts
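To report one number per metric across many edits, the per-edit dicts can be averaged. The sketch below assumes `metrics` is a list with one dict per edit in the shape shown above, and that each value is either a float or a list of floats; both are assumptions about the returned structure, and `summarize` is a hypothetical helper.

```python
from statistics import mean


def _scalar(value):
    """Collapse a float or a list of floats into a single float."""
    if isinstance(value, (list, tuple)):
        return mean(value)
    return float(value)


def summarize(metrics, phase="post"):
    """Average rewrite_acc, rephrase_acc and each locality key over all edits."""
    summary = {}
    for key in ("rewrite_acc", "rephrase_acc"):
        values = [_scalar(m[phase][key]) for m in metrics if key in m[phase]]
        if values:
            summary[key] = mean(values)
    per_key = {}
    for m in metrics:
        for name, value in m[phase].get("locality", {}).items():
            per_key.setdefault(name, []).append(_scalar(value))
    summary["locality"] = {name: mean(vals) for name, vals in per_key.items()}
    return summary


# Made-up numbers, for illustration only.
fake_metrics = [
    {"post": {"rewrite_acc": 1.0, "rephrase_acc": [1.0], "locality": {"neighborhood": 1.0}}},
    {"post": {"rewrite_acc": 0.5, "rephrase_acc": 0.0, "locality": {"neighborhood": 0.5}}},
]
fake_summary = summarize(fake_metrics)
```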
- meta-learning based: `MEND`
- memory-based routing: `SERAC`
For the above editing methods, pre-training of the corresponding meta-networks or classifiers is required. Therefore, EasyEdit provides a unified framework for pretraining the relevant network structures. Take training MEND as an example:
- Step 1 and Step 2 are the same as the example above, which involves selecting the appropriate editing model and editing method.
Step3: Provide the edit training set
The currently supported and available datasets are `zsre` and `counterfact` (Google Drive). Please place them in the "data" directory and initialize the dataset_class (`ZsreDataset` for zsre and `CounterFactDataset` for counterfact) to load the corresponding training set.
train_ds = ZsreDataset('./data/zsre_mend_train.json', config=training_hparams)
eval_ds = ZsreDataset('./data/zsre_mend_eval.json', config=training_hparams)
Step4: Combine them into a Trainer
trainer = EditTrainer(
    config=training_hparams,
    train_set=train_ds,
    val_set=eval_ds
)
Step5: Run and Edit
Done! We can conduct Run and Evaluation.
trainer.run()
- Run: The `CHECKPOINT` will be saved to the path `results_dir`.
- Edit: Set the `archive` field in the hparams file to `CHECKPOINT`. EasyEdit will automatically load the corresponding pre-trained weights during the editing process (Go to edit).
Training Example
from easyeditor import EditTrainer, MENDTrainingHparams, ZsreDataset
training_hparams = MENDTrainingHparams.from_hparams('hparams/TRAINING/MEND/llama-7b.yaml')
train_ds = ZsreDataset('./data/zsre/zsre_mend_train.json', config=training_hparams)
eval_ds = ZsreDataset('./data/zsre/zsre_mend_eval.json', config=training_hparams)
trainer = EditTrainer(
    config=training_hparams,
    train_set=train_ds,
    val_set=eval_ds
)
trainer.run()
KnowEdit is a benchmark dataset of knowledge editing for LLMs. You can easily obtain KnowEdit from HuggingFace, WiseModel, and ModelScope.
dataset | HuggingFace | WiseModel | ModelScope |
---|---|---|---|
KnowEdit | [HuggingFace] | [WiseModel] | [ModelScope] |
We provide detailed scripts so that users can easily use KnowEdit; please refer to examples.
We present editing results of the four metrics on LlaMA-2-7B using EasyEdit. We adopt ZsRE as the test dataset.
❗️❗️Editing `llama-2-7B` requires 40G+ VRAM on GPU. (OOM solution)
Method | Reliability | Generalization | Locality | Portability |
---|---|---|---|---|
FT | 56.94 | 52.02 | 96.32 | 51.03 |
SERAC | 99.49 | 99.13 | 100.00 | 57.82 |
IKE | 100.00 | 99.98 | 69.19 | 67.56 |
MEND | 94.24 | 90.27 | 97.04 | 56.95 |
KN | 28.95 | 28.43 | 65.43 | 37.18 |
ROME | 92.45 | 87.04 | 99.63 | 57.47 |
MEMIT | 92.94 | 85.97 | 99.49 | 60.64 |
We also present editing results of KnowEdit on Llama-2-7B using EasyEdit.
DataSet | Metric | SERAC | ICE | AdaLoRA | MEND | ROME | MEMIT | FT-L | FT-M |
---|---|---|---|---|---|---|---|---|---|
WikiData_recent | Edit Succ. | 98.68 | 60.74 | 100.00 | 95.75 | 97.18 | 97.05 | 55.75 | 100.00 |
WikiData_recent | Portability | 63.52 | 36.93 | 64.69 | 55.88 | 55.25 | 56.37 | 40.86 | 65.44 |
WikiData_recent | Locality | 100.00 | 33.34 | 56.42 | 94.76 | 54.77 | 52.15 | 43.70 | 64.33 |
WikiData_recent | Fluency | 553.19 | 531.01 | 579.57 | 557.11 | 579.66 | 573.89 | 529.24 | 574.32 |
ZsRE | Edit Succ. | 99.67 | 66.01 | 100.00 | 96.74 | 96.77 | 95.37 | 53.93 | 99.98 |
ZsRE | Portability | 56.48 | 63.94 | 58.03 | 60.41 | 52.63 | 52.67 | 45.64 | 60.31 |
ZsRE | Locality | 30.23 | 23.14 | 75.76 | 92.79 | 53.67 | 48.32 | 73.42 | 89.78 |
ZsRE | Fluency | 410.89 | 541.14 | 563.56 | 524.33 | 573.75 | 563.31 | 493.01 | 552.26 |
WikiBio | Edit Succ. | 99.69 | 95.53 | 100.00 | 93.66 | 96.08 | 94.40 | 66.33 | 100.00 |
WikiBio | Locality | 69.79 | 47.90 | 81.28 | 69.51 | 62.74 | 61.51 | 79.86 | 93.38 |
WikiBio | Fluency | 606.95 | 632.92 | 618.45 | 609.39 | 617.69 | 616.65 | 606.95 | 612.69 |
WikiData_counterfact | Edit Succ. | 99.99 | 69.83 | 100.00 | 80.03 | 98.57 | 98.05 | 45.15 | 100.00 |
WikiData_counterfact | Portability | 76.07 | 45.32 | 69.89 | 52.01 | 55.92 | 58.56 | 33.60 | 74.36 |
WikiData_counterfact | Locality | 98.96 | 32.38 | 70.31 | 94.38 | 51.97 | 46.62 | 50.48 | 76.76 |
WikiData_counterfact | Fluency | 549.91 | 547.22 | 580.29 | 555.72 | 584.04 | 575.96 | 528.26 | 575.62 |
ConvSent | Edit Succ. | 62.75 | 52.78 | 44.89 | 50.76 | 45.79 | 44.75 | 49.50 | 46.10 |
ConvSent | Locality | 0.26 | 49.73 | 0.18 | 3.42 | 0.00 | 0.00 | 0.00 | 0.00 |
ConvSent | Fluency | 458.21 | 621.45 | 606.42 | 379.43 | 606.32 | 602.62 | 607.86 | 592.52 |
Sanitation | Edit Succ. | 0.00 | 72.50 | 2.50 | 0.00 | 85.00 | 48.75 | 0.00 | 75.00 |
Sanitation | Locality | 100.00 | 56.58 | 65.50 | 5.29 | 50.31 | 67.47 | 14.78 | 47.07 |
Sanitation | Fluency | 416.29 | 794.15 | 330.44 | 407.18 | 465.12 | 466.10 | 439.10 | 416.29 |
❗️❗️ Please note that if you wish to reproduce the ROME results on KnowEdit, ensure that fp16: False is set.
For the locality metric, we calculate the score based on the proportion of tokens that remain unchanged before and after editing. For example, if the output tokens before editing are [29, 234, 334] and after editing are [29, 234, 333], the locality score for this data would be 66.67. For the portability metric, we calculate it by taking the average of all sub-scores under the portability category.
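The two calculations described above can be sketched as follows. This is a minimal illustration, not EasyEdit's own evaluation code: the function names are hypothetical, the token sequences are assumed to have equal length and to be compared position by position, and the sub-score names in the usage example are made up.

```python
from typing import Dict, Sequence


def locality_score(pre_tokens: Sequence[int], post_tokens: Sequence[int]) -> float:
    """Percentage of token positions that are unchanged after editing.

    Assumes both sequences have the same length (an assumption of this
    sketch; the source does not say how unequal lengths are handled).
    """
    if len(pre_tokens) != len(post_tokens):
        raise ValueError("token sequences must have the same length")
    if not pre_tokens:
        raise ValueError("token sequences must be non-empty")
    unchanged = sum(a == b for a, b in zip(pre_tokens, post_tokens))
    return 100.0 * unchanged / len(pre_tokens)


def portability_score(sub_scores: Dict[str, float]) -> float:
    """Plain average of all sub-scores under the portability category."""
    if not sub_scores:
        raise ValueError("at least one sub-score is required")
    return sum(sub_scores.values()) / len(sub_scores)


if __name__ == "__main__":
    # The example from the text: two of three tokens are unchanged.
    print(round(locality_score([29, 234, 334], [29, 234, 333]), 2))  # 66.67
    # Hypothetical sub-score names and values, for illustration only.
    print(portability_score({"sub_a": 50.0, "sub_b": 70.0, "sub_c": 60.0}))  # 60.0
```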
TO DO
In the next version, we plan to:
- Explore and integrate more robust editing methods, focusing on locality and portability metrics.
- Provide a comprehensive evaluation suite for editing methods, including fact modification, fact erasure and hallucination erasure.
- Provide a causal analysis component for analyzing knowledge storage mechanisms.
- Support knowledge editing for tasks other than factual editing, such as personality editing.
Meanwhile, we will offer long-term maintenance to fix bugs, solve issues and meet new requests. If you run into any problems, please open an issue.
Please cite our paper if you use EasyEdit in your work.
@article{zhang2024comprehensive,
title={A Comprehensive Study of Knowledge Editing for Large Language Models},
author={Zhang, Ningyu and Yao, Yunzhi and Tian, Bozhong and Wang, Peng and Deng, Shumin and Wang, Mengru and Xi, Zekun and Mao, Shengyu and Zhang, Jintian and Ni, Yuansheng and others},
journal={arXiv preprint arXiv:2401.01286},
year={2024}
}
@article{wang2023easyedit,
title={Easyedit: An easy-to-use knowledge editing framework for large language models},
author={Wang, Peng and Zhang, Ningyu and Xie, Xin and Yao, Yunzhi and Tian, Bozhong and Wang, Mengru and Xi, Zekun and Cheng, Siyuan and Liu, Kangwei and Zheng, Guozhou and others},
journal={arXiv preprint arXiv:2308.07269},
year={2023}
}
@article{yao2023editing,
title={Editing Large Language Models: Problems, Methods, and Opportunities},
author={Yao, Yunzhi and Wang, Peng and Tian, Bozhong and Cheng, Siyuan and Li, Zhoubo and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
journal={arXiv preprint arXiv:2305.13172},
year={2023}
}
@article{cheng2023edit,
title={Can We Edit Multimodal Large Language Models?},
author={Cheng, Siyuan and Tian, Bozhong and Liu, Qingbin and Chen, Xi and Wang, Yongheng and Chen, Huajun and Zhang, Ningyu},
journal={arXiv preprint arXiv:2310.08475},
year={2023}
}
@article{mao2023editing,
title={Editing personality for llms},
author={Mao, Shengyu and Zhang, Ningyu and Wang, Xiaohan and Wang, Mengru and Yao, Yunzhi and Jiang, Yong and Xie, Pengjun and Huang, Fei and Chen, Huajun},
journal={arXiv preprint arXiv:2310.02168},
year={2023}
}
@article{wang2024wise,
title={WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models},
author={Wang, Peng and Li, Zexi and Zhang, Ningyu and Xu, Ziwen and Yao, Yunzhi and Jiang, Yong and Xie, Pengjun and Huang, Fei and Chen, Huajun},
journal={arXiv preprint arXiv:2405.14768},
year={2024}
}
We thank all the contributors to this project; more contributors are welcome!
- AlphaEdit
- ROME
- FastEdit
- GRACE
- MELO
- PMET
- VLKEB
- PitfallsKnowledgeEditing
- BiasEdit
- WikiLLM
- PEAK
- Debugger
- LTE
- r-ROME
- dive-into-llms
🙌 We would like to express our heartfelt gratitude for the contributions of FastEdit, ROME, GRACE, MELO, and PMET to our project, as we have utilized portions of their source code. Many thanks to all the colleagues in the community for submitting issues and providing technical support. Appreciation is also extended to all PR contributors and issue feedback providers during the EasyEdit version iterations, especially ancelia06 for correcting the grammar of the README.