EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
EasyInstruct is a Python package proposed as an easy-to-use instruction processing framework for Large Language Models (LLMs) such as GPT-4, LLaMA, and ChatGLM, intended for use in research experiments. EasyInstruct modularizes instruction generation, selection, and prompting, while also considering their combination and interaction.
- 2024-06-04 EasyInstruct was accepted to the ACL 2024 System Demonstration Track. 🎉🎉
- 2024-02-06 We release a new paper: "EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models" with an HF demo EasyInstruct.
- 2024-02-06 We release a preliminary tool EasyDetect for hallucination detection, with a demo.
- 2024-02-05 We release version 0.1.2, which supports new features and optimises the function interface.
- 2023-12-09 The paper "When Do Program-of-Thoughts Work for Reasoning?" (supported by EasyInstruct) was accepted by AAAI 2024!
- 2023-10-28 We release version 0.1.1, which supports new features for instruction generation and instruction selection.
- 2023-08-09 We release version 0.0.6, supporting Cohere API calls.
- 2023-07-12 We release EasyEdit, an easy-to-use framework to edit Large Language Models.
Previous news
- 2023-05-23 We release version 0.0.5, removing the requirement of llama-cpp-python.
- 2023-05-16 We release version 0.0.4, fixing some problems.
- 2023-04-21 We release version 0.0.3; check out our documentation for more details.
- 2023-03-25 We release version 0.0.2, supporting IndexPrompt, MMPrompt, IEPrompt and more LLMs.
- 2023-03-13 We release version 0.0.1, supporting in-context learning and chain-of-thought with ChatGPT.
This repository is a subproject of KnowLM.
The currently supported instruction generation techniques are as follows:

| Methods | Description |
| --- | --- |
| Self-Instruct | The method that randomly samples a few instructions from a human-annotated seed tasks pool as demonstrations and prompts an LLM to generate more instructions and corresponding input-output pairs. |
| Evol-Instruct | The method that incrementally upgrades an initial set of instructions into more complex instructions by prompting an LLM with specific prompts. |
| Backtranslation | The method that creates an instruction-following training instance by predicting an instruction that would be correctly answered by a portion of a document of the corpus. |
| KG2Instruct | The method that creates an instruction-following training instance by predicting an instruction that would be correctly answered by a portion of a document of the corpus. |
The currently supported instruction selection metrics are as follows:

| Metrics | Notation | Description |
| --- | --- | --- |
| Length | $Len$ | The bounded length of every pair of instruction and response. |
| Perplexity | $PPL$ | The exponentiated average negative log-likelihood of the response. |
| MTLD | $MTLD$ | Measure of textual lexical diversity: the mean length of sequential words in a text that maintains a minimum threshold TTR score. |
| ROUGE | $ROUGE$ | Recall-Oriented Understudy for Gisting Evaluation, a set of metrics used for evaluating similarities between sentences. |
| GPT score | $GPT$ | The score of whether the output is a good example of how an AI assistant should respond to the user's instruction, provided by ChatGPT. |
| CIRS | $CIRS$ | The score that uses the abstract syntax tree to encode structural and logical attributes, to measure the correlation between code and reasoning abilities. |
API service providers and their corresponding LLM products that are currently available:
| Provider | Model | Description | Default Version |
| --- | --- | --- | --- |
| OpenAI | GPT-3.5 | A set of models that improve on GPT-3 and can understand as well as generate natural language or code. | gpt-3.5-turbo |
| OpenAI | GPT-4 | A set of models that improve on GPT-3.5 and can understand as well as generate natural language or code. | gpt-4 |
| Anthropic | Claude | A next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems. | claude-2.0 |
| Anthropic | Claude-Instant | A lighter, less expensive, and much faster option than Claude. | claude-instant-1.2 |
| Cohere | Command | A flagship text generation model of Cohere trained to follow user commands and to be instantly useful in practical business applications. | command |
| Cohere | Command-Light | A light version of Command models that are faster but may produce lower-quality generated text. | command-light |
Installation from git repo branch:
pip install git+https://github.com/zjunlp/EasyInstruct@main
Installation for local development:
git clone https://github.com/zjunlp/EasyInstruct
cd EasyInstruct
pip install -e .
Installation using PyPI (not the latest version):
pip install easyinstruct -i https://pypi.org/simple
We provide two ways for users to quickly get started with EasyInstruct. You can either use the shell script or the Gradio app based on your specific needs.
Users can easily configure the parameters of EasyInstruct in a YAML-style file or just quickly use the default parameters in the configuration files we provide. Following is an example of the configuration file for Self-Instruct:
generator:
  SelfInstructGenerator:
    target_dir: data/generations/
    data_format: alpaca
    seed_tasks_path: data/seed_tasks.jsonl
    generated_instructions_path: generated_instructions.jsonl
    generated_instances_path: generated_instances.jsonl
    num_instructions_to_generate: 100
    engine: gpt-3.5-turbo
    num_prompt_instructions: 8
More example configuration files can be found at configs.
Users should first specify the configuration file and provide their own OpenAI API key. Then, run the following shell script to launch the instruction generation or selection process.
config_file=""
openai_api_key=""
python demo/run.py \
    --config $config_file \
    --openai_api_key $openai_api_key
We provide a Gradio app for users to quickly get started with EasyInstruct. You can run the following command to launch the Gradio app locally on port 8080 (if available).
python demo/app.py
We also host a running Gradio app in HuggingFace Spaces. You can try it out here.
Please refer to our documentation for more details.
The Generators module streamlines the process of instruction data generation, allowing for the generation of instruction data based on seed data. You can choose the appropriate generator based on your specific needs.
BaseGenerator is the base class for all generators.
You can also easily inherit this base class to customize your own generator class. Just override the __init__ and generate methods.
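A minimal sketch of such a subclass follows. To keep it runnable without the package or an API key, it defines a simplified stand-in `BaseGenerator` rather than importing the real one; the stand-in's constructor arguments, the `TemplateGenerator` class, and its templates are all hypothetical and only illustrate the override pattern.

```python
import json
import random


class BaseGenerator:
    """Stand-in for EasyInstruct's base class (hypothetical, simplified)."""

    def __init__(self, target_dir="data/generations/"):
        self.target_dir = target_dir

    def generate(self):
        raise NotImplementedError


class TemplateGenerator(BaseGenerator):
    """Toy generator that fills instruction templates with seed topics."""

    def __init__(self, topics, num_instructions_to_generate=3, seed=0, **kwargs):
        super().__init__(**kwargs)
        self.topics = topics
        self.num_instructions_to_generate = num_instructions_to_generate
        self.rng = random.Random(seed)  # seeded so runs are reproducible

    def generate(self):
        templates = ["Explain {} to a beginner.", "List three facts about {}."]
        return [
            {"instruction": self.rng.choice(templates).format(self.rng.choice(self.topics))}
            for _ in range(self.num_instructions_to_generate)
        ]


generated = TemplateGenerator(["perplexity", "ROUGE"]).generate()
print(json.dumps(generated, indent=2))
```

A real generator would typically call an LLM inside `generate` and write its results under `target_dir`; the sketch returns them in memory instead.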
SelfInstructGenerator is the class for the instruction generation method of Self-Instruct. See Self-Instruct: Aligning Language Model with Self Generated Instructions for more details.
Example
from easyinstruct import SelfInstructGenerator
from easyinstruct.utils.api import set_openai_key
# Step1: Set your own API-KEY
set_openai_key("YOUR-KEY")
# Step2: Declare a generator class
generator = SelfInstructGenerator(num_instructions_to_generate=10)
# Step3: Generate self-instruct data
generator.generate()
BacktranslationGenerator is the class for the instruction generation method of Instruction Backtranslation. See Self-Alignment with Instruction Backtranslation for more details.
Example
from easyinstruct import BacktranslationGenerator
from easyinstruct.utils.api import set_openai_key
# Step1: Set your own API-KEY
set_openai_key("YOUR-KEY")
# Step2: Declare a generator class
generator = BacktranslationGenerator(num_instructions_to_generate=10)
# Step3: Generate backtranslation data
generator.generate()
EvolInstructGenerator is the class for the instruction generation method of Evol-Instruct. See WizardLM: Empowering Large Language Models to Follow Complex Instructions for more details.
Example
from easyinstruct import EvolInstructGenerator
from easyinstruct.utils.api import set_openai_key
# Step1: Set your own API-KEY
set_openai_key("YOUR-KEY")
# Step2: Declare a generator class
generator = EvolInstructGenerator(num_instructions_to_generate=10)
# Step3: Generate evolution data
generator.generate()
KG2InstructGenerator is the class for the instruction generation method of KG2Instruct. See InstructIE: A Chinese Instruction-based Information Extraction Dataset for more details.
The Selectors module standardizes the instruction selection process, enabling the extraction of high-quality instruction datasets from raw, unprocessed instruction data. The raw data can be sourced from publicly available instruction datasets or generated by the framework itself. You can choose the appropriate selector based on your specific needs.
BaseSelector is the base class for all selectors.
You can also easily inherit this base class to customize your own selector class. Just override the __init__ and __process__ methods.
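The override pattern can be sketched as below. The `BaseSelector` here is a simplified stand-in defined inline so the example runs on its own; its constructor signature and the way `process` delegates to `__process__` are assumptions, and `KeywordSelector` is a hypothetical example class.

```python
class BaseSelector:
    """Stand-in for EasyInstruct's base selector (hypothetical, simplified)."""

    def __init__(self, data):
        self.data = data

    def __process__(self, data):
        raise NotImplementedError

    def process(self):
        # Delegate the actual filtering to the subclass hook.
        return self.__process__(self.data)


class KeywordSelector(BaseSelector):
    """Toy selector that drops samples containing any banned keyword."""

    def __init__(self, data, banned_keywords):
        super().__init__(data)
        self.banned = [k.lower() for k in banned_keywords]

    def __process__(self, data):
        return [
            sample for sample in data
            if not any(k in sample["instruction"].lower() for k in self.banned)
        ]


samples = [
    {"instruction": "Summarize the following article."},
    {"instruction": "Describe the image above."},
]
selected = KeywordSelector(samples, banned_keywords=["image"]).process()
print(selected)
```

Because `__process__` both starts and ends with double underscores, Python does not apply private name mangling to it, so a subclass override is picked up by the base class as expected.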
Deduplicator is the class for eliminating duplicate instruction samples that could adversely affect both pre-training stability and the performance of LLMs. Deduplicator also enables efficient use and optimization of storage space.
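As an illustration of the idea only (not the package's implementation), duplicate removal can be sketched as exact matching on a normalized instruction string. The normalization rule (lowercasing and collapsing whitespace) and the `instruction` field name are assumptions of this sketch.

```python
def deduplicate(samples):
    """Keep the first occurrence of each instruction, ignoring case and spacing."""
    seen = set()
    unique = []
    for sample in samples:
        # Normalize so trivially different copies map to the same key.
        key = " ".join(sample["instruction"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique


raw = [
    {"instruction": "Write a poem."},
    {"instruction": "write   a  poem."},
    {"instruction": "Write a story."},
]
print(deduplicate(raw))
```

Near-duplicate detection (for example with hashing or embedding similarity) would need a different approach; this sketch only catches copies that are identical after normalization.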
LengthSelector is the class for selecting instruction samples based on the length of the instruction. Instructions that are too long or too short can affect data quality and are not conducive to instruction tuning.
RougeSelector is the class for selecting instruction samples based on the ROUGE metric, which is often used for evaluating the quality of automatically generated text.
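To make the metric concrete, here is a small self-contained sketch of one ROUGE variant, ROUGE-L, computed as an F1 score over the longest common subsequence of whitespace tokens. It is written from the metric's definition and is not taken from the package; the tokenization and the function names are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings, using lowercase whitespace tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the cat sat", "the cat ran"))
```

A selector built on this score would typically discard a new instruction whose similarity to any already-kept instruction exceeds some threshold; the threshold itself is a tuning choice.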
GPTScoreSelector is the class for selecting instruction samples based on the GPT score, which reflects whether the output is a good example of how an AI assistant should respond to the user's instruction, as judged by ChatGPT.
PPLSelector is the class for selecting instruction samples based on the perplexity, which is the exponentiated average negative log-likelihood of the response.
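The definition above translates directly into a few lines of code. The sketch below assumes you already have per-token natural-log probabilities for the response from some language model; how those are obtained is outside its scope, and the function name is hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Exponentiated average negative log-likelihood of a token sequence."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_likelihood = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_neg_log_likelihood)


# A response whose every token was assigned probability 0.5 by the model.
print(perplexity([math.log(0.5)] * 4))
```

Lower perplexity means the model found the response more predictable, so a selector can rank or threshold samples on this value.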
MTLDSelector is the class for selecting instruction samples based on MTLD, which is short for Measure of Textual Lexical Diversity.
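For readers unfamiliar with MTLD, the following sketch computes it from the usual definition: walk through the tokens, count a "factor" each time the running type-token ratio (TTR) drops to the threshold, add a partial factor for the remainder, and average the forward and backward passes. The 0.72 threshold is the commonly used default; using `<=` for the comparison and falling back to the token count when no factor accumulates are choices made for this sketch, not details confirmed from the package.

```python
def _mtld_one_direction(tokens, threshold):
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            # TTR fell to the threshold: close this factor and start over.
            factors += 1.0
            types.clear()
            count = 0
    if count > 0:
        # Credit the unfinished segment with a partial factor.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    # Sketch-specific fallback: every token was unique, so no factor was formed.
    return len(tokens) / factors if factors > 0 else float(len(tokens))


def mtld(tokens, threshold=0.72):
    """Mean of the forward and backward MTLD passes over a token list."""
    if not tokens:
        raise ValueError("need at least one token")
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2.0


print(mtld("a a a a".split()))
print(mtld("the quick brown fox jumps".split()))
```

Repetitive text closes factors quickly and gets a low score, while varied text sustains a high TTR for longer and scores higher.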
CodeSelector is the class for selecting code instruction samples based on the Complexity-Impacted Reasoning Score (CIRS), which combines structural and logical attributes to measure the correlation between code and reasoning abilities. See When Do Program-of-Thoughts Work for Reasoning? for more details.
Example
from easyinstruct import CodeSelector
# Step1: Specify your source file of code instructions
src_file = "data/code_example.json"
# Step2: Declare a code selector class
selector = CodeSelector(
    source_file_path=src_file,
    target_dir="data/selections/",
    manually_partion_data=True,
    min_boundary=0.125,
    max_boundary=0.5,
    automatically_partion_data=True,
    k_means_cluster_number=2,
)
# Step3: Process the code instructions
selector.process()
MultiSelector is the class for combining multiple appropriate selectors based on your specific needs.
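Conceptually, combining selectors means passing the data through each one in turn. The sketch below shows that idea with plain functions; it does not use the package's `MultiSelector` API, and the helper names (`length_between`, `drop_exact_duplicates`, `run_selectors`) and the `instruction` field are hypothetical.

```python
def length_between(min_words, max_words):
    """Build a selector that keeps instructions within a word-count range."""
    def select(samples):
        return [
            s for s in samples
            if min_words <= len(s["instruction"].split()) <= max_words
        ]
    return select


def drop_exact_duplicates(samples):
    """Keep only the first occurrence of each instruction string."""
    seen, kept = set(), []
    for s in samples:
        if s["instruction"] not in seen:
            seen.add(s["instruction"])
            kept.append(s)
    return kept


def run_selectors(samples, selectors):
    """Apply each selector in order, feeding its output to the next."""
    for selector in selectors:
        samples = selector(samples)
    return samples


raw = [
    {"instruction": "Translate this sentence into French."},
    {"instruction": "Translate this sentence into French."},
    {"instruction": "Hi"},
]
kept = run_selectors(raw, [length_between(3, 50), drop_exact_duplicates])
print(kept)
```

Ordering matters for cost rather than correctness here: running cheap filters first shrinks the data before more expensive ones, such as model-based scoring, are applied.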
The Prompts module standardizes the instruction prompting step, where user requests are constructed as instruction prompts and sent to specific LLMs to obtain responses. You can choose the appropriate prompting method based on your specific needs.
Please check out link for more details.
The Engines module standardizes the instruction execution process, enabling the execution of instruction prompts on specific locally deployed LLMs. You can choose the appropriate engine based on your specific needs.
Please check out link for more details.
Please cite our repository if you use EasyInstruct in your work.
@article{ou2024easyinstruct,
title={EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models},
author={Ou, Yixin and Zhang, Ningyu and Gui, Honghao and Xu, Ziwen and Qiao, Shuofei and Bi, Zhen and Chen, Huajun},
journal={arXiv preprint arXiv:2402.03049},
year={2024}
}
@misc{knowlm,
author = {Ningyu Zhang and Jintian Zhang and Xiaohan Wang and Honghao Gui and Kangwei Liu and Yinuo Jiang and Xiang Chen and Shengyu Mao and Shuofei Qiao and Yuqi Zhu and Zhen Bi and Jing Chen and Xiaozhuan Liang and Yixin Ou and Runnan Fang and Zekun Xi and Xin Xu and Lei Li and Peng Wang and Mengru Wang and Yunzhi Yao and Bozhong Tian and Yin Fang and Guozhou Zheng and Huajun Chen},
title = {KnowLM: An Open-sourced Knowledgeable Large Language Model Framework},
year = {2023},
url = {http://knowlm.zjukg.cn/},
}
@article{bi2023program,
title={When do program-of-thoughts work for reasoning?},
author={Bi, Zhen and Zhang, Ningyu and Jiang, Yinuo and Deng, Shumin and Zheng, Guozhou and Chen, Huajun},
journal={arXiv preprint arXiv:2308.15452},
year={2023}
}
We will offer long-term maintenance to fix bugs, resolve issues, and meet new requests. If you have any problems, please open an issue.
Other Related Projects
🙌 We would like to express our heartfelt gratitude for the contribution of Self-Instruct to our project, as we have utilized portions of their source code in our project.

