Reflection_Tuning
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Stars: 124
Reflection-Tuning is a project focused on improving the quality of instruction-tuning data through a reflection-based method. It introduces Selective Reflection-Tuning, where the student model can decide whether to accept the improvements made by the teacher model. The project aims to generate high-quality instruction-response pairs by defining specific criteria for the oracle model to follow and respond to. It also evaluates the efficacy and relevance of instruction-response pairs using the r-IFD metric. The project provides code for reflection and selection processes, along with data and model weights for both V1 and V2 methods.
README:
[Method V1] Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning (NeurIPS'23 Workshop)
[Method V2] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning (ACL'24)
Chinese Version: [知乎]
This is the repo for the Reflection-Tuning project, which introduces a reflection-based method to improve the quality of instruction-tuning data.
The repo contains:
Data: (A detailed description can be found in the section Data and Model Weights V2)
Recycled Data (V1): Alpaca, WizardLM70k
Selective Recycled Data (V2): Alpaca, WizardLM70k
Overall Reflected Data: Alpaca, WizardLM70k, WizardLM V2
Overall Reflected Data IFD scores: Alpaca llama2 7b, Alpaca llama2 13b, WizardLM70k llama2 7b, WizardLM70k llama2 13b
(Feel free to email Ming (Homepage, Email) for any questions or feedback.)
- [2024/05] Our Selective Reflection-Tuning has been accepted to ACL 2024 Findings!
- [2024/02] We release the paper of our Selective Reflection-Tuning.
- [2023/12] We add the complete data of Selective Reflection-Tuning (V2 method) and add the evaluation results.
- [2023/12] We add the description of Selective Reflection-Tuning (V2 method).
- [2023/10] Reflection-Tuning has been accepted by the NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following.
- [2023/10] We released the V2 model and pushed models to the Alpaca Eval leaderboard.
- [2023/10] We released codes for this project.
- Overview
- Highlights
- Selective Reflection-Tuning
- Install
- Code for Reflection
- Code for Selection
- Data and Model Weights V1
- Data and Model Weights V2
- Prompt and Hyperparameters
- ToDo
- Citation
- Our Related Works
We propose a reflection-based method for improving the quality of instruction-response pairs. Given an initial base dataset, we aim to generate a higher-quality version of each data point with an oracle model, ChatGPT for instance. However, a common problem with using LLMs as judges is the failure to obtain diverse results. To overcome this, inspired by Chain-of-Thought prompting, we further define several specific criteria for the oracle model to follow and have it respond to each criterion with a critical evaluation. These responses to the criteria then serve as bridges (a chain of thought) for generating new instruction-response pairs that satisfy them.
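For illustration, a criteria-guided reflection prompt of this kind might be assembled as in the sketch below. The criteria and wording are hypothetical placeholders, not the project's actual prompts; the exact prompts are given in the paper and in the reflection_code scripts.

```python
# A sketch of a criteria-guided reflection prompt; the criteria below are
# hypothetical placeholders, not the project's actual criteria.
EXAMPLE_CRITERIA = [
    "the complexity of the topic",
    "the level of detail required by the instruction",
    "the knowledge required to produce the answer",
]

def build_reflection_prompt(instruction: str, response: str) -> str:
    criteria = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(EXAMPLE_CRITERIA))
    return (
        "Below is an instruction and its response.\n"
        f"Instruction: {instruction}\n"
        f"Response: {response}\n\n"
        "First, give a critical evaluation of the pair with respect to each criterion:\n"
        f"{criteria}\n\n"
        "Then, guided by your evaluations, write an improved instruction and an improved response."
    )

print(build_reflection_prompt("Name three primary colors.", "Red, blue, and yellow."))
```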
In the original Reflection-Tuning, we propose a data improvement method through Reflection.
However, two research questions arise:
- Is the teacher-refined data compatible with the needs of the student model?
- How does the student model decide which enhanced data are most needed and critical to its training?
To answer the above questions, we propose Selective Reflection-Tuning, in which the student model can decide whether to accept the improvement of the teacher model.
- In Reflection-Tuning V1, we propose a reflection method that can improve the quality of the instruction-tuning dataset, which is a general method and can be utilized on almost ANY instruction-tuning dataset.
- We implement our method on both Alpaca and WizardLM datasets and release the newly-generated high-quality recycled datasets.
- We propose an interactive selection pipeline where the oracle model and student model cooperate to build a more coherent and model-compatible instruction-following dataset, which can be further adapted into other self-improvement scenarios.
- Our selectively recycled data is of supreme quality: with instruction tuning on only a few thousand automatically generated samples, our models achieve promising performance compared with models trained on tens of thousands of samples.
- We present a nuanced evaluation schema r-IFD that quantifies the efficacy and relevance of instruction-response pairs.
Existing methods of data enhancement usually do not take a critical criterion into account: Is the teacher-refined data compatible with the needs of the student model? These approaches typically do not account for the inherent randomness and potential degradation associated with the generative models' output, leading to an oversight in how the student model responds to these "improved" data samples. Consequently, a mechanism for the student model to selectively integrate these enhancements has been notably absent. To bridge this gap, our work introduces an interactive pipeline wherein an oracle generative model engages in a reflection process to enhance both the instruction and response of a data sample. The student model then evaluates whether to incorporate these improvements based on its unique attributes. This pipeline is versatile and can be adapted to various contexts where data enhancement is needed.
Then, another pivotal question arises: How does the student model decide which enhanced data are most needed and critical to its training? This question underpins the challenge of autonomously evaluating the quality of instructions and responses. Common practices involve utilizing sophisticated models like GPT-4 for assessment purposes or employing a secondary judge model equipped with evaluative capabilities. These methods, however, present limitations: they fail to address the discrepancies between the evaluating model and the actual student model undergoing training. Particularly in the latter approach, even though the judge model and the student model share the same structural framework, their weight distributions diverge once endowed with basic evaluative functions. Consequently, the preferences of the judge model may not align with the real student model's requirements. To circumvent these issues, we adopt a statistical method, utilizing the Instruction-Following Difficulty (IFD) score proposed by Cherry LLM. This score is derived directly from the raw student model, thereby mitigating potential domain shifts and ensuring that the evaluation is better aligned with the student model’s learning context. We further introduce a reversed version of IFD named reversed-IFD (r-IFD). This metric evaluates how much the response contributes to predicting the corresponding instruction.
We utilize the IFD score to select reflected instructions and utilize the r-IFD score to select reflected responses.
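For intuition, both scores reduce to a ratio between a conditioned and an unconditioned score measured with the raw student model. The sketch below assumes the common formulation (IFD compares the response loss with and without the instruction as context; r-IFD does the reverse for the instruction); the prompt template and the loss-versus-perplexity choice here are illustrative and may differ from the project's implementation in selection_code/.

```python
# A sketch of IFD / r-IFD scoring with the raw student model, assuming
#   IFD(Q, A)   = loss(A | Q) / loss(A)
#   r-IFD(Q, A) = loss(Q | A) / loss(Q)
# where loss(.) is the average negative log-likelihood.  The prompt template
# and loss-vs-perplexity choice are illustrative; see selection_code/ for the
# project's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

@torch.no_grad()
def avg_nll(context: str, target: str) -> float:
    """Average negative log-likelihood of `target` given the preceding `context`."""
    context_ids = tokenizer(context, return_tensors="pt").input_ids.to(model.device)
    target_ids = tokenizer(
        target, return_tensors="pt", add_special_tokens=False
    ).input_ids.to(model.device)
    input_ids = torch.cat([context_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : context_ids.shape[1]] = -100  # score only the target span
    return model(input_ids, labels=labels).loss.item()

def ifd(instruction: str, response: str) -> float:
    return avg_nll(instruction + "\n", response) / avg_nll("", response)

def r_ifd(instruction: str, response: str) -> float:
    return avg_nll(response + "\n", instruction) / avg_nll("", instruction)
```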
Below is the performance of pair-wise comparisons between our model and other classic models on the WizardLM test set. We follow the implementation from AlpaGasus and Cherry LLM, which greatly reduces the potential position bias of GPT-4. The prompts can be found in our paper, and the data, code, and use cases for evaluation can be found in the Cherry LLM repo.
Below are the performance results on the Alpaca Eval leaderboard. “Data” represents the number of samples used for fine-tuning. “RLHF/AIF” indicates whether the model uses an additional RLHF or RLAIF process.
Below are comparisons between model performance and the data used for fine-tuning on the Alpaca Eval benchmark and the Open LLM Leaderboard. We use star markers for our models, dot markers for other instruction-tuned models, and triangle markers for RLHF/AIF models. Blue markers represent 7B models, red markers represent 13B models, and purple markers represent larger models.
It is astonishing that with fewer than 1k automatically generated instruction samples (2%, 926 samples), our model achieves very promising performance (a 74.29 win rate on Alpaca Eval and a 57.80 average score on the Open LLM Leaderboard), showing the supreme quality of our sRecycled data.
Install the dependencies with pip install -r requirements.txt
Reflecting on a whole dataset containing tens of thousands of samples consumes a lot of API calls and time, so we recommend starting with a small dataset, for example, the cherry data from Cherry LLM. Experiments show that reflecting on only a subset of high-quality data can also yield promising performance.
In the scripts below, we directly run on data/cherry_alpaca_v1/cherry_alpaca_5_percent_compressed.json, which contains only approximately 3k Alpaca samples.
- Reflection
python reflection_code/reflect_instruction.py \
--data_path data/cherry_alpaca_v1/cherry_alpaca_5_percent_compressed.json \
--save_path cherry_alpaca_5_percent_reflect_ins_raw.json \
--api_key xxx
--data_path: The targeted dataset in the Alpaca format
--save_path: The path to save the raw reflection texts
--api_key: Your OpenAI API key
- Extract the instruction-response pairs:
python reflection_code/reflect_instruction_postprocess.py \
--raw_data_path cherry_alpaca_5_percent_reflect_ins_raw.json \
--ori_data_path data/cherry_alpaca_v1/cherry_alpaca_5_percent_compressed.json \
--save_path cherry_alpaca_5_percent_reflect_ins.json \
--save_intermediate_path cherry_alpaca_5_percent_reflect_ins_mid.json \
--api_key xxx
--raw_data_path: The path that saves the raw reflection texts
--ori_data_path: The original targeted dataset in the Alpaca format
--save_path: The path to save the formatted dataset in the Alpaca format
--save_intermediate_path: The path to save the middle results
--api_key: Your OpenAI API key
- Reflection
python reflection_code/reflect_response.py \
--data_path data/cherry_alpaca_v1/cherry_alpaca_5_percent_compressed.json \
--save_path cherry_alpaca_5_percent_reflect_res_raw.json \
--api_key xxx
- Extract the instruction-response pairs:
python reflection_code/reflect_response_postprocess.py \
--raw_data_path cherry_alpaca_5_percent_reflect_res_raw.json \
--ori_data_path data/cherry_alpaca_v1/cherry_alpaca_5_percent_compressed.json \
--save_path cherry_alpaca_5_percent_reflect_res.json \
--save_intermediate_path cherry_alpaca_5_percent_reflect_res_mid.json \
--api_key xxx
Note: When reflecting on the instruction, we first use ChatGPT to compress the instruction and input fields into one single instruction for easier processing.
The whole compressed Alpaca data can be found in the data folder.
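For reference, an Alpaca-format sample keeps separate instruction and input fields, and the compression step asks ChatGPT to fold them into one self-contained instruction. The sketch below only builds such a compression prompt; its wording is hypothetical and not the project's exact prompt.

```python
# An Alpaca-format sample keeps "instruction" and "input" separate; before
# reflection, ChatGPT is asked to merge them into one self-contained
# instruction.  The prompt wording below is hypothetical.
sample = {
    "instruction": "Summarize the following paragraph.",
    "input": "Large language models are trained on vast corpora of text ...",
    "output": "LLMs learn from huge amounts of text.",
}

def build_compression_prompt(sample: dict) -> str:
    return (
        "Rewrite the following instruction and input into a single, "
        "self-contained instruction that preserves all of the information.\n"
        f"Instruction: {sample['instruction']}\n"
        f"Input: {sample['input']}"
    )

print(build_compression_prompt(sample))
```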
Note: The extraction of reflection results is based on regular expressions and thus is not perfect. We will release the raw output before extraction in the future.
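A minimal sketch of what such regex-based extraction can look like is shown below; the bracketed markers are hypothetical and depend on the actual reflection prompt, which is also why some raw outputs fail to parse.

```python
# A sketch of regex-based extraction of the reflected pair from raw oracle
# output.  The "[New Instruction]" / "[New Answer]" markers are hypothetical
# and depend on the actual reflection prompt.
import re

def extract_pair(raw_text: str):
    match = re.search(
        r"\[New Instruction\](?P<instruction>.*?)\[New Answer\](?P<answer>.*)",
        raw_text,
        flags=re.DOTALL,
    )
    if match is None:
        return None  # some outputs fail to parse, which is why extraction is imperfect
    return match.group("instruction").strip(), match.group("answer").strip()

print(extract_pair("[New Instruction] Name a primary color. [New Answer] Red."))
```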
- Generate Data Statistics
python selection_code/data_analysis.py \
--data_path data/cherry_alpaca_v1/cherry_alpaca_5_percent_compressed.json \
--save_path cherry_alpaca_5_percent_analysis.jsonl \
--model_name_or_path meta-llama/Llama-2-7b-hf
This code computes the statistics needed to calculate IFD scores and r-IFD scores.
Please feel free to customize your own training prompt and model.
- Put the Statistics to Original Data
python selection_code/put_analysis_to_data.py \
--pt_data_path cherry_alpaca_5_percent_analysis.jsonl \
--json_data_path data/cherry_alpaca_v1/cherry_alpaca_5_percent_compressed.json \
--json_save_path cherry_alpaca_5_percent_with_analysis.json
After obtaining data with IFD scores and r-IFD scores, you can compare these scores to make a customized selection, as in the sketch below.
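As one example of such a customized selection, you might keep a reflected sample only when its statistic improves over the original sample's. The "ifd" field name and the direction of the comparison below are illustrative; the paper specifies exactly how IFD is used for instruction selection and r-IFD for response selection.

```python
# A sketch of score-based selection between original and reflected versions of
# the same samples.  The "ifd" field name and the direction of the comparison
# are illustrative; the paper specifies the exact selection criteria.
import json

def select(original_path: str, reflected_path: str, out_path: str, key: str = "ifd"):
    with open(original_path) as f:
        original = json.load(f)
    with open(reflected_path) as f:
        reflected = json.load(f)

    selected = []
    for orig, refl in zip(original, reflected):
        # Keep the reflected sample only if it improves the chosen statistic.
        selected.append(refl if refl[key] > orig[key] else orig)

    with open(out_path, "w") as f:
        json.dump(selected, f, indent=2)
```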
The following table provides a comparison between our recycled models (V1) and baseline models on the AlpacaEval Leaderboard and Huggingface Open LLM Leaderboard.
The Recycled Alpaca Data can be found here: [hf-Link]
The Recycled WizardLM (70k) Data can be found here: [hf-Link]
| | AlpacaEval | Avg | ARC | HellaSwag | MMLU | TruthfulQA | Model |
|---|---|---|---|---|---|---|---|
| Alpaca 7B | 26.46 | 50.21 | 42.65 | 76.91 | 41.73 | 39.55 | / |
| Recycled Alpaca 7B V1.0 | 76.99 | 56.18 | 53.92 | 77.68 | 47.55 | 45.55 | [hf-Link] |
| Recycled Alpaca 13B V1.0 | 83.42 | 58.93 | 58.70 | 80.80 | 53.11 | 43.12 | [Link] |
| WizardLM 7B | 67.64 | 54.18 | 51.60 | 77.70 | 42.70 | 44.70 | / |
| Recycled WizardLM 7B V1.0 | 78.88 | 56.21 | 53.92 | 77.05 | 48.35 | 45.52 | [hf-Link] |
In this repo, we refer to Selective Reflection-Tuning as the V2 method for simplicity.
The following table provides a comparison between our sRecycled models (V2) and baseline models on the AlpacaEval Leaderboard and Huggingface Open LLM Leaderboard.
The data used for training the llama2-7b model can be found directly here:
Selective Recycled Data (V2): Alpaca, WizardLM70k
The above data contains the model-selected samples drawn from data reflected on both instruction and response, and from data reflected on response only.
The complete data reflected on instruction, response, or both can be found here:
Overall Reflected Data: Alpaca, WizardLM70k, WizardLM V2
The complete data statistics (loss and perplexity) on the llama2-7b and llama2-13b models, which can be directly used for computing IFD scores, can be found here:
Overall Reflected Data IFD scores: Alpaca llama2 7b, Alpaca llama2 13b, WizardLM70k llama2 7b, WizardLM70k llama2 13b
| | AlpacaEval | Avg | ARC | HellaSwag | MMLU | TruthfulQA | Model |
|---|---|---|---|---|---|---|---|
| Alpaca 7B | 26.46 | 50.21 | 42.65 | 76.91 | 41.73 | 39.55 | / |
| Recycled Alpaca 7B V2.0 | 79.58 | 56.05 | 54.01 | 78.07 | 46.69 | 45.41 | [hf-Link] |
| WizardLM 7B | 67.64 | 54.18 | 51.60 | 77.70 | 42.70 | 44.70 | / |
| Recycled WizardLM 7B V2.0 | 83.48 | 56.79 | 54.78 | 77.86 | 45.63 | 48.91 | [hf-Link] |
We use the prompt from FastChat:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am ...</s>......
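A minimal sketch of wrapping a single instruction-response pair into this template is shown below; for multi-turn data, FastChat's own conversation utilities handle the details.

```python
# A sketch of wrapping a single instruction-response pair into the
# Vicuna-style template shown above (multi-turn handling omitted).
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def format_example(instruction: str, response: str) -> str:
    return f"{SYSTEM} USER: {instruction} ASSISTANT: {response}</s>"

print(format_example("Who are you?", "I am an AI assistant."))
```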
| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay | Warmup Rate |
|---|---|---|---|---|---|---|
| Recycled Models (7B) | 128 | 2e-5 | 3 | 2048 | 0 | 0.03 |
| Recycled Models (13B) | 128 | 2e-5 | 3 | 2048 | 0 | 0.03 |
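For reference, these hyperparameters roughly map onto a standard Hugging Face training configuration as in the sketch below. The per-device batch size and gradient accumulation are placeholders that must be chosen so that their product with the number of GPUs equals the global batch size of 128, and the max length of 2048 is typically enforced in the tokenization/data pipeline rather than in TrainingArguments.

```python
# A hypothetical mapping of the hyperparameter table onto
# transformers.TrainingArguments; the per-device batch size and gradient
# accumulation below assume 8 GPUs (2 x 8 x 8 = global batch size 128).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="recycled-model-7b",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.0,
    warmup_ratio=0.03,
    bf16=True,
)
```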
- [x] Release the code, data, and models for V1.
- [x] Release the method for Selective Reflection-Tuning (V2).
- [x] Release the data for Selective Reflection-Tuning (V2).
- [x] Release the models for Selective Reflection-Tuning (V2).
Please consider citing our paper if you think our codes, data, or models are useful. Thank you!
@inproceedings{li2023reflectiontuning,
title={Reflection-Tuning: Recycling Data for Better Instruction-Tuning},
author={Ming Li and Lichang Chen and Jiuhai Chen and Shwai He and Tianyi Zhou},
booktitle={NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following},
year={2023},
url={https://openreview.net/forum?id=xaqoZZqkPU}
}
@inproceedings{li-etal-2024-selective,
title = "Selective Reflection-Tuning: Student-Selected Data Recycling for {LLM} Instruction-Tuning",
author = "Li, Ming and
Chen, Lichang and
Chen, Jiuhai and
He, Shwai and
Gu, Jiuxiang and
Zhou, Tianyi",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.958",
pages = "16189--16211",
}
@inproceedings{li-etal-2024-quantity,
title = "From Quantity to Quality: Boosting {LLM} Performance with Self-Guided Data Selection for Instruction Tuning",
author = "Li, Ming and
Zhang, Yong and
Li, Zhitao and
Chen, Jiuhai and
Chen, Lichang and
Cheng, Ning and
Wang, Jianzong and
Zhou, Tianyi and
Xiao, Jing",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-long.421",
pages = "7595--7628",
}
@inproceedings{li-etal-2024-superfiltering,
title = "Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning",
author = "Li, Ming and
Zhang, Yong and
He, Shwai and
Li, Zhitao and
Zhao, Hongyu and
Wang, Jianzong and
Cheng, Ning and
Zhou, Tianyi",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.769",
pages = "14255--14273",
}
If you are interested in Data Selection for Instruction Tuning, please see Cherry_LLM and Superfiltering.
If you are interested in human/LLM-free Data Augmentation for Instruction Tuning, please see Mosaic-IT and RuleR.
If you are interested in Data Improvement for Instruction Tuning, please see Reflection_Tuning.
If you are interested in Knowledge Distillation in the LLM era, please see this Survey.
Alternative AI tools for Reflection_Tuning
Similar Open Source Tools
magpie
This is the official repository for 'Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing'. Magpie is a tool designed to synthesize high-quality instruction data at scale by extracting it directly from an aligned Large Language Models (LLMs). It aims to democratize AI by generating large-scale alignment data and enhancing the transparency of model alignment processes. Magpie has been tested on various model families and can be used to fine-tune models for improved performance on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.
babilong
BABILong is a generative benchmark designed to evaluate the performance of NLP models in processing long documents with distributed facts. It consists of 20 tasks that simulate interactions between characters and objects in various locations, requiring models to distinguish important information from irrelevant details. The tasks vary in complexity and reasoning aspects, with test samples potentially containing millions of tokens. The benchmark aims to challenge and assess the capabilities of Large Language Models (LLMs) in handling complex, long-context information.
MME-RealWorld
MME-RealWorld is a benchmark designed to address real-world applications with practical relevance, featuring 13,366 high-resolution images and 29,429 annotations across 43 tasks. It aims to provide substantial recognition challenges and overcome common barriers in existing Multimodal Large Language Model benchmarks, such as small data scale, restricted data quality, and insufficient task difficulty. The dataset offers advantages in data scale, data quality, task difficulty, and real-world utility compared to existing benchmarks. It also includes a Chinese version with additional images and QA pairs focused on Chinese scenarios.
MMStar
MMStar is an elite vision-indispensable multi-modal benchmark comprising 1,500 challenge samples meticulously selected by humans. It addresses two key issues in current LLM evaluation: the unnecessary use of visual content in many samples and the existence of unintentional data leakage in LLM and LVLM training. MMStar evaluates 6 core capabilities across 18 detailed axes, ensuring a balanced distribution of samples across all dimensions.
Slow_Thinking_with_LLMs
STILL is an open-source project exploring slow-thinking reasoning systems, focusing on o1-like reasoning systems. The project has released technical reports on enhancing LLM reasoning with reward-guided tree search algorithms and implementing slow-thinking reasoning systems using an imitate, explore, and self-improve framework. The project aims to replicate the capabilities of industry-level reasoning systems by fine-tuning reasoning models with long-form thought data and iteratively refining training datasets.
OREAL
OREAL is a reinforcement learning framework designed for mathematical reasoning tasks, aiming to achieve optimal performance through outcome reward-based learning. The framework utilizes behavior cloning, reshaping rewards, and token-level reward models to address challenges in sparse rewards and partial correctness. OREAL has achieved significant results, with a 7B model reaching 94.0 pass@1 accuracy on MATH-500 and surpassing previous 32B models. The tool provides training tutorials and Hugging Face model repositories for easy access and implementation.
k2
K2 (GeoLLaMA) is a large language model for geoscience, trained on geoscience literature and fine-tuned with knowledge-intensive instruction data. It outperforms baseline models on objective and subjective tasks. The repository provides K2 weights, core data of GeoSignal, GeoBench benchmark, and code for further pretraining and instruction tuning. The model is available on Hugging Face for use. The project aims to create larger and more powerful geoscience language models in the future.
MathCoder
MathCoder is a repository focused on enhancing mathematical reasoning by fine-tuning open-source language models to use code for modeling and deriving math equations. It introduces MathCodeInstruct dataset with solutions interleaving natural language, code, and execution results. The repository provides MathCoder models capable of generating code-based solutions for challenging math problems, achieving state-of-the-art scores on MATH and GSM8K datasets. It offers tools for model deployment, inference, and evaluation, along with a citation for referencing the work.
RLHF-Reward-Modeling
This repository contains code for training reward models for Deep Reinforcement Learning-based Reward-modulated Hierarchical Fine-tuning (DRL-based RLHF), Iterative Selection Fine-tuning (Rejection sampling fine-tuning), and iterative Decision Policy Optimization (DPO). The reward models are trained using a Bradley-Terry model based on the Gemma and Mistral language models. The resulting reward models achieve state-of-the-art performance on the RewardBench leaderboard for reward models with base models of up to 13B parameters.
Adaptive-MT-LLM-Fine-tuning
The repository Adaptive-MT-LLM-Fine-tuning contains code and data for the paper 'Fine-tuning Large Language Models for Adaptive Machine Translation'. It focuses on enhancing Mistral 7B, a large language model, for real-time adaptive machine translation in the medical domain. The fine-tuning process involves using zero-shot and one-shot translation prompts to improve terminology and style adherence. The repository includes training and test data, data processing code, fuzzy match retrieval techniques, fine-tuning methods, conversion to CTranslate2 format, tokenizers, translation codes, and evaluation metrics.
MInference
MInference is a tool designed to accelerate pre-filling for long-context Language Models (LLMs) by leveraging dynamic sparse attention. It achieves up to a 10x speedup for pre-filling on an A100 while maintaining accuracy. The tool supports various decoding LLMs, including LLaMA-style models and Phi models, and provides custom kernels for attention computation. MInference is useful for researchers and developers working with large-scale language models who aim to improve efficiency without compromising accuracy.
fuse-med-ml
FuseMedML is a Python framework designed to accelerate machine learning-based discovery in the medical field by promoting code reuse. It provides a flexible design concept where data is stored in a nested dictionary, allowing easy handling of multi-modality information. The framework includes components for creating custom models, loss functions, metrics, and data processing operators. Additionally, FuseMedML offers 'batteries included' key components such as fuse.data for data processing, fuse.eval for model evaluation, and fuse.dl for reusable deep learning components. It supports PyTorch and PyTorch Lightning libraries and encourages the creation of domain extensions for specific medical domains.
LongBench
LongBench v2 is a benchmark designed to assess the ability of large language models (LLMs) to handle long-context problems requiring deep understanding and reasoning across various real-world multitasks. It consists of 503 challenging multiple-choice questions with contexts ranging from 8k to 2M words, covering six major task categories. The dataset is collected from nearly 100 highly educated individuals with diverse professional backgrounds and is designed to be challenging even for human experts. The evaluation results highlight the importance of enhanced reasoning ability and scaling inference-time compute to tackle the long-context challenges in LongBench v2.
AgentCPM
AgentCPM is a series of open-source LLM agents jointly developed by THUNLP, Renmin University of China, ModelBest, and the OpenBMB community. It addresses challenges faced by agents in real-world applications such as limited long-horizon capability, autonomy, and generalization. The team focuses on building deep research capabilities for agents, releasing AgentCPM-Explore, a deep-search LLM agent, and AgentCPM-Report, a deep-research LLM agent. AgentCPM-Explore is the first open-source agent model with 4B parameters to appear on widely used long-horizon agent benchmarks. AgentCPM-Report is built on the 8B-parameter base model MiniCPM4.1, autonomously generating long-form reports with extreme performance and minimal footprint, designed for high-privacy scenarios with offline and agile local deployment.
gepa
GEPA (Genetic-Pareto) is a framework for optimizing arbitrary systems composed of text components like AI prompts, code snippets, or textual specs against any evaluation metric. It employs LLMs to reflect on system behavior, using feedback from execution and evaluation traces to drive targeted improvements. Through iterative mutation, reflection, and Pareto-aware candidate selection, GEPA evolves robust, high-performing variants with minimal evaluations, co-evolving multiple components in modular systems for domain-specific gains. The repository provides the official implementation of the GEPA algorithm as proposed in the paper titled 'GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning'.
For similar tasks
prometheus-eval
Prometheus-Eval is a repository dedicated to evaluating large language models (LLMs) in generation tasks. It provides state-of-the-art language models like Prometheus 2 (7B & 8x7B) for assessing in pairwise ranking formats and achieving high correlation scores with benchmarks. The repository includes tools for training, evaluating, and using these models, along with scripts for fine-tuning on custom datasets. Prometheus aims to address issues like fairness, controllability, and affordability in evaluations by simulating human judgments and proprietary LM-based assessments.
LLMs
LLMs is a Chinese large language model technology stack for practical use. It includes high-availability pre-training, SFT, and DPO preference alignment code framework. The repository covers pre-training data cleaning, high-concurrency framework, SFT dataset cleaning, data quality improvement, and security alignment work for Chinese large language models. It also provides open-source SFT dataset construction, pre-training from scratch, and various tools and frameworks for data cleaning, quality optimization, and task alignment.
ProX
ProX is a lm-based data refinement framework that automates the process of cleaning and improving data used in pre-training large language models. It offers better performance, domain flexibility, efficiency, and cost-effectiveness compared to traditional methods. The framework has been shown to improve model performance by over 2% and boost accuracy by up to 20% in tasks like math. ProX is designed to refine data at scale without the need for manual adjustments, making it a valuable tool for data preprocessing in natural language processing tasks.
For similar jobs
LLM-Agent-Survey
Autonomous agents are designed to achieve specific objectives through self-guided instructions. With the emergence and growth of large language models (LLMs), there is a growing trend in utilizing LLMs as fundamental controllers for these autonomous agents. This repository conducts a comprehensive survey study on the construction, application, and evaluation of LLM-based autonomous agents. It explores essential components of AI agents, application domains in natural sciences, social sciences, and engineering, and evaluation strategies. The survey aims to be a resource for researchers and practitioners in this rapidly evolving field.
AIProductHome
AI Product Home is a repository dedicated to collecting various AI commercial or open-source products. It provides assistance in submitting issues, self-recommendation, correcting resources, and more. The repository also features AI tools like Build Naidia, Autopod, Rytr, Mubert, and a virtual town driven by AI. It includes sections for AI models, chat dialogues, AI assistants, code assistance, artistic creation, content creation, and more. The repository covers a wide range of AI-related tools and resources for users interested in AI products and services.
AI-Catalog
AI-Catalog is a curated list of AI tools, platforms, and resources across various domains. It serves as a comprehensive repository for users to discover and explore a wide range of AI applications. The catalog includes tools for tasks such as text-to-image generation, summarization, prompt generation, writing assistance, code assistance, developer tools, low code/no code tools, audio editing, video generation, 3D modeling, search engines, chatbots, email assistants, fun tools, gaming, music generation, presentation tools, website builders, education assistants, autonomous AI agents, photo editing, AI extensions, deep face/deep fake detection, text-to-speech, startup tools, SQL-related AI tools, education tools, and text-to-video conversion.
awesome-ai-repositories
A curated list of open source repositories for AI Engineers. The repository provides a comprehensive collection of tools and frameworks for various AI-related tasks such as AI Gateway, AI Workload Manager, Copilot Development, Dataset Engineering, Evaluation, Fine Tuning, Function Calling, Graph RAG, Guardrails, Local Model Inference, LLM Agent Framework, Model Serving, Observability, Pre Training, Prompt Engineering, RAG Framework, Security, Structured Extraction, Structured Generation, Vector DB, and Voice Agent.
AI-Bootcamp
The AI Bootcamp is a comprehensive training program focusing on real-world applications to equip individuals with the skills and knowledge needed to excel as AI engineers. The bootcamp covers topics such as Real-World PyTorch, Machine Learning Projects, Fine-tuning Tiny LLM, Deployment of LLM to Production, AI Agents with GPT-4 Turbo, CrewAI, Llama 3, and more. Participants will learn foundational skills in Python for AI, ML Pipelines, Large Language Models (LLMs), AI Agents, and work on projects like RagBase for private document chat.
easyAi
EasyAi is a lightweight, beginner-friendly Java artificial intelligence algorithm framework. It can be seamlessly integrated into Java projects with Maven, requiring no additional environment configuration or dependencies. The framework provides pre-packaged modules for image object detection and AI customer service, as well as various low-level algorithm tools for deep learning, machine learning, reinforcement learning, heuristic learning, and matrix operations. Developers can easily develop custom micro-models tailored to their business needs.
awesome-ai-newsletters
Awesome AI Newsletters is a curated list of AI-related newsletters that provide the latest news, trends, tools, and insights in the field of Artificial Intelligence. It includes a variety of newsletters covering general AI news, prompts for marketing and productivity, AI job opportunities, and newsletters tailored for professionals in the AI industry. Whether you are a beginner looking to stay updated on AI advancements or a professional seeking to enhance your knowledge and skills, this repository offers a collection of valuable resources to help you navigate the world of AI.