long-llms-learning


A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks

Stars: 205


A repository sharing a panorama of the methodology literature on Transformer architecture upgrades in Large Language Models for handling extensive context windows, updated in real time with the newest published works. It includes a survey on advancing Transformer architecture in long-context large language models, a flash-ReRoPE implementation, the latest news on data engineering, lightning attention, the Kimi AI assistant, chatglm-6b-128k, and gpt-4-turbo-preview, benchmarks such as InfiniteBench and LongBench, long-llms-evals for evaluating methods that enhance long-context capabilities, and llms-learning for learning technologies and application tasks across the full stack of Large Language Models.

README:

long-llms-learning


A repository sharing a panorama of the methodology literature on Transformer architecture upgrades in Large Language Models for handling extensive context windows, updated in real time with the newest published works.

Overview

Survey

For a clear taxonomy and more insights about the methodology, you can refer to our survey, Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey, with an overview shown below.

Overview of the survey

Flash-ReRoPE

We have augmented the great work ReRoPE by Jianlin Su with a flash-attn kernel, combining ReRoPE's infinite positional extrapolation capability with flash-attn's efficiency; the result is named flash-rerope.

You can find and use the implementation as a flash-attn-like interface function here, with a simple precision and FLOPs test script here.

Or you can further see how to implement the LLaMA attention module with flash-rerope here.
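
To make the idea concrete, here is a minimal, naive PyTorch sketch of ReRoPE-style attention probabilities (quadratic memory, two score passes, no fused kernel). It only illustrates the rectified-position trick; it is not the fused flash-attn kernel shipped in this repo, and all function and argument names (`rerope_attention_probs`, `window`, etc.) are illustrative assumptions, not the repo's actual API.

# Naive PyTorch reference for ReRoPE-style attention (illustrative only; the repo
# ships a fused flash-attn kernel instead, and these names are not its real API).
import torch


def build_rope_cache(seq_len, head_dim, base=10000.0):
    """Precompute per-position cos/sin tables for standard RoPE."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim/2)
    return angles.cos(), angles.sin()


def apply_rope(x, cos, sin):
    """Rotate channel pairs (2i, 2i+1) in place, keeping the interleaved layout."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def rerope_attention_probs(q, k, window=512):
    """
    ReRoPE: inside the window, the relative position m - n is encoded as in
    ordinary RoPE; beyond the window, it is clamped to the constant `window`.
    q, k: (batch, heads, seq_len, head_dim)
    """
    _, _, L, d = q.shape
    cos, sin = build_rope_cache(max(L, window + 1), d)
    cos, sin = cos.to(q.device), sin.to(q.device)

    # Branch 1: rotate both q and k -> scores encode the true relative position m - n.
    s_near = apply_rope(q, cos[:L], sin[:L]) @ apply_rope(k, cos[:L], sin[:L]).transpose(-1, -2)

    # Branch 2: rotate q by the fixed position `window`, leave k unrotated
    # -> every (m, n) pair sees the constant relative position `window`.
    s_far = apply_rope(q, cos[window:window + 1], sin[window:window + 1]) @ k.transpose(-1, -2)

    # Pick per (m, n): within the window use the near scores, otherwise the far ones.
    pos = torch.arange(L, device=q.device)
    rel = pos[:, None] - pos[None, :]
    scores = torch.where((rel >= 0) & (rel < window), s_near, s_far)

    # Causal mask and softmax as usual.
    scores = scores.masked_fill(rel < 0, float("-inf"))
    return torch.softmax(scores / d ** 0.5, dim=-1)

Note that this reference materializes the full seq_len x seq_len score matrix and computes two rotations/matmuls, which is exactly the memory and compute overhead a fused flash-rerope kernel is meant to avoid.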

Latest News

Latest Works

Latest Baselines

Latest Benchmarks

More to Learn

Long-LLMs-Evals

  • We've also released a work-in-progress repo, long-llms-evals, as a pipeline to evaluate various methods designed to enhance the long-context capabilities of general / specific LLMs on well-known long-context benchmarks.

LLMs-Learning

  • This repo is also a sub-track of another repo, llms-learning, where you can learn more about the technologies and application tasks across the full stack of Large Language Models.

Table of Contents

Contribution

If you want to contribute to this repo, you can simply open a PR / email us with the link(s) to the paper(s), or use the format below (an example entry follows the template):

  • (un)read paper format:
#### <paper title> [(UN)READ]

paper link: [here](<link address>)

xxx link: [here](<link address>)

citation:
<bibtex citation>
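
  • For example, an entry for this repo's own survey could look like:

#### Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey [READ]

paper link: [here](https://arxiv.org/abs/2311.12351)

citation:
<the BibTeX entry given in the Citation section below>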

Citation

If you find the survey or this repo helpful in your research or work, you can cite our paper as follows:

@misc{huang2024advancing,
      title={Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey}, 
      author={Yunpeng Huang and Jingwei Xu and Junyu Lai and Zixu Jiang and Taolue Chen and Zenan Li and Yuan Yao and Xiaoxing Ma and Lijuan Yang and Hao Chen and Shupeng Li and Penghao Zhao},
      year={2024},
      eprint={2311.12351},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
