
SLMs-Survey
Survey of Small Language Models from Penn State, ...
Stars: 135

SLMs-Survey is a comprehensive repository that includes papers and surveys on small language models. It covers topics such as technology, on-device applications, efficiency, enhancements for LLMs, and trustworthiness. The repository provides a detailed overview of existing SLMs, their architecture, enhancements, and specific applications in various domains. It also includes information on SLM deployment optimization techniques and the synergy between SLMs and LLMs.
README:
A Comprehensive Survey of Small Language Models: Technology, On-Device Applications, Efficiency, Enhancements for LLMs, and Trustworthiness
This repo includes the papers discussed in our latest survey paper on small language models.
📖 Read the full paper here: Paper Link
- 2024/12/28: The second version of our survey is on Arxiv!
- 2024/11/04: The first version of our survey is on Arxiv!
If our survey is useful for your research, please kindly cite our paper:
@article{wang2024comprehensive,
  title={A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness},
  author={Wang, Fali and Zhang, Zhiwei and Zhang, Xianren and Wu, Zongyu and Mo, Tzuhao and Lu, Qiuhao and Wang, Wanjing and Li, Rui and Xu, Junjie and Tang, Xianfeng and others},
  journal={arXiv preprint arXiv:2411.03350},
  year={2024}
}
Model | #Params | Date | Paradigm | Domain | Code | HF Model | Paper/Blog |
---|---|---|---|---|---|---|---|
PhoneLM | 0.5B; 1.5B | 2024.11 | Pre-train | Generic | Github | HF | Paper |
Llama 3.2 | 1B; 3B | 2024.9 | Pre-train | Generic | Github | HF | Blog |
Qwen 1 | 1.8B; 7B; 14B; 72B | 2023.12 | Pre-train | Generic | Github | HF | Paper |
Qwen 1.5 | 0.5B; 1.8B; 4B; 7B; 14B; 32B; 72B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
Qwen 2 | 0.5B; 1.5B; 7B; 57B; 72B | 2024.6 | Pre-train | Generic | Github | HF | Paper |
Qwen 2.5 | 0.5B; 1.5B; 3B; 7B; 14B; 32B; 72B | 2024.9 | Pre-train | Generic | Github | HF | Paper |
Gemma | 2B; 7B | 2024.2 | Pre-train | Generic | - | HF | Paper |
Gemma 2 | 2B; 9B; 27B | 2024.7 | Pre-train | Generic | - | HF | Paper |
SmolLM | 135M; 360M; 1.7B | 2024.7 | Pre-train | Generic | Github | HF | Blog |
H2O-Danube3 | 500M; 4B | 2024.7 | Pre-train | Generic | - | HF | Paper |
LLM-Neo | 1B | 2024.11 | Continuous Training | Generic | - | HF | Paper |
Fox-1 | 1.6B | 2024.6 | Pre-train | Generic | - | HF | Blog |
Rene | 1.3B | 2024.5 | Pre-train | Generic | - | HF | Paper |
MiniCPM | 1.2B; 2.4B | 2024.4 | Pre-train | Generic | Github | HF | Paper |
OLMo | 1B; 7B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
TinyLlama | 1B | 2024.1 | Pre-train | Generic | Github | HF | Paper |
Phi-1 | 1.3B | 2023.6 | Pre-train | Coding | - | HF | Paper |
Phi-1.5 | 1.3B | 2023.9 | Pre-train | Generic | - | HF | Paper |
Phi-2 | 2.7B | 2023.12 | Pre-train | Generic | - | HF | Paper |
Phi-3 | 3.8B; 7B; 14B | 2024.4 | Pre-train | Generic | - | HF | Paper |
Phi-3.5 | 3.8B; 4.2B; 6.6B | 2024.8 | Pre-train | Generic | - | HF | Paper |
OpenELM | 270M; 450M; 1.1B; 3B | 2024.4 | Pre-train | Generic | Github | HF | Paper |
MobiLlama | 0.5B; 0.8B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
MobileLLM | 125M; 350M | 2024.2 | Pre-train | Generic | Github | HF | Paper |
StableLM | 3B; 7B | 2023.4 | Pre-train | Generic | Github | HF | Paper |
StableLM 2 | 1.6B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
Cerebras-GPT | 111M-13B | 2023.4 | Pre-train | Generic | - | HF | Paper |
BLOOM, BLOOMZ | 560M; 1.1B; 1.7B; 3B; 7.1B; 176B | 2022.11 | Pre-train | Generic | - | HF | Paper |
Galactica | 125M; 1.3B; 6.7B | 2022.11 | Pre-train | Scientific | - | HF | Paper |
OPT | 125M; 350M; 1.3B; 2.7B; 6.7B | 2022.5 | Pre-train | Generic | - | HF | Paper |
XGLM | 1.7B; 2.9B; 7.5B | 2021.12 | Pre-train | Generic | Github | HF | Paper |
GPT-Neo | 125M; 350M; 1.3B; 2.7B | 2021.5 | Pre-train | Generic | Github | - | Paper |
Megatron-gpt2 | 355M; 2.5B; 8.3B | 2019.9 | Pre-train | Generic | Github | - | Paper, Blog |
MINITRON | 4B; 8B; 15B | 2024.7 | Pruning and Distillation | Generic | Github | HF | Paper |
MiniMix | 7B | 2024.7 | Pre-train | Generic | Github | HF | Paper |
MiniMA-2 | 1B; 3B | 2023.12 | Pre-train | Generic | Github | HF | Paper |
MiniMA | 3B | 2023.11 | Pruning and Distillation | Generic | Github | HF | Paper |
Orca 2 | 7B | 2023.11 | Distillation | Generic | - | HF | Paper |
Dolly-v2 | 3B; 7B; 12B | 2023.4 | Instruction Tuning | Generic | Github | HF | Blog |
LaMini-LM | 61M-7B | 2023.4 | Distillation | Generic | Github | HF | Blog |
Specialized FlanT5 | 250M; 760M; 3B | 2023.1 | Instruction Tuning | Generic (math) | Github | - | Paper |
FlanT5 | 80M; 250M; 780M; 3B | 2022.10 | Instruction Tuning | Generic | Github | HF | Paper |
T5 | 60M; 220M; 770M; 3B; 11B | 2019.9 | Pre-train | Generic | Github | HF | Paper |
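The parameter counts in the table can be sanity-checked with a back-of-the-envelope formula for dense transformers. The sketch below is illustrative only: the helper name and the TinyLlama-like shape are assumptions, and the count ignores biases, norms, gated MLP variants, tied embeddings, and grouped-query attention, all of which shift real totals.

```python
def transformer_params(n_layers, d_model, vocab_size, d_ff=None):
    """Rough dense-transformer parameter count.

    Counts token embeddings plus, per layer, the four attention
    projections (~4 * d_model^2) and two MLP matrices
    (~2 * d_model * d_ff). Biases, norms, GQA, and gated MLPs
    are deliberately ignored.
    """
    d_ff = d_ff if d_ff is not None else 4 * d_model
    per_layer = 4 * d_model * d_model + 2 * d_model * d_ff
    return vocab_size * d_model + n_layers * per_layer

# A TinyLlama-like shape (22 layers, d_model 2048, d_ff 5632,
# 32k vocab) lands on the order of 1B parameters, matching the
# "1B" entry in the table above.
estimate = transformer_params(22, 2048, 32000, d_ff=5632)
```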
- Transformer: Attention is all you need. Ashish Vaswani. NeurIPS 2017.
- Mamba 1: Mamba: Linear-time sequence modeling with selective state spaces. Albert Gu and Tri Dao. COLM 2024. [Paper].
- Mamba 2: Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. Tri Dao and Albert Gu. ICML 2024. [Paper] [Code]
- Hymba: A Hybrid-head Architecture for Small Language Models. Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, Yingyan Lin, Jan Kautz, Pavlo Molchanov. arXiv 2024.11. [Paper] [HF]
- xLSTM: Extended Long Short-Term Memory. Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter. arXiv 2024.5. [Paper] [Code]
- MobiLlama: "MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT". Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan. arXiv 2024. [Paper] [Github] [HuggingFace]
- MobileLLM: "MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases". Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra. ICML 2024. [Paper] [Github] [HuggingFace]
- Rethinking optimization and architecture for tiny language models. Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, and Yunhe Wang. ICML 2024. [Paper] [Code]
- MindLLM: "MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications". Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao. arXiv 2023. [Paper] [HuggingFace]
- Direct preference optimization: Your language model is secretly a reward model. Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. NeurIPS 2023. [Paper] [Code]
- Enhancing chat language models by scaling high-quality instructional conversations. Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou. EMNLP 2023. [Paper] [Code]
- SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces, with Verification. Wing Lian, Guan Wang, Bleys Goodson, Eugene Pentland, Austin Cook, Chanvichet Vong, and "Teknium". Huggingface, 2023. [Data]
- Stanford Alpaca: An Instruction-following LLaMA model. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. GitHub, 2023. [Blog] [Github] [HuggingFace]
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data. Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, and Yang Liu. ICLR, 2024. [Paper] [Code] [HuggingFace]
- Training language models to follow instructions with human feedback. Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe. NeurIPS, 2022. [Paper]
- MobileBERT: "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices". Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou. ACL 2020. [Paper] [Github] [HuggingFace]
- Language models are unsupervised multitask learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. OpenAI Blog, 2019. [Paper]
- TinyStory: "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?". Ronen Eldan, Yuanzhi Li. 2023. [Paper] [HuggingFace]
- AS-ES: "AS-ES Learning: Towards Efficient CoT Learning in Small Models". Nuwa Xi, Yuhan Chen, Sendong Zhao, Haochun Wang, Bing Qin, Ting Liu. 2024. [Paper]
- Self-Amplify: "Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations". Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot. 2024. [Paper]
- Large Language Models Can Self-Improve. Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, and Jiawei Han. EMNLP 2023. [Paper]
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing. Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, and Dong Yu. NeurIPS 2024. [Paper] [Code]
- GKD: "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes". Rishabh Agarwal et al. ICLR 2024. [Paper]
- DistilLLM: "DistiLLM: Towards Streamlined Distillation for Large Language Models". Jongwoo Ko et al. ICML 2024. [Paper] [Github]
- Adapt-and-Distill: "Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains". Yunzhi Yao et al. ACL2021. [Paper] [Github]
- AKL: "Rethinking Kullback-Leibler divergence in knowledge distillation for large language models". Taiqiang Wu, Chaofan Tao, Jiahao Wang, Runming Yang, Zhe Zhao, Ngai Wong. arXiv 2024. [Paper] [Github]
- Weight-inherited distillation for task-agnostic BERT compression. Taiqiang Wu, Cheng Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang. NAACL 2024. [Paper] [Code]
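The distillation papers above share a common core objective: match the student's output distribution to the teacher's under a (possibly temperature-softened) KL divergence. Below is a minimal NumPy sketch of the classic forward-KL variant, with Hinton-style T² scaling; function names are illustrative, not from any of the cited codebases.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_kl(teacher_logits, student_logits, T=2.0):
    """Forward KL(teacher || student) on temperature-softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across
    temperatures, following the original distillation convention.
    """
    p = softmax(teacher_logits / T)  # soft teacher targets
    q = softmax(student_logits / T)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()
```

Papers such as AKL and GKD revisit exactly this choice, replacing the forward KL with adaptive or reverse/on-policy divergences.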
- SmoothQuant: "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models". Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han. ICML 2023. [Paper] [Github][Slides][Video]
- BiLLM: "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs". Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi. 2024. [Paper] [Github]
- LLM-QAT: "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models". Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra. 2023. [Paper]
- PB-LLM: "PB-LLM: Partially Binarized Large Language Models". Zhihang Yuan, Yuzhang Shang, Zhen Dong. ICLR 2024. [Paper] [Github]
- OneBit: "OneBit: Towards Extremely Low-bit Large Language Models". Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che. NeurIPS 2024. [Paper]
- BitNet: "BitNet: Scaling 1-bit Transformers for Large Language Models". Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei. 2023. [Paper]
- BitNet b1.58: "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits". Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei. 2024. [Paper]
- SqueezeLLM: "SqueezeLLM: Dense-and-Sparse Quantization". Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer. ICML 2024. [Paper] [Github]
- JSQ: "Compressing Large Language Models by Joint Sparsification and Quantization". Jinyang Guo, Jianyu Wu, Zining Wang, Jiaheng Liu, Ge Yang, Yifu Ding, Ruihao Gong, Haotong Qin, Xianglong Liu. ICML 2024. [Paper] [Github]
- FrameQuant: "FrameQuant: Flexible Low-Bit Quantization for Transformers". Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang, Vikas Singh. 2024. [Paper] [Github]
- LQER: "LQER: Low-Rank Quantization Error Reconstruction for LLMs". Cheng Zhang, Jianyi Cheng, George A. Constantinides, Yiren Zhao. ICML 2024. [Paper] [Github]
- I-LLM: "I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models". Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou. 2024. [Paper] [Github]
- PV-Tuning: "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression". Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik. 2024. [Paper]
- PEQA: "Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization". Jeonghoon Kim, Jung Hyun Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, Se Jung Kwon, Dongsoo Lee. NeurIPS 2023. [Paper]
- QLoRA: "QLoRA: Efficient Finetuning of Quantized LLMs". Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer. NeurIPS 2023. [Paper] [Github]
- "Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!". Yubo Ma, Yixin Cao, YongChing Hong, Aixin Sun. EMNLP 2023. [Paper] [Github]
- MoQE: "Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness". Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla. 2023. [Paper]
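The quantization works listed above (SmoothQuant, LLM-QAT, SqueezeLLM, etc.) all build on the same primitive: mapping floating-point weights to low-bit integers with a per-channel scale. A minimal sketch of symmetric per-channel INT8 post-training quantization, assuming a generic 2-D weight matrix (this is the textbook baseline, not any one paper's method):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-output-channel INT8 quantization.

    Each row gets its own scale so that the largest-magnitude weight
    in that row maps to 127.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float matrix from integers and scales."""
    return q.astype(np.float32) * scale
```

Methods like SmoothQuant and AWQ refine exactly this step, reshaping activations or selecting salient channels so the rounding error above hurts model quality less.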
- SLM-RAG: "Can Small Language Models With Retrieval-Augmented Generation Replace Large Language Models When Learning Computer Science?". Suqing Liu, Zezhu Yu, Feiran Huang, Yousef Bulbulia, Andreas Bergen, Michael Liut. ITiCSE 2024. [Paper]
- Alpaca: "Alpaca: A Strong, Replicable Instruction-Following Model". Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto. 2023. [Paper] [Github] [HuggingFace] [Website]
- Stable Beluga 7B: "Stable Beluga 2". Dakota Mahan, Ryan Carlow, Louis Castricato, Nathan Cooper, Christian Laforte. 2023. [HuggingFace]
- Fine-tuned BioGPT Guo et al.: "Improving Small Language Models on PubMedQA via Generative Data Augmentation". Zhen Guo, Peiqi Wang, Yanwei Wang, Shangdi Yu. 2023. [Paper]
- Financial SLMs: "Fine-tuning Smaller Language Models for Question Answering over Financial Documents". Karmvir Singh Phogat, Sai Akhil Puranam, Sridhar Dasaratha, Chetan Harsha, Shashishekar Ramakrishna. 2024. [Paper]
- ColBERT: "ColBERT Retrieval and Ensemble Response Scoring for Language Model Question Answering". Alex Gichamba, Tewodros Kederalah Idris, Brian Ebiyau, Eric Nyberg, Teruko Mitamura. IEEE 2024. [Paper]
- T-SAS: "Test-Time Self-Adaptive Small Language Models for Question Answering". Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park. ACL 2023. [Paper] [Github]
- Rationale Ranking: "Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval". Tim Hartill, Diana Benavides-Prado, Michael Witbrock, Patricia J. Riddle. 2023. [Paper]
- Phi-3.5-mini: "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone". Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, ..., Chunyu Wang, Guanhua Wang, Lijuan Wang et al. 2024. [Paper] [HuggingFace] [Website]
- TinyLlama: "TinyLlama: An Open-Source Small Language Model". Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu. 2024. [Paper] [HuggingFace] [Chat Demo] [Discord]
- CodeLlama: "Code Llama: Open Foundation Models for Code". Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, ..., Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. 2024. [Paper] [HuggingFace]
- CodeGemma: "CodeGemma: Open Code Models Based on Gemma". CodeGemma Team: Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, ..., Kathy Korevec, Kelly Schaefer, Scott Huffman. 2024. [Paper] [HuggingFace]
- PromptRec: "Could Small Language Models Serve as Recommenders? Towards Data-centric Cold-start Recommendations". Xuansheng Wu, Huachi Zhou, Yucheng Shi, Wenlin Yao, Xiao Huang, Ninghao Liu. 2024. [Paper] [Github]
- SLIM: "Can Small Language Models be Good Reasoners for Sequential Recommendation?". Yuling Wang, Changxin Tian, Binbin Hu, Yanhua Yu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Liang Pang, Xiao Wang. 2024. [Paper]
- BiLLP: "Large Language Models are Learnable Planners for Long-Term Recommendation". Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng. 2024. [Paper]
- ONCE: "ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models". Qijiong Liu, Nuo Chen, Tetsuya Sakai, Xiao-Ming Wu. WSDM 2024. [Paper] [Github]
- RecLoRA: "Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation". Jiachen Zhu, Jianghao Lin, Xinyi Dai, Bo Chen, Rong Shan, Jieming Zhu, Ruiming Tang, Yong Yu, Weinan Zhang. 2024. [Paper]
- Content encoder: "Pre-training Tasks for Embedding-based Large-scale Retrieval". Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar. ICLR 2020. [Paper]
- Poly-encoders: "Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring". Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston. ICLR 2020. [Paper]
- Twin-BERT: "TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval". Wenhao Lu, Jian Jiao, Ruofei Zhang. 2020. [Paper]
- H-ERNIE: "H-ERNIE: A Multi-Granularity Pre-Trained Language Model for Web Search". Xiaokai Chu, Jiashu Zhao, Lixin Zou, Dawei Yin. SIGIR 2022. [Paper]
- Ranker: "Passage Re-ranking with BERT". Rodrigo Nogueira, Kyunghyun Cho. 2019. [Paper] [Github]
- Rewriter: "Query Rewriting for Retrieval-Augmented Large Language Models". Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan. EMNLP 2023. [Paper] [Github]
- Octopus: "Octopus: On-device language model for function calling of software APIs". Wei Chen, Zhiyuan Li, Mingyuan Ma. 2024. [Paper] [HuggingFace]
- MobileAgent: "Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration". Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang. 2024. [Paper] [Github] [HuggingFace]
- Revolutionizing Mobile Interaction: "Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile". Samuel Carreira, Tomás Marques, José Ribeiro, Carlos Grilo. 2023. [Paper]
- AutoDroid: "AutoDroid: LLM-powered Task Automation in Android". Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, Yunxin Liu. 2023. [Paper]
- On-device Agent for Text Rewriting: "Towards an On-device Agent for Text Rewriting". Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng. 2023. [Paper]
- EDGE-LLM: "EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting". Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin. 2024. [Paper] [Github]
- LLM-PQ: "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization". Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu. 2024. [Paper] [Github]
- AWQ: "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration". Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han. MLSys 2024. [Paper] [Github]
- MobileAIBench: "MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases". Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese. 2024. [Paper] [Github]
- MobileLLM: "MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases". Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra. ICML 2024. [Paper] [Github] [HuggingFace]
- EdgeMoE: "EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models". Rongjie Yi, Liwei Guo, Shiyun Wei, Ao Zhou, Shangguang Wang, Mengwei Xu. 2023. [Paper] [Github]
- GEAR: "GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM". Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao. 2024. [Paper] [Github]
- DMC: "Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference". Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, Edoardo M. Ponti. 2024. [Paper]
- Transformer-Lite: "Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs". Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie. 2024. [Paper]
- LLMaaS: "LLM as a System Service on Mobile Devices". Wangsong Yin, Mengwei Xu, Yuanchun Li, Xuanzhe Liu. 2024. [Paper]
- LLMCad: "LLMCad: Fast and Scalable On-device Large Language Model Inference". Daliang Xu, Wangsong Yin, Xin Jin, Ying Zhang, Shiyun Wei, Mengwei Xu, Xuanzhe Liu. 2023. [Paper]
- LinguaLinked: "LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices". Junchen Zhao, Yurun Song, Simeng Liu, Ian G. Harris, Sangeetha Abdu Jyothi. 2023. [Paper]
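KV-cache compression work such as GEAR and DMC targets the memory term that dominates on-device inference at long context. That term is easy to estimate: two tensors (keys and values) per layer, each sized by batch, KV heads, sequence length, and head dimension. A sketch, with an illustrative Llama-3.2-1B-like shape (the exact config values are assumptions):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, dtype_bytes=2):
    """Memory held by the attention KV cache.

    Two tensors (K and V) per layer, each of shape
    [batch, n_kv_heads, seq_len, head_dim], at dtype_bytes per element
    (2 for fp16/bf16).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

# e.g. 16 layers, 8 KV heads, head_dim 64, an 8k context in fp16
# -> 268,435,456 bytes (256 MiB), a large share of a phone's budget
cache = kv_cache_bytes(16, 8, 64, 8192)
```

Halving `dtype_bytes` via 8-bit (or lower) KV quantization, or shrinking the effective `seq_len` via dynamic compression, directly scales this figure down, which is exactly the lever those papers pull.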
- Calibrating Large Language Models Using Their Generations Only. Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh. ACL 2024 Long. [pdf] [code]
- Pareto Optimal Learning for Estimating Large Language Model Errors. Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon. ACL 2024 Long. [pdf]
- The Internal State of an LLM Knows When It’s Lying. Amos Azaria, Tom Mitchell. EMNLP 2023 Findings. [pdf]
- Small agent can also rock! Empowering small language models as hallucination detector. Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen. EMNLP 2024 Long. [pdf]
- Reconfidencing LLMs from the grouping loss perspective. Lihu Chen, Alexandre Perez-Lebel, Fabian M. Suchanek, Gaël Varoquaux. EMNLP 2024 Findings. [pdf]
- Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs. Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen. ACL 2024 Long. [pdf] [code] [huggingface]
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi. ICLR 2024 Oral. [pdf] [huggingface] [code] [website] [model] [data]
- LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression. Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu. ICLR 2024 Workshop ME-FoMo Poster. [pdf]
- Corrective Retrieval Augmented Generation. Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling. arXiv 2024.1. [pdf] [code]
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models. Yile Wang, Peng Li, Maosong Sun, Yang Liu. EMNLP 2023 Findings. [pdf] [code]
- In-Context Retrieval-Augmented Language Models. Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham. TACL 2023. [pdf] [code]
- RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback. Yanming Liu, Xinyue Peng, Xuhong Zhang, Weihao Liu, Jianwei Yin, Jiannan Cao, Tianyu Du. ACL 2024 Findings. [pdf]
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA. Wenyu Huang, Guancheng Zhou, Hongru Wang, Pavlos Vougiouklis, Mirella Lapata, Jeff Z. Pan. EMNLP 2024 Findings. [pdf]
- Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, and Julian McAuley. Small models are valuable plug-ins for large language models. ACL 2024 Findings. [pdf]
- Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, and Yue Zhang. Supervised Knowledge Makes Large Language Models Better In-context Learners. ICLR 2024 Poster. [pdf]
- Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vydiswaran, Navdeep Jaitly, and Yizhe Zhang. Divide-or-Conquer? Which Part Should You Distill Your LLM? EMNLP 2024 Findings. [pdf]
- Tianlin Li, Qian Liu, Tianyu Pang, Chao Du, Qing Guo, Yang Liu, and Min Lin. Purifying large language models by ensembling a small language model. arXiv 2024. [pdf]
- Yiming Zhang, Nicholas Carlini, and Daphne Ippolito. Effective Prompt Extraction from Language Models. COLM 2024. [pdf]
- Zeyang Sha and Yang Zhang. Prompt stealing attacks against large language models. arXiv (2024). [pdf]
- Collin Zhang, John X Morris, and Vitaly Shmatikov. Extracting Prompts by Inverting LLM Outputs. [pdf]
- Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, and Christopher D Manning. 2024. An Emulator for Fine-tuning Large Language Models using Small Language Models. ICLR 2024. [pdf]
- Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, and Noah A Smith. 2024. Tuning language models by proxy. COLM 2024. [pdf]
- Dheeraj Mekala, Alex Nguyen, and Jingbo Shang. 2024. Smaller language models are capable of selecting instruction-tuning training data for larger language models. ACL 2024 Findings. [pdf]
- Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, and Yaoxue Zhang. 2023. Mutual enhancement of large and small language models with cross-silo knowledge transfer. arXiv 2023. [pdf]
- SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models. Yu Yang, Siddhartha Mishra, Jeffrey Chiang, Baharan Mirzasoleiman. NeurIPS 2024 Poster. [pdf]
- Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models. Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao. NeurIPS 2024 Poster. [pdf]
- Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. Meta. arXiv 2023. [pdf]
- SLM as Guardian: Pioneering AI Safety with Small Language Model. Ohjoon Kwon, Donghyeon Jeon, Nayoung Choi, Gyu-Hwung Cho, Hwiyeol Jo, Changbong Kim, Hyunwoo Lee, Inho Kang, Sun Kim, Taiwoo Park. EMNLP 2024. [pdf]
- Kun Zhao, Bohao Yang, Chen Tang, Chenghua Lin, and Liang Zhan. 2024. SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation. ACL 2024 Findings. [pdf]
- Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. Lorenz Kuhn, Yarin Gal, Sebastian Farquhar. ICLR 2023. [pdf]
- SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. Potsawee Manakul, Adian Liusie, Mark Gales. EMNLP 2023 Main. [pdf]
- ProxyLM: Predicting language model performance on multilingual tasks via proxy models. David Anugraha, Genta Indra Winata, Chenyue Li, Patrick Amadeus Irawan, En-Shiun Annie Lee. arXiv 2024. [pdf]
- FActScore: Fine-grained atomic evaluation of factual precision in long-form text generation. Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi. EMNLP 2023 Main. [pdf]
- Look before you leap: An exploratory study of uncertainty measurement for large language models. Yuheng Huang, Jiayang Song, Zhijie Wang, Shengming Zhao, Huaming Chen, Felix Juefei-Xu, Lei Ma. arXiv 2023. [pdf]
- CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following. Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, Bowen Zhou. arXiv 2024.6. [pdf]
- When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment. Minrui Xu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han. IEEE Wireless Communications, 2024. [pdf]
- Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding. Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi. arXiv, 2024.7. [pdf]
- Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models. Yu Shang, Yu Li, Fengli Xu, Yong Li. arXiv, 2024.8. [pdf]
- Hybrid SLM and LLM for Edge-Cloud Collaborative Inference. Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao. EdgeFM 2024. [pdf]
- LLMCad: Fast and Scalable On-device Large Language Model Inference. Daliang Xu, Wangsong Yin, Xin Jin, Ying Zhang, Shiyun Wei, Mengwei Xu, Xuanzhe Liu. arXiv 2023.9. [pdf]
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts. arXiv 2023.10. [pdf]
- Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!. Yubo Ma, Yixin Cao, YongChing Hong, Aixin Sun. arXiv 2023.10. [pdf]
- Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer. Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang. arXiv 2023.12. [pdf]
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent. Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang. EMNLP 2024 Main. [pdf]
- Synergizing Large Language Models and Pre-Trained Smaller Models for Conversational Intent Discovery. Jinggui Liang, Lizi Liao, Hao Fei, Jing Jiang. ACL 2024 Findings. [pdf]
- Improving Large Models with Small Models: Lower Costs and Better Performance. Dong Chen, Shuo Zhang, Yueting Zhuang, Siliang Tang, Qidong Liu, Hua Wang, Mingliang Xu. arXiv 2024.6. [pdf]
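Many of the LLM-SLM collaboration papers above (Hybrid SLM/LLM edge-cloud inference, Synergy-of-Thoughts, "Improving Large Models with Small Models") share a cascade pattern: the cheap small model answers first and escalates to the large model only when its confidence is low. A minimal sketch of that routing logic; `slm`, `llm`, and the confidence convention are placeholder assumptions, not any paper's API:

```python
def cascade_answer(prompt, slm, llm, threshold=0.7):
    """SLM-first cascade routing.

    `slm` and `llm` are placeholder callables returning
    (answer, confidence); the query is escalated to the LLM only
    when the SLM's self-reported confidence falls below `threshold`.
    Returns the answer and which model produced it.
    """
    answer, confidence = slm(prompt)
    if confidence >= threshold:
        return answer, "slm"
    answer, _ = llm(prompt)
    return answer, "llm"

# Toy stand-ins: an SLM that is only confident on short prompts,
# and an LLM that always answers.
toy_slm = lambda p: ("slm-answer", 0.9 if len(p) < 20 else 0.3)
toy_llm = lambda p: ("llm-answer", 1.0)
```

Tuning `threshold` trades cost against quality: a higher value routes more traffic to the LLM, which is the knob these hybrid-inference papers optimize.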

SLMs-Survey
SLMs-Survey is a comprehensive repository that includes papers and surveys on small language models. It covers topics such as technology, on-device applications, efficiency, enhancements for LLMs, and trustworthiness. The repository provides a detailed overview of existing SLMs, their architecture, enhancements, and specific applications in various domains. It also includes information on SLM deployment optimization techniques and the synergy between SLMs and LLMs.

awesome-large-audio-models
This repository is a curated list of awesome large AI models in audio signal processing, focusing on the application of large language models to audio tasks. It includes survey papers, popular large audio models, automatic speech recognition, neural speech synthesis, speech translation, other speech applications, large audio models in music, and audio datasets. The repository aims to provide a comprehensive overview of recent advancements and challenges in applying large language models to audio signal processing, showcasing the efficacy of transformer-based architectures in various audio tasks.

Odyssey
Odyssey is a framework designed to empower agents with open-world skills in Minecraft. It provides an interactive agent with a skill library, a fine-tuned LLaMA-3 model, and an open-world benchmark for evaluating agent capabilities. The framework enables agents to explore diverse gameplay opportunities in the vast Minecraft world by offering primitive and compositional skills, extensive training data, and various long-term planning tasks. Odyssey aims to advance research on autonomous agent solutions by providing datasets, model weights, and code for public use.

LLM-Tool-Survey
This repository contains a collection of papers related to tool learning with large language models (LLMs). The papers are organized according to the survey paper 'Tool Learning with Large Language Models: A Survey'. The survey focuses on the benefits and implementation of tool learning with LLMs, covering aspects such as task planning, tool selection, tool calling, response generation, benchmarks, evaluation, challenges, and future directions in the field. It aims to provide a comprehensive understanding of tool learning with LLMs and inspire further exploration in this emerging area.

Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey provides a comprehensive review of efficient and lightweight Multimodal Large Language Models (MLLMs), focusing on model size reduction and cost efficiency for edge computing scenarios. The survey covers the timeline of efficient MLLMs, research on efficient structures and strategies, and their applications, while also discussing current limitations and future directions.

PyTorch-Tutorial-2nd
The second edition of "PyTorch Practical Tutorial" was completed several years after the first. Building on the essence of the first edition, it adds rich and detailed deep learning application cases and inference deployment frameworks, so that the book more systematically covers the knowledge a deep learning engineer needs. As artificial intelligence technology continues to evolve, the second edition is not an end but a beginning, opening up new technologies, new fields, and new chapters, in the hope of continuing to learn and progress in AI together with its readers.

DAMO-ConvAI
DAMO-ConvAI is the official repository for Alibaba DAMO Conversational AI. It contains the codebase for various conversational AI models and tools developed by Alibaba Research. These models and tools cover a wide range of tasks, including natural language understanding, natural language generation, dialogue management, and knowledge graph construction. DAMO-ConvAI is released under the MIT license and is available for use by researchers and developers in the field of conversational AI.

HuatuoGPT-II
HuatuoGPT2 is an innovative domain-adapted medical large language model that excels in medical knowledge and dialogue proficiency. It showcases state-of-the-art performance in various medical benchmarks, surpassing GPT-4 in expert evaluations and fresh medical licensing exams. The open-source release includes HuatuoGPT2 models in 7B, 13B, and 34B versions, training code for one-stage adaptation, partial pre-training and fine-tuning instructions, and evaluation methods for medical response capabilities and professional pharmacist exams. The tool aims to enhance LLM capabilities in the Chinese medical field through open-source principles.

RLHF-Reward-Modeling
This repository, RLHF-Reward-Modeling, is dedicated to training reward models for DRL-based RLHF (PPO), iterative SFT, and iterative DPO. It provides state-of-the-art reward models at base-model sizes of up to 13B. Installation involves setting up an environment based on the alignment handbook. Dataset preparation requires preprocessing conversations into a standard format. The code can be run with Gemma-2b-it, and evaluation results can be obtained using the provided datasets. The to-do list covers several reward-model variants: Bradley-Terry, preference model, regression-based reward model, and multi-objective reward model. The repository is part of a pipeline for iterative rejection-sampling fine-tuning and iterative DPO.
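The Bradley-Terry reward model named above scores a chosen/rejected response pair and trains on the probability that the chosen response wins. A minimal sketch of the pairwise loss in plain Python (illustrative only, not the repository's training code):

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the
    rejected one under the Bradley-Terry preference model:
    P(chosen > rejected) = sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger reward margin for the chosen response lowers the loss.
print(round(bradley_terry_loss(2.0, 0.0), 4))  # margin 2 -> 0.1269
print(round(bradley_terry_loss(0.0, 0.0), 4))  # no margin -> ln 2 = 0.6931
```

In actual training the two scalar rewards come from the same reward model applied to both responses, and the loss is averaged over a batch of preference pairs.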

amber-train
Amber is the first model in the LLM360 family, an initiative for comprehensive and fully open-sourced LLMs. It is a 7B English language model with the LLaMA architecture, licensed under Apache 2.0. The available resources include training code, data preparation, metrics, and the fully processed Amber pretraining data. The model has been trained on datasets such as Arxiv, Book, C4, Refined-Web, StarCoder, StackExchange, and Wikipedia. The hyperparameters include 6.7B total parameters, a hidden size of 4096, an intermediate size of 11008, 32 attention heads, 32 hidden layers, an RMSNorm ε of 1e-6, a max sequence length of 2048, and a vocabulary size of 32000.
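The listed hyperparameters are enough to reproduce the reported ~6.7B parameter count. A quick sanity check, assuming the standard LLaMA layout (untied input/output embeddings, gated three-matrix MLP, two RMSNorms per layer):

```python
# Reconstruct Amber's parameter count from its published hyperparameters.
hidden, inter, layers, vocab = 4096, 11008, 32, 32000

embed = vocab * hidden              # token embedding table
attn = 4 * hidden * hidden          # Q, K, V, O projections
mlp = 3 * hidden * inter            # gate, up, down projections
norms = 2 * hidden                  # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

# Final RMSNorm plus an untied LM head round out the total.
total = embed + layers * per_layer + hidden + vocab * hidden
print(f"{total / 1e9:.2f}B")  # 6.74B, matching the reported 6.7B
```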

EasyEdit
EasyEdit is a Python package for editing Large Language Models (LLMs) such as `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, and `T5` (supporting models from **1B** to **65B** parameters). Its objective is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance on other inputs. It is designed to be easy to use and easy to extend.

gateway
Gateway is a tool that streamlines requests to 100+ open and closed source models through a unified API. It is production-ready with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimum latency. It is blazing fast with a tiny footprint, supports load balancing across multiple models, providers, and keys, ensures app resilience with fallbacks, offers automatic retries with exponential backoff, allows configurable request timeouts, supports multimodal routing, and can be extended with plug-in middleware. It is battle-tested over 300B tokens and enterprise-ready for enhanced security, scale, and custom deployments.
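The retry-and-fallback behavior described above can be sketched in a few lines; the provider callables below are hypothetical stand-ins, not Gateway's actual API:

```python
import random
import time

def call_with_fallback(providers, request, max_retries=3):
    """Try each provider in order; retry transient failures with
    exponential backoff before falling back to the next provider.
    `providers` is a list of callables (hypothetical client functions)."""
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return provider(request)
            except TimeoutError:
                # Exponential backoff with a little jitter: ~1s, ~2s, ~4s, ...
                time.sleep((2 ** attempt) + random.random() * 0.1)
        # Retries exhausted for this provider: fall back to the next one.
    raise RuntimeError("all providers failed")
```

A real gateway adds per-provider timeout budgets, response caching, and weighted load balancing on top of this core loop.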

Q-Bench
Q-Bench is a benchmark for general-purpose foundation models on low-level vision, focusing on the performance of multi-modality LLMs. It covers three realms of low-level vision: perception, description, and assessment. The benchmark datasets LLVisionQA and LLDescribe are collected for the perception and description tasks, with open submission-based evaluation, and abstract evaluation code is provided for the assessment task using public datasets. The benchmark can be used via the datasets API for single images and image pairs, with automatic download. Various tasks and evaluations are available for testing MLLMs on low-level vision tasks.

PIXIU
PIXIU is a project designed to support the development, fine-tuning, and evaluation of Large Language Models (LLMs) in the financial domain. It includes components like FinBen, a Financial Language Understanding and Prediction Evaluation Benchmark, FIT, a Financial Instruction Dataset, and FinMA, a Financial Large Language Model. The project provides open resources, multi-task and multi-modal financial data, and diverse financial tasks for training and evaluation. It aims to encourage open research and transparency in the financial NLP field.

IDvs.MoRec
This repository contains the source code for the SIGIR 2023 paper 'Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited'. It provides resources for evaluating foundation, transferable, multi-modal, and LLM recommendation models, along with datasets, pre-trained models, and training strategies for IDRec and MoRec using in-batch debiased cross-entropy loss. The repository also offers large-scale datasets, code for SASRec with in-batch debias cross-entropy loss, and information on joining the lab for research opportunities.
For similar tasks

awesome-llm-understanding-mechanism
This repository is a collection of papers focused on understanding the internal mechanism of large language models (LLM). It includes research on topics such as how LLMs handle multilingualism, learn in-context, and handle factual associations. The repository aims to provide insights into the inner workings of transformer-based language models through a curated list of papers and surveys.

Foundations-of-LLMs
Foundations-of-LLMs is a comprehensive book aimed at readers interested in large language models, providing systematic explanations of foundational knowledge and introducing cutting-edge technologies. The book covers traditional language models, evolution of large language model architectures, prompt engineering, parameter-efficient fine-tuning, model editing, and retrieval-enhanced generation. Each chapter uses an animal as a theme to explain specific technologies, enhancing readability. The content is based on the author team's exploration and understanding of the field, with continuous monthly updates planned. The book includes a 'Paper List' for each chapter to track the latest advancements in related technologies.

100days_AI
The 100 Days in AI repository provides a comprehensive roadmap for individuals to learn Artificial Intelligence over a period of 100 days. It covers topics ranging from basic programming in Python to advanced concepts in AI, including machine learning, deep learning, and specialized AI topics. The repository includes daily tasks, resources, and exercises to ensure a structured learning experience. By following this roadmap, users can gain a solid understanding of AI and be prepared to work on real-world AI projects.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, spanning images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. Its key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a cloud IDE); and it supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.