Xwin-LM
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
Stars: 982
Xwin-LM is an open-source project for aligning large language models. It covers alignment techniques such as supervised fine-tuning, reward models, rejection sampling, and reinforcement learning from human feedback. Its models have ranked first on AlpacaEval and were the first to surpass GPT-4 on that benchmark. The project is continuously updated with new models and features.
README:
Step up your LLM alignment with Xwin-LM!
Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models (RM), rejection sampling, and reinforcement learning from human feedback (RLHF). Our first release, built on the Llama 2 base models, ranked TOP-1 on AlpacaEval. Notably, it is the first to surpass GPT-4 on this benchmark. The project will be continuously updated.
- 💥 [May, 2024] The Xwin-Math-70B-V1.1 and Xwin-Math-7B-V1.1 models achieve 51.9 and 44.7 pass@1 on the MATH benchmark and 90.6 and 84.4 pass@1 on the GSM8K benchmark. These are new SoTA models based on LLaMA-2!
- 💥 [Jan, 2024] We updated XwinLM-V0.3 on AlpacaEval, where it ranks top-1 among open-source models.
- 💥 [Nov, 2023] The Xwin-Math-70B-V1.0 model achieves 31.8 pass@1 on the MATH benchmark and 87.0 pass@1 on the GSM8K benchmark. This performance places it first amongst all open-source models!
- 💥 [Nov, 2023] The Xwin-Math-7B-V1.0 and Xwin-Math-13B-V1.0 models achieve 66.6 and 76.2 pass@1 on the GSM8K benchmark, ranking as top-1 among all LLaMA-2 based 7B and 13B open-source models, respectively!
- 💥 [Nov, 2023] We released XwinCoder-7B, XwinCoder-13B, and XwinCoder-34B. XwinCoder-34B reached 74.2 on HumanEval and achieves performance comparable to GPT-3.5-turbo on 6 benchmarks.
- 💥 [Oct 12, 2023] Xwin-LM-7B-V0.2 and Xwin-LM-13B-V0.2 have been released, with improved comparison data and RL training (i.e., PPO). Their win-rates vs. GPT-4 increased significantly, reaching 59.83% (7B model) and 70.36% (13B model), respectively. The 70B model will be released soon.
- 💥 [Sep, 2023] We released Xwin-LM-70B-V0.1, which achieved a win-rate of 95.57% against Davinci-003 on the AlpacaEval benchmark, ranking TOP-1 on AlpacaEval. It was the FIRST model to surpass GPT-4 on AlpacaEval. Its win-rate vs. GPT-4 is 60.61%.
- 🔍 [Sep, 2023] RLHF plays a crucial role in the strong performance of the Xwin-LM-V0.1 release!
- 💥 [Sep, 2023] We released Xwin-LM-13B-V0.1, which has achieved 91.76% win-rate on AlpacaEval, ranking as top-1 among all 13B models.
- 💥 [Sep, 2023] We released Xwin-LM-7B-V0.1, which has achieved 87.82% win-rate on AlpacaEval, ranking as top-1 among all 7B models.
Model | Checkpoint | Report | License |
---|---|---|---|
Xwin-LM-7B-V0.2 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-LM-13B-V0.2 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-LM-7B-V0.1 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-LM-13B-V0.1 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-LM-70B-V0.1 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-Coder-7B | 🤗 HF Link | 📃Brief introduction | Llama 2 License |
Xwin-Coder-13B | 🤗 HF Link | 📃Brief introduction | Llama 2 License |
Xwin-Coder-34B | 🤗 HF Link | 📃Brief introduction | Llama 2 License |
Xwin-Math-7B-V1.1 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-Math-70B-V1.1 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-Math-7B-V1.0 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-Math-13B-V1.0 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-Math-70B-V1.0 | 🤗 HF Link | 📃Paper Link | Llama 2 License |
Xwin-LM performance on AlpacaEval.
The table below shows the performance of Xwin-LM on AlpacaEval, which evaluates a model's win-rate against Text-Davinci-003 across 805 questions. To provide a comprehensive evaluation, we present, for the first time, the win-rates against ChatGPT and GPT-4 as well. The Xwin-LM model family establishes a new state of the art across all metrics. Notably, Xwin-LM-70B-V0.1 has eclipsed GPT-4 for the first time, achieving a win-rate of 95.57% against Text-Davinci-003 and 60.61% against GPT-4.
Model | AlpacaEval win-rate (%) vs. Text-Davinci-003 | AlpacaEval win-rate (%) vs. ChatGPT | AlpacaEval win-rate (%) vs. GPT-4 |
---|---|---|---|
Xwin-LM-70B-V0.1 | 95.57 | 87.50 | 60.61 |
GPT-4 | 95.28 | 84.66 | 50.00 |
Xwin-LM-13B-V0.2 | 93.22 | 87.14 | 70.36 |
WizardLM-70B-V1.0 | 92.91 | 80.19 | 46.70 |
Llama-2-70B-Chat | 92.66 | 81.97 | 51.19 |
Xwin-LM-13B-V0.1 | 91.76 | 81.79 | 55.30 |
ChatGPT | 89.37 | 50.00 | 16.60 |
Xwin-LM-7B-V0.2 | 89.31 | 79.60 | 59.83 |
WizardLM-13B-V1.2 | 89.17 | 75.75 | 41.91 |
Xwin-LM-7B-V0.1 | 87.35 | 76.40 | 47.57 |
Llama-2-13B-Chat | 81.09 | 64.22 | 30.92 |
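
As a rough illustration of how a win-rate column like the ones above can be read, the sketch below computes a percentage from per-question pairwise judgments. The `win_rate` helper and the convention of scoring a tie as 0.5 are assumptions made for this example, not the AlpacaEval implementation; the judgments are toy values, not real evaluation data.

```python
from typing import Sequence


def win_rate(preferences: Sequence[float]) -> float:
    """Return the win-rate in percent.

    Each entry is one judged question: 1.0 = win, 0.0 = loss,
    0.5 = tie (the tie convention is an assumption of this sketch).
    """
    if not preferences:
        raise ValueError("need at least one judged comparison")
    return 100.0 * sum(preferences) / len(preferences)


# Toy judgments: three wins and one loss out of four questions.
print(win_rate([1.0, 1.0, 1.0, 0.0]))  # 75.0

# A model compared against itself ties on every question,
# which is why the reference rows in the table read 50.00.
print(win_rate([0.5, 0.5, 0.5, 0.5]))  # 50.0
```

Under this reading, any value above 50 in the "vs. GPT-4" column means the model's answers were preferred over GPT-4's more often than not.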
Case study on AlpacaEval
Prompts | GPT4 response | Xwin-LM response |
---|---|---|
I've put a plate on top of an apple, and take the plate from kitchen to dinning room. Where is the apple now | The apple is now in the dining room, as it was underneath the plate when you moved the plate from the kitchen to the dining room. | The apple is still in the kitchen, as you have only moved the plate. The apple is under the plate, so if you want to bring the apple to the dining room, you should either move the apple directly or put the apple on the plate and then move the plate to the dining room. |
Answer the following question. How do you say "good evening" in French. | In French, "good evening" is said as "bonsoir." | In French, "good evening" is translated to "bonsoir." This greeting is used in the evening and nighttime hours, typically after 6 p.m. It is a combination of "bon," which means "good," and "soir," which means "evening." To pronounce it, say "bone-swahr." |
The following table compares Xwin-LM with other LLMs on the NLP foundation tasks of the Open LLM Leaderboard.
Model | MMLU 5-shot | ARC 25-shot | TruthfulQA 0-shot | HellaSwag 10-shot | Average |
---|---|---|---|---|---|
Text-davinci-003 | 56.9 | 85.2 | 59.3 | 82.2 | 70.9 |
Vicuna-13b 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1 |
Guanaco 30B | 57.6 | 63.7 | 50.7 | 85.1 | 64.3 |
WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2 |
WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2 |
WizardLM-30B 1.0 | 58.8 | 62.5 | 52.4 | 83.3 | 64.2 |
Llama-2-7B-Chat | 48.3 | 52.9 | 45.6 | 78.6 | 56.4 |
Llama-2-13B-Chat | 54.6 | 59.0 | 44.1 | 81.9 | 59.9 |
Llama-2-70B-Chat | 63.9 | 64.6 | 52.8 | 85.9 | 66.8 |
Xwin-LM-7B-V0.1 | 49.7 | 56.2 | 48.1 | 79.5 | 58.4 |
Xwin-LM-13B-V0.1 | 56.6 | 62.4 | 45.5 | 83.0 | 61.9 |
Xwin-LM-70B-V0.1 | 69.6 | 70.5 | 60.1 | 87.1 | 71.8 |
Xwin-LM-7B-V0.2 | 50.0 | 56.4 | 49.5 | 78.9 | 58.7 |
Xwin-LM-13B-V0.2 | 56.6 | 61.5 | 43.8 | 82.9 | 61.2 |
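
The Average column appears to be the unweighted mean of the four task scores, rounded to one decimal. The short check below reproduces it for two rows copied from the table; the helper name `leaderboard_average` is made up for this sketch.

```python
# Scores copied from the table above: (MMLU, ARC, TruthfulQA, HellaSwag).
rows = {
    "Text-davinci-003": (56.9, 85.2, 59.3, 82.2),
    "Xwin-LM-70B-V0.1": (69.6, 70.5, 60.1, 87.1),
}


def leaderboard_average(scores):
    """Unweighted mean of the task scores, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


for name, scores in rows.items():
    print(name, leaderboard_average(scores))
# Text-davinci-003 70.9
# Xwin-LM-70B-V0.1 71.8
```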
To obtain desired results, please strictly follow the conversation templates when utilizing our model for inference. Our model adopts the prompt format established by Vicuna and is equipped to support multi-turn conversations.
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi! ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am Xwin-LM.</s>......
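
The template above can be assembled programmatically. The following is a minimal sketch, assuming the spacing and `</s>` placement shown in the example conversation; `build_prompt` and `SYSTEM` are hypothetical names introduced here, not part of the Xwin-LM release.

```python
from typing import List, Optional, Tuple

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(turns: List[Tuple[str, Optional[str]]], system: str = SYSTEM) -> str:
    """Build a Vicuna-style prompt from (user, assistant) pairs.

    Completed turns end with "</s>". Pass None as the assistant text of
    the last turn to leave the prompt open for the model to answer.
    """
    parts = [system + " "]
    for user, assistant in turns:
        parts.append(f"USER: {user} ASSISTANT:")
        if assistant is not None:
            parts.append(f" {assistant}</s>")
    return "".join(parts)


prompt = build_prompt([("Hi!", "Hello."), ("Who are you?", None)])
print(prompt)
```

With a single open turn, `build_prompt([("Hello, can you help me?", None)])` yields the same string as the hand-written prompt used in the inference examples.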
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the 7B checkpoint and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("Xwin-LM/Xwin-LM-7B-V0.1")
tokenizer = AutoTokenizer.from_pretrained("Xwin-LM/Xwin-LM-7B-V0.1")

# Vicuna-style prompt: system message followed by "USER: ... ASSISTANT:".
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: Hello, can you help me? "
    "ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature to take effect in generate().
samples = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(samples[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(output)
# Example output (sampling makes the exact text vary):
# Of course! I'm here to help. Please feel free to ask your question or describe the issue you're having, and I'll do my best to assist you.
```
Because Xwin-LM is based on Llama2, it also offers support for rapid inference using vLLM. Please refer to vLLM for detailed installation instructions.
```python
from vllm import LLM, SamplingParams

# Same Vicuna-style prompt as in the Hugging Face example.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: Hello, can you help me? "
    "ASSISTANT:"
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=4096)
llm = LLM(model="Xwin-LM/Xwin-LM-7B-V0.1")
outputs = llm.generate([prompt], sampling_params)

# Each result carries the original prompt and one or more completions.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(generated_text)
```
- [ ] Release the source code
- [ ] Release more capabilities, such as math and reasoning.
Please consider citing our work if you use the data or code in this repo.
@software{xwin-lm,
title = {Xwin-LM},
author = {Xwin-LM Team},
url = {https://github.com/Xwin-LM/Xwin-LM},
version = {pre-release},
year = {2023},
month = {9},
}
Thanks to Llama 2, FastChat, AlpacaFarm, and vLLM.
Alternative AI tools for Xwin-LM
Similar Open Source Tools
LlamaV-o1
LlamaV-o1 is a Large Multimodal Model designed for spontaneous reasoning tasks. It outperforms various existing models on multimodal reasoning benchmarks. The project includes a Step-by-Step Visual Reasoning Benchmark, a novel evaluation metric, and a combined Multi-Step Curriculum Learning and Beam Search Approach. The model achieves superior performance in complex multi-step visual reasoning tasks in terms of accuracy and efficiency.
Foundations-of-LLMs
Foundations-of-LLMs is a comprehensive book aimed at readers interested in large language models, providing systematic explanations of foundational knowledge and introducing cutting-edge technologies. The book covers traditional language models, evolution of large language model architectures, prompt engineering, parameter-efficient fine-tuning, model editing, and retrieval-enhanced generation. Each chapter uses an animal as a theme to explain specific technologies, enhancing readability. The content is based on the author team's exploration and understanding of the field, with continuous monthly updates planned. The book includes a 'Paper List' for each chapter to track the latest advancements in related technologies.
HuatuoGPT-II
HuatuoGPT2 is an innovative domain-adapted medical large language model that excels in medical knowledge and dialogue proficiency. It showcases state-of-the-art performance in various medical benchmarks, surpassing GPT-4 in expert evaluations and fresh medical licensing exams. The open-source release includes HuatuoGPT2 models in 7B, 13B, and 34B versions, training code for one-stage adaptation, partial pre-training and fine-tuning instructions, and evaluation methods for medical response capabilities and professional pharmacist exams. The tool aims to enhance LLM capabilities in the Chinese medical field through open-source principles.
speechless
Speechless.AI is committed to integrating the superior language processing and deep reasoning capabilities of large language models into practical business applications. By enhancing the model's language understanding, knowledge accumulation, and text creation abilities, and introducing long-term memory, external tool integration, and local deployment, our aim is to establish an intelligent collaborative partner that can independently interact, continuously evolve, and closely align with various business scenarios.
sktime
sktime is a Python library for time series analysis that provides a unified interface for various time series learning tasks such as classification, regression, clustering, annotation, and forecasting. It offers time series algorithms and tools compatible with scikit-learn for building, tuning, and validating time series models. sktime aims to enhance the interoperability and usability of the time series analysis ecosystem by empowering users to apply algorithms across different tasks and providing interfaces to related libraries like scikit-learn, statsmodels, tsfresh, PyOD, and fbprophet.
IDvs.MoRec
This repository contains the source code for the SIGIR 2023 paper 'Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited'. It provides resources for evaluating foundation, transferable, multi-modal, and LLM recommendation models, along with datasets, pre-trained models, and training strategies for IDRec and MoRec using in-batch debiased cross-entropy loss. The repository also offers large-scale datasets, code for SASRec with in-batch debias cross-entropy loss, and information on joining the lab for research opportunities.
spark-nlp
Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 36000+ pretrained pipelines and models in more than 200+ languages. It offers tasks such as Tokenization, Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition, Dependency Parsing, Spell Checking, Text Classification, Sentiment Analysis, Token Classification, Machine Translation, Summarization, Question Answering, Table Question Answering, Text Generation, Image Classification, Image to Text (captioning), Automatic Speech Recognition, Zero-Shot Learning, and many more NLP tasks. Spark NLP is the only open-source NLP library in production that offers state-of-the-art transformers such as BERT, CamemBERT, ALBERT, ELECTRA, XLNet, DistilBERT, RoBERTa, DeBERTa, XLM-RoBERTa, Longformer, ELMO, Universal Sentence Encoder, Llama-2, M2M100, BART, Instructor, E5, Google T5, MarianMT, OpenAI GPT2, Vision Transformers (ViT), OpenAI Whisper, and many more not only to Python and R, but also to JVM ecosystem (Java, Scala, and Kotlin) at scale by extending Apache Spark natively.
Step-DPO
Step-DPO is a method for enhancing long-chain reasoning ability of LLMs with a data construction pipeline creating a high-quality dataset. It significantly improves performance on math and GSM8K tasks with minimal data and training steps. The tool fine-tunes pre-trained models like Qwen2-7B-Instruct with Step-DPO, achieving superior results compared to other models. It provides scripts for training, evaluation, and deployment, along with examples and acknowledgements.
EasyEdit
EasyEdit is a Python package for editing Large Language Models (LLMs) such as `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, and `T5` (supporting models from **1B** to **65B**). Its objective is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. It is designed to be easy to use and easy to extend.
qserve
QServe is a serving system designed for efficient and accurate Large Language Models (LLM) on GPUs with W4A8KV4 quantization. It achieves higher throughput compared to leading industry solutions, allowing users to achieve A100-level throughput on cheaper L40S GPUs. The system introduces the QoQ quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache, addressing runtime overhead challenges. QServe improves serving throughput for various LLM models by implementing compute-aware weight reordering, register-level parallelism, and fused attention memory-bound techniques.
skpro
skpro is a library for supervised probabilistic prediction in Python. It provides `scikit-learn`-like, `scikit-base` compatible interfaces to:
- tabular **supervised regressors for probabilistic prediction**: interval, quantile and distribution predictions
- tabular **probabilistic time-to-event and survival prediction**: instance-individual survival distributions
- **metrics to evaluate probabilistic predictions**, e.g., pinball loss, empirical coverage, CRPS, survival losses
- **reductions** to turn `scikit-learn` regressors into probabilistic `skpro` regressors, such as bootstrap or conformal
- building **pipelines and composite models**, including tuning via probabilistic performance metrics
- symbolic **probability distributions** with value domain of `pandas.DataFrame`-s and `pandas`-like interface
gateway
Gateway is a tool that streamlines requests to 100+ open & closed source models with a unified API. It is production-ready with support for caching, fallbacks, retries, timeouts, load balancing, and can be edge-deployed for minimum latency. It is blazing fast with a tiny footprint, supports load balancing across multiple models, providers, and keys, ensures app resilience with fallbacks, offers automatic retries with exponential fallbacks, allows configurable request timeouts, supports multimodal routing, and can be extended with plug-in middleware. It is battle-tested over 300B tokens and enterprise-ready for enhanced security, scale, and custom deployments.
EVE
EVE is an official PyTorch implementation of Unveiling Encoder-Free Vision-Language Models. The project aims to explore the removal of vision encoders from Vision-Language Models (VLMs) and transfer LLMs to encoder-free VLMs efficiently. It also focuses on bridging the performance gap between encoder-free and encoder-based VLMs. EVE offers a superior capability with arbitrary image aspect ratio, data efficiency by utilizing publicly available data for pre-training, and training efficiency with a transparent and practical strategy for developing a pure decoder-only architecture across modalities.
eko
Eko is a lightweight and flexible command-line tool for managing environment variables in your projects. It allows you to easily set, get, and delete environment variables for different environments, making it simple to manage configurations across development, staging, and production environments. With Eko, you can streamline your workflow and ensure consistency in your application settings without the need for complex setup or configuration files.
For similar tasks
alignment-handbook
The Alignment Handbook provides robust training recipes for continuing pretraining and aligning language models with human and AI preferences. It includes techniques such as continued pretraining, supervised fine-tuning, reward modeling, rejection sampling, and direct preference optimization (DPO). The handbook aims to fill the gap in public resources on training these models, collecting data, and measuring metrics for optimal downstream performance.
Awesome-LLM-Preference-Learning
The repository 'Awesome-LLM-Preference-Learning' is the official repository of a survey paper titled 'Towards a Unified View of Preference Learning for Large Language Models: A Survey'. It contains a curated list of papers related to preference learning for Large Language Models (LLMs). The repository covers various aspects of preference learning, including on-policy and off-policy methods, feedback mechanisms, reward models, algorithms, evaluation techniques, and more. The papers included in the repository explore different approaches to aligning LLMs with human preferences, improving mathematical reasoning in LLMs, enhancing code generation, and optimizing language model performance.
LLM-Synthetic-Data
LLM-Synthetic-Data is a repository focused on real-time, fine-grained LLM-Synthetic-Data generation. It includes methods, surveys, and application areas related to synthetic data for language models. The repository covers topics like pre-training, instruction tuning, model collapse, LLM benchmarking, evaluation, and distillation. It also explores application areas such as mathematical reasoning, code generation, text-to-SQL, alignment, reward modeling, long context, weak-to-strong generalization, agent and tool use, vision and language, factuality, federated learning, generative design, and safety.
LLM-Tool-Survey
This repository contains a collection of papers related to tool learning with large language models (LLMs). The papers are organized according to the survey paper 'Tool Learning with Large Language Models: A Survey'. The survey focuses on the benefits and implementation of tool learning with LLMs, covering aspects such as task planning, tool selection, tool calling, response generation, benchmarks, evaluation, challenges, and future directions in the field. It aims to provide a comprehensive understanding of tool learning with LLMs and inspire further exploration in this emerging area.
Awesome-AI-Agents
Awesome-AI-Agents is a curated list of projects, frameworks, benchmarks, platforms, and related resources focused on autonomous AI agents powered by Large Language Models (LLMs). The repository showcases a wide range of applications, multi-agent task solver projects, agent society simulations, and advanced components for building and customizing AI agents. It also includes frameworks for orchestrating role-playing, evaluating LLM-as-Agent performance, and connecting LLMs with real-world applications through platforms and APIs. Additionally, the repository features surveys, paper lists, and blogs related to LLM-based autonomous agents, making it a valuable resource for researchers, developers, and enthusiasts in the field of AI.
ReST-MCTS
ReST-MCTS is a reinforced self-training approach that integrates process reward guidance with tree search MCTS to collect higher-quality reasoning traces and per-step value for training policy and reward models. It eliminates the need for manual per-step annotation by estimating the probability of steps leading to correct answers. The inferred rewards refine the process reward model and aid in selecting high-quality traces for policy model self-training.
LongCite
LongCite is a tool that enables Large Language Models (LLMs) to generate fine-grained citations in long-context Question Answering (QA) scenarios. It provides models trained on GLM-4-9B and Meta-Llama-3.1-8B, supporting up to 128K context. Users can deploy LongCite chatbots, generate accurate responses, and obtain precise sentence-level citations. The tool includes components for model deployment, Coarse to Fine (CoF) pipeline for data construction, model training using LongCite-45k dataset, evaluation with LongBench-Cite benchmark, and citation generation.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM:
- Set LLM usage limits for users on different pricing tiers
- Track LLM usage on a per user and per organization basis
- Block or redact requests containing PIIs
- Improve LLM reliability with failovers, retries and caching
- Distribute API keys with rate limits and cost limits for internal development/production use cases
- Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.