
AReaL
Distributed RL System for LLM Reasoning
Stars: 84

AReaL (Ant Reasoning RL) is an open-source reinforcement learning system developed at the RL Lab, Ant Research. It is designed for training Large Reasoning Models (LRMs) in a fully open and inclusive manner. AReaL provides reproducible experiments for 1.5B and 7B LRMs, showcasing its scalability and performance across diverse computational budgets. The system follows an iterative training process to enhance model performance, with a focus on mathematical reasoning tasks. AReaL adapts to different computational resource settings, letting users easily configure and launch training runs. Future plans include support for more advanced models, optimizations for distributed training, and research into enhancing LRMs' reasoning capabilities.
README:
AReaL (Ant Reasoning RL) is an open-source and efficient reinforcement learning system developed at the RL Lab, Ant Research. AReaL inherits and adapts the open-source project ReaLHF for training Large Reasoning Models (LRMs) that everyone can reproduce and contribute to. AReaL is part of Ant Research's efforts to develop tools and systems for a fully open and inclusive AGI world.
AReaL Highlights
- 🛠️ Open & Reproducible: We will continuously release all code, datasets, and training recipes for training LRMs --- no hidden secrets or proprietary barriers.
- 🚀 Scalable Performance: AReaL can seamlessly adapt to different computational resource settings, ranging from a single node to hundreds of GPUs.
- 🌍 Community-Driven AGI: With a fully open-source commitment, we hope our efforts can benefit the entire community to accelerate AGI research.
[2025/02/24] Our initial release includes reproducible experiments based on our AReaL system for 1.5B and 7B LRMs, validated across diverse computational budgets. We will show that with AReaL, users can
- reliably train a 1.5B distilled model with RL to surpass o1-Preview on math reasoning within 40 hours. 🚀
- reliably perform an R1-Zero-style experiment with a 7B model, i.e., by running RL training on Qwen2.5-7B, and observe emergent thinking tokens and continuous model improvement on math reasoning.
Our experiments are conducted on 16 nodes, each equipped with 8 H800 GPUs. The results, along with the associated training curves, are presented below.
Figure 1. The training rewards and response lengths during RL training. The base model is DeepSeek-R1-Distill-Qwen-1.5B. The curves are averaged with a window size of 25.
We follow DeepScaleR to iteratively increase the output context length. The context length starts from 8K and is increased to 16K and 24K in the subsequent training process. The training reward continuously increases during RL training. We observe that the response length first shrinks in the 8K training stage, and then increases in the 16K and 24K training stages.
These three stages progressively enhanced the performance of the base model, as demonstrated below:
| Model | MATH500 | AIME 2024 | AMC 2023 | Download |
| --- | --- | --- | --- | --- |
| o1-Preview | 81.4 | 40.0 | - | - |
| DeepSeek-R1-Distill-Qwen-1.5B | 82.8 | 28.8 | 62.9 | - |
| DeepScaleR (Official) | 87.8 | 43.1 | 73.6 | - |
| AReaL Stage 1: 8K (Ours) | 85.7 | 33.2 | 74.7 | 🤗 HuggingFace |
| AReaL Stage 2: 16K (Ours) | 87.4 | 34.2 | 79.6 | 🤗 HuggingFace |
| AReaL Stage 3: 24K (Ours) | 88.0 | 40.2 | 81.2 | 🤗 HuggingFace |
Table 1. Evaluation on a series of competition-level mathematics benchmarks, including AIME 2024, AMC 2023, and MATH-500. Results are reported as Pass@1 accuracy, averaged over 32 samples per problem and evaluated with a temperature of 0.6.
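Concretely, the reported Pass@1 can be read as the per-problem fraction of correct samples, averaged over the benchmark. Below is a minimal sketch of this aggregation (an illustration of the metric as described; the official logic lives in the evaluation scripts):

```python
# Pass@1 averaged over k samples: per problem, the fraction of correct
# samples (k = 32 in Table 1); the benchmark score averages over problems.
def pass_at_1(flags: list[bool]) -> float:
    return sum(flags) / len(flags)

def benchmark_score(per_problem_flags: list[list[bool]]) -> float:
    return sum(pass_at_1(f) for f in per_problem_flags) / len(per_problem_flags)
```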
AReaL is designed with accessibility in mind, enabling users to effortlessly reproduce results and extend their research.
Please refer to the tutorial under the `examples` directory for step-by-step guidelines.
- Training: The model checkpoints from different stages are available on HuggingFace (see Table 1 for the links). With these intermediate checkpoints for all three stages, users can start from any stage to advance their own investigations.
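For a quick sanity check, an intermediate checkpoint can be loaded directly with `transformers`. This is a minimal sketch; the checkpoint identifier below is a hypothetical placeholder for the actual HuggingFace links in Table 1:

```python
# Load an intermediate AReaL checkpoint for inspection or continued training.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "inclusionAI/AReaL-stage-checkpoint"  # hypothetical placeholder; use a Table 1 link
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto")
```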
```bash
# Download the dataset (-O strips the ?download=true query from the saved filename)
DATA_PATH=/storage/datasets/
cd $DATA_PATH
wget -O prompts_for_r1_distilled.jsonl "https://huggingface.co/datasets/inclusionAI/AReaL-RL-Data/resolve/main/data/prompts_for_r1_distilled.jsonl?download=true"
wget -O id2info.json "https://huggingface.co/datasets/inclusionAI/AReaL-RL-Data/resolve/main/data/id2info.json?download=true"

# Training in a Ray cluster with 16 nodes
# stage 1: 8K context
MODEL_PATH=${path_to_DeepSeek-R1-Distill-Qwen-1.5B}
bash ./examples/train_1.5B_n16_on_ray.sh $MODEL_PATH $DATA_PATH/prompts_for_r1_distilled.jsonl $DATA_PATH/id2info.json 8192
# stage 2: 16K context
MODEL_PATH=${model_path_from_stage_1}
bash ./examples/train_1.5B_n16_on_ray.sh $MODEL_PATH $DATA_PATH/prompts_for_r1_distilled.jsonl $DATA_PATH/id2info.json 16384
# stage 3: 24K context
MODEL_PATH=${model_path_from_stage_2}
bash ./examples/train_1.5B_n16_on_ray.sh $MODEL_PATH $DATA_PATH/prompts_for_r1_distilled.jsonl $DATA_PATH/id2info.json 24000
```
- Evaluation: We generate samples using vLLM with `enforce_eager=True`, which is critical to the final performance, particularly when generating sequences longer than 16K (a usage sketch follows the command below).
```bash
# Evaluation
cd evaluation
python3 eval_and_aggregate.py --model_path $MODEL_PATH --max_gen_tokens 32768
```
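For reference, here is a minimal sketch of sampling with vLLM in eager mode; the model path, prompt, and sampling settings are illustrative assumptions rather than the exact values used by `eval_and_aggregate.py`:

```python
# Minimal vLLM sampling sketch with CUDA-graph capture disabled.
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/checkpoint", enforce_eager=True)  # eager mode
params = SamplingParams(temperature=0.6, max_tokens=32768)
outputs = llm.generate(["<math problem prompt>"], params)
print(outputs[0].outputs[0].text)
```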
Our RL training process draws on open-source experiences from DeepSeek-R1, DeepScaleR, and ReaLHF. Here we show our RL training setup:
- Base Model: We use DeepSeek-R1-Distill-Qwen-1.5B as the base model.
- Dataset: The RL training dataset consists of 40k high-quality mathematical reasoning tasks released by DeepScaleR. We are also actively developing better datasets suitable for training stronger and larger models in future releases.
- Reward Design: We assign a reward of +5 for correct responses and -5 for incorrect ones. Correctness is evaluated by comparing responses against the reference answers using SymPy, following the evaluation protocol of Qwen.
- RL Algorithm: We adopt a simplified variant of PPO. The major change is eliminating the critic to save computation; we empirically observe that training performance does not degrade without it (see the sketch after this list).
- Key Hyper-parameters: We set both the discount factor $\gamma$ and the GAE parameter $\lambda$ to 1, a practice also adopted by the Open-Reasoner-Zero project. We also set the KL reward coefficient sufficiently small for effective training.
- Iterative Context Lengthening: We follow DeepScaleR to iteratively increase the context length, which results in a three-stage training process.
- Large Batch Size: We use a larger batch size for RL training than previous works. In each RL training step, we select 1024 mathematical questions and generate 8 rollouts for each question.
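To make the setup above concrete, here is a minimal Python sketch of the reward and advantage computation. It is one simple realization under the stated choices (SymPy equivalence for correctness, a ±5 terminal reward, $\gamma=\lambda=1$, no critic), not the actual AReaL implementation:

```python
# A minimal sketch of the reward/advantage scheme above; not AReaL's code.
import sympy
from sympy.parsing.sympy_parser import parse_expr

def math_reward(pred_answer: str, ref_answer: str) -> float:
    """Return +5 if the predicted answer is symbolically equal to the
    reference answer, else -5."""
    try:
        return 5.0 if sympy.simplify(parse_expr(pred_answer) - parse_expr(ref_answer)) == 0 else -5.0
    except Exception:  # unparsable or malformed answers count as incorrect
        return -5.0

def token_advantages(total_reward: float, num_tokens: int) -> list[float]:
    # With gamma = lambda = 1 and the critic removed (a zero value baseline
    # in this sketch), the GAE advantage of every generated token collapses
    # to the trajectory's total reward. A small KL penalty, per the
    # hyper-parameter note above, could be folded into total_reward.
    return [total_reward] * num_tokens

# Each RL step samples 1024 questions with 8 rollouts per question,
# i.e., 8192 trajectories per PPO update.
```

In practice, the final answer strings would first be extracted from the model's full responses; that extraction step is elided here.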
AReaL easily adapts to different computational resource settings. With more training compute, AReaL significantly speeds up RL training, leading to a notable acceleration in research progress.
We have listed the detailed hardware requirements and the approach to setting up environments under different computational resources. Users can easily configure their clusters and launch training trials by following our tutorials.
Figure 2. Total time consumption for RL training under varying computational resource settings for 10 epochs.
We start RL training from the base model Qwen2.5-7B with DeepSeek-R1-Zero-style training. The initial training curves and results are presented below. We conducted experiments on the data released by Open-Reasoner-Zero. As training progresses, both the rewards and the response lengths grow steadily. A similar trend is also reported by Open-Reasoner-Zero. The simultaneous growth of average response lengths and rewards can be considered a critical sign of the emergent deep-thinking capability of an LRM for solving challenging reasoning problems.
Figure 3. The training curve of Qwen2.5-7B-Zero during RL training
We evaluate the performances of the intermediate checkpoints on the MATH500 and AIME24 datasets in the following figure. It appears that both the accuracy and the generated length continue to show an upward trend.
Figure 4. The test accuracy and response length evaluated on the MATH500 and AIME24 datasets.
We also conduct an experiment on the DeepScaleR dataset, which demonstrates a similar training trend. The evaluation results are presented in the following table.
| Model | MATH-500 | AIME 2024 | AMC 2023 |
| --- | --- | --- | --- |
| Qwen2.5-7B | 34.0 | 2.0 | 17.8 |
| Open-Reasoner-Zero-7B (Step 150) | ~77 | ~15 | - |
| Qwen2.5-7B-Zero (Step 150, DeepScaleR dataset) | 75.9 | 13.9 | 56.1 |
| Qwen2.5-7B-Zero (Step 150, ORZ dataset) | 77.3 | 13.9 | 57.1 |
| Qwen2.5-7B-Zero (Step 200, ORZ dataset) | 78.0 | 14.5 | 58.7 |

Table 2. Evaluation on AIME 2024, AMC 2023, and MATH-500. Results are reported as Pass@1 accuracy, averaged over 32 samples per problem and evaluated with a temperature of 1.0.
- Training:
```bash
# Download the dataset (-O strips the ?download=true query from the saved filename)
DATA_PATH=/storage/datasets/
cd $DATA_PATH
wget -O prompts_for_zero.jsonl "https://huggingface.co/datasets/inclusionAI/AReaL-RL-Data/resolve/main/data/prompts_for_zero.jsonl?download=true"
wget -O id2info.json "https://huggingface.co/datasets/inclusionAI/AReaL-RL-Data/resolve/main/data/id2info.json?download=true"

# Training in a Ray cluster with 16 nodes
MODEL_PATH=${path_to_Qwen2.5-7B}
bash ./examples/train_7B_zero_n16_on_ray.sh $MODEL_PATH $DATA_PATH/prompts_for_zero.jsonl $DATA_PATH/id2info.json 24000
```
- Evaluation:
```bash
# Evaluation
cd evaluation
python3 eval_and_aggregate.py --model_path $MODEL_PATH --max_gen_tokens 32768 --prompt_type orz
```
AReaL is under active development, and we plan major releases on a weekly basis. We also highly appreciate contributions from the community. Here we highlight our future research and development plans.
In addition to our continuous efforts to make AReaL more stable, we plan the following major improvements.
- [ ] Support for the latest vLLM, sglang, and megatron-core packages.
- [ ] Support RL training on coding problems.
- [ ] Support RL training for larger MoE models.
- [ ] Optimizations for distributed training: expert parallel and zero-bubble pipelining.
- [ ] RL for vision-language models (VLM).
- [ ] Function calling and agent capabilities.
Based on AReaL, our team is also actively exploring the research frontier of LRMs. We highlight some of the major research topics we are working on.
- Effective Training Datasets for LRMs: How do we design training tasks that enable LRMs to progressively evolve sophisticated reasoning skills via RL?
  - For more advanced distilled models, such as DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B, what kind of task distribution could drive further enhancements?
  - How do we properly curate the distillation dataset for cold-start to achieve the best RL results?
- Boosting Reasoning with Auxiliary Training Signals. Can auxiliary signals — such as Process Reward Models (PRMs) and expert-curated labels — be adopted in RL training to refine the reasoning processes of LRMs?
- Adaptive Response Lengths: Quality Over Quantity. How can RL training schemes teach LRMs to dynamically adjust response complexity based on task difficulty? The goal is to eliminate redundant "over-reasoning" for simpler inputs while preserving rigor for challenging problems.
- Sample-Efficient RL for LRMs: In addition to system-level acceleration, can we further speed up LRM training with more sample-efficient RL algorithms? Can we bring the critic back to reduce gradient variance without compromising computational efficiency? How can we better encourage exploration?
We would like to remark that major contributors are from the Institute for Interdisciplinary Information Sciences, Tsinghua University.
Our team has received invaluable assistance from the Super Computing Technology (SCT) team at Ant Group, particularly in the realm of large-scale cluster operations and maintenance. Their expertise and support have been instrumental in our efforts, and we are deeply grateful for their contributions.
We also appreciate all the pioneering work from the community, particularly the ReaLHF project from OpenPsi Inc. and other projects including, but not limited to, DeepScaleR, Open-Reasoner-Zero, OpenRLHF, and VeRL.
```bibtex
@article{mei2024realhf,
  title={ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation},
  author={Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
  journal={arXiv preprint arXiv:2406.14088},
  year={2024}
}

@misc{areal2025,
  author = {RL Lab, Ant Research},
  title = {AReaL: Ant Reasoning RL},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/inclusionAI/AReaL}},
}
```
Alternative AI tools for AReaL
Similar Open Source Tools


PURE
PURE (Process-sUpervised Reinforcement lEarning) is a framework that trains a Process Reward Model (PRM) on a dataset and fine-tunes a language model to achieve state-of-the-art mathematical reasoning capabilities. It uses a novel credit assignment method to calculate return and supports multiple reward types. The final model outperforms existing methods with minimal RL data or compute resources, achieving high accuracy on various benchmarks. The tool addresses reward hacking issues and aims to enhance long-range decision-making and reasoning tasks using large language models.

Reflection_Tuning
Reflection-Tuning is a project focused on improving the quality of instruction-tuning data through a reflection-based method. It introduces Selective Reflection-Tuning, where the student model can decide whether to accept the improvements made by the teacher model. The project aims to generate high-quality instruction-response pairs by defining specific criteria for the oracle model to follow and respond to. It also evaluates the efficacy and relevance of instruction-response pairs using the r-IFD metric. The project provides code for reflection and selection processes, along with data and model weights for both V1 and V2 methods.

CogVideo
CogVideo is an open-source repository that provides pretrained text-to-video models for generating videos based on input text. It includes models like CogVideoX-2B and CogVideo, offering powerful video generation capabilities. The repository offers tools for inference, fine-tuning, and model conversion, along with demos showcasing the model's capabilities through CLI, web UI, and online experiences. CogVideo aims to facilitate the creation of high-quality videos from textual descriptions, catering to a wide range of applications.

babilong
BABILong is a generative benchmark designed to evaluate the performance of NLP models in processing long documents with distributed facts. It consists of 20 tasks that simulate interactions between characters and objects in various locations, requiring models to distinguish important information from irrelevant details. The tasks vary in complexity and reasoning aspects, with test samples potentially containing millions of tokens. The benchmark aims to challenge and assess the capabilities of Large Language Models (LLMs) in handling complex, long-context information.

model2vec
Model2Vec is a technique to turn any sentence transformer into a really small static model, reducing model size by 15x and making the models up to 500x faster, with a small drop in performance. It outperforms other static embedding models like GLoVe and BPEmb, is lightweight with only `numpy` as a major dependency, offers fast inference, dataset-free distillation, and is integrated into Sentence Transformers, txtai, and Chonkie. Model2Vec creates powerful models by passing a vocabulary through a sentence transformer model, reducing dimensionality using PCA, and weighting embeddings using zipf weighting. Users can distill their own models or use pre-trained models from the HuggingFace hub. Evaluation can be done using the provided evaluation package. Model2Vec is licensed under MIT.

Macaw-LLM
Macaw-LLM is a pioneering multi-modal language modeling tool that seamlessly integrates image, audio, video, and text data. It builds upon CLIP, Whisper, and LLaMA models to process and analyze multi-modal information effectively. The tool boasts features like simple and fast alignment, one-stage instruction fine-tuning, and a new multi-modal instruction dataset. It enables users to align multi-modal features efficiently, encode instructions, and generate responses across different data types.

VideoLingo
VideoLingo is an all-in-one video translation and localization dubbing tool designed to generate Netflix-level high-quality subtitles. It aims to eliminate stiff machine translation and multi-line subtitles, and can even add high-quality dubbing, allowing knowledge from around the world to be shared across language barriers. Through an intuitive Streamlit web interface, the entire process from video link to embedded high-quality bilingual subtitles and even dubbing can be completed with just two clicks, easily creating Netflix-quality localized videos. Key features include: yt-dlp for downloading videos from YouTube links, WhisperX for word-level timeline subtitle recognition, NLP- and GPT-based subtitle segmentation by sentence meaning, a GPT-summarized term knowledge base for context-aware translation, a three-step direct translation, reflection, and free translation pipeline to eliminate awkward machine translation, checks on single-line subtitle length and translation quality against Netflix standards, GPT-SoVITS for high-quality aligned dubbing, and an integrated package for one-click startup and one-click output in Streamlit.

Mooncake
Mooncake is a serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates prefill and decoding clusters, leveraging underutilized CPU, DRAM, and SSD resources of the GPU cluster. Mooncake's scheduler balances throughput and latency-related SLOs, with a prediction-based early rejection policy for highly overloaded scenarios. It excels in long-context scenarios, achieving up to a 525% increase in throughput while handling 75% more requests under real workloads.

humanlayer
HumanLayer is a Python toolkit designed to enable AI agents to interact with humans in tool-based and asynchronous workflows. By incorporating humans-in-the-loop, agentic tools can access more powerful and meaningful tasks. The toolkit provides features like requiring human approval for function calls, human as a tool for contacting humans, omni-channel contact capabilities, granular routing, and support for various LLMs and orchestration frameworks. HumanLayer aims to ensure human oversight of high-stakes function calls, making AI agents more reliable and safe in executing impactful tasks.

oat
Oat is a simple and efficient framework for running online LLM alignment algorithms. It implements a distributed Actor-Learner-Oracle architecture, with components optimized using state-of-the-art tools. Oat simplifies the experimental pipeline of LLM alignment by serving an Oracle online for preference data labeling and model evaluation. It provides a variety of oracles for simulating feedback and supports verifiable rewards. Oat's modular structure allows for easy inheritance and modification of classes, enabling rapid prototyping and experimentation with new algorithms. The framework implements cutting-edge online algorithms like PPO for math reasoning and various online exploration algorithms.

LLM-Pruner
LLM-Pruner is a tool for structural pruning of large language models, allowing task-agnostic compression while retaining multi-task solving ability. It supports automatic structural pruning of various LLMs with minimal human effort. The tool is efficient, requiring only 3 minutes for pruning and 3 hours for post-training. Supported LLMs include Llama-3.1, Llama-3, Llama-2, LLaMA, BLOOM, Vicuna, and Baichuan. Updates include support for new LLMs like GQA and BLOOM, as well as fine-tuning results achieving high accuracy. The tool provides step-by-step instructions for pruning, post-training, and evaluation, along with a Gradio interface for text generation. Limitations include issues with generating repetitive or nonsensical tokens in compressed models and manual operations for certain models.

MathCoder
MathCoder is a repository focused on enhancing mathematical reasoning by fine-tuning open-source language models to use code for modeling and deriving math equations. It introduces MathCodeInstruct dataset with solutions interleaving natural language, code, and execution results. The repository provides MathCoder models capable of generating code-based solutions for challenging math problems, achieving state-of-the-art scores on MATH and GSM8K datasets. It offers tools for model deployment, inference, and evaluation, along with a citation for referencing the work.

refly
Refly.AI is an open-source AI-native creation engine that empowers users to transform ideas into production-ready content. It features a free-form canvas interface with multi-threaded conversations, knowledge base integration, contextual memory, intelligent search, WYSIWYG AI editor, and more. Users can leverage AI-powered capabilities, context memory, knowledge base integration, quotes, and AI document editing to enhance their content creation process. Refly offers both cloud and self-hosting options, making it suitable for individuals, enterprises, and organizations. The tool is designed to facilitate human-AI collaboration and streamline content creation workflows.

ENOVA
ENOVA is an open-source service for Large Language Model (LLM) deployment, monitoring, injection, and auto-scaling. It addresses challenges in deploying stable serverless LLM services on GPU clusters with auto-scaling by deconstructing the LLM service execution process and providing configuration recommendations and performance detection. Users can build and deploy LLM with few command lines, recommend optimal computing resources, experience LLM performance, observe operating status, achieve load balancing, and more. ENOVA ensures stable operation, cost-effectiveness, efficiency, and strong scalability of LLM services.

fluid
Fluid is an open source Kubernetes-native Distributed Dataset Orchestrator and Accelerator for data-intensive applications, such as big data and AI applications. It implements dataset abstraction, scalable cache runtime, automated data operations, elasticity and scheduling, and is runtime platform agnostic. Key concepts include Dataset and Runtime. Prerequisites include Kubernetes version > 1.16, Golang 1.18+, and Helm 3. The tool offers features like accelerating remote file accessing, machine learning, accelerating PVC, preloading dataset, and on-the-fly dataset cache scaling. Contributions are welcomed, and the project is under the Apache 2.0 license with a vendor-neutral approach.
For similar tasks


lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron. We're releasing it with the community in the spirit of building in the open. Note that it is still very much early, so don't expect 100% stability ^^' In case of problems or questions, feel free to open an issue!

Firefly
Firefly is an open-source large model training project that supports pre-training, fine-tuning, and DPO of mainstream large models. It includes models like Llama3, Gemma, Qwen1.5, MiniCPM, Llama, InternLM, Baichuan, ChatGLM, Yi, Deepseek, Qwen, Orion, Ziya, Xverse, Mistral, Mixtral-8x7B, Zephyr, Vicuna, Bloom, etc. The project supports full-parameter training, LoRA, QLoRA efficient training, and various tasks such as pre-training, SFT, and DPO. Suitable for users with limited training resources, QLoRA is recommended for fine-tuning instructions. The project has achieved good results on the Open LLM Leaderboard with QLoRA training process validation. The latest version has significant updates and adaptations for different chat model templates.

Awesome-Text2SQL
Awesome Text2SQL is a curated repository containing tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis, and more. It provides guidelines on converting natural language questions into structured SQL queries, with a focus on NL2SQL. The repository includes information on various models, datasets, evaluation metrics, fine-tuning methods, libraries, and practice projects related to Text2SQL. It serves as a comprehensive resource for individuals interested in working with Text2SQL and related technologies.

create-million-parameter-llm-from-scratch
The 'create-million-parameter-llm-from-scratch' repository provides a detailed guide on creating a Large Language Model (LLM) with 2.3 million parameters from scratch. The blog replicates the LLaMA approach, incorporating concepts like RMSNorm for pre-normalization, SwiGLU activation function, and Rotary Embeddings. The model is trained on a basic dataset to demonstrate the ease of creating a million-parameter LLM without the need for a high-end GPU.

StableToolBench
StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features such as a Virtual API System with caching and API simulators, a new set of solvable queries determined by LLMs, and a Stable Evaluation System using GPT-4. The Virtual API Server can be set up either by building from source or using a prebuilt Docker image. Users can test the server using provided scripts and evaluate models with Solvable Pass Rate and Solvable Win Rate metrics. The tool also includes model experiments results comparing different models' performance.

BetaML.jl
The Beta Machine Learning Toolkit is a package containing various algorithms and utilities for implementing machine learning workflows in multiple languages, including Julia, Python, and R. It offers a range of supervised and unsupervised models, data transformers, and assessment tools. The models are implemented entirely in Julia and are not wrappers for third-party models. Users can easily contribute new models or request implementations. The focus is on user-friendliness rather than computational efficiency, making it suitable for educational and research purposes.

AI-TOD
AI-TOD is a dataset for tiny object detection in aerial images, containing 700,621 object instances across 28,036 images. Objects in AI-TOD are smaller with a mean size of 12.8 pixels compared to other aerial image datasets. To use AI-TOD, download xView training set and AI-TOD_wo_xview, then generate the complete dataset using the provided synthesis tool. The dataset is publicly available for academic and research purposes under CC BY-NC-SA 4.0 license.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.