
🔥 Open-dLLM: Open Diffusion Large Language Models

🌍 Languages: English | 中文

👉 TL;DR: Open-dLLM is the most open release of a diffusion-based large language model to date, including pretraining, evaluation, inference, and checkpoints.

This repo introduces Open-dCoder, the code-generation variant of Open-dLLM.

GitHub | Notion | Hugging Face

💻 Code | 📖 Blog | 🤗 Model

🎥 Demo

Quick Sort Demo

QuickSort generation using Open-dCoder (0.5B)

YouTube link | Bilibili link


✨ Highlights

  • πŸ‹οΈ Pretraining pipeline + open datasets
  • ⚑ Inference scripts β€” easy sampling & generation
  • πŸ“Š Evaluation suite β€” HumanEval, MBPP, Infilling (lm-eval-harness + custom metrics)
  • πŸ“¦ Weights + checkpoints on Hugging Face
  • 🀝 Transparent configs for full reproducibility

Why Open-dLLM?

Most diffusion LLM repos (e.g., LLaDA, Dream) only release inference scripts + weights, which limits reproducibility.
Open-dLLM is the first to open-source the entire stack for diffusion LLMs.

👉 With Open-dLLM, you can go from raw data → training → checkpoints → evaluation → inference, all in one repo.


🔎 Transparency Comparison of Diffusion LLM Releases

| Project | Data | Training Code | Inference | Evaluation | Weights |
|---|---|---|---|---|---|
| Open-dLLM / Open-dCoder (ours) | ✅ | ✅ | ✅ | ✅ | ✅ |
| LLaDA | ❌ | ❌ | ✅ | ⚠️ Limited | ✅ |
| Dream | ❌ | ❌ | ✅ | ⚠️ Limited | ✅ |
| Gemini-Diffusion | ❌ | ❌ | ❌ | ❌ | ❌ (API only) |
| Seed Diffusion | ❌ | ❌ | ❌ | ❌ | ❌ (API only) |
| Mercury | ❌ | ❌ | ❌ | ❌ | ❌ (API only) |

✅ = fully available · ❌ = not provided · ⚠️ = partial/limited


βš™οΈ Install

We use micromamba for environment management (feel free to adapt to conda):

micromamba install -c nvidia/label/cuda-12.3.0 cuda-toolkit -y
pip install ninja

# install torch 2.5.0 built with CUDA 12.1 wheels
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu121

pip install "flash-attn==2.7.4.post1" \
  --extra-index-url https://github.com/Dao-AILab/flash-attention/releases/download

pip install --upgrade --no-cache-dir \
  tensordict torchdata byte-flux "triton>=3.1.0" \
  transformers==4.54.1 accelerate datasets peft hf-transfer \
  codetiming hydra-core pandas "pyarrow>=15.0.0" pylatexenc \
  wandb ninja liger-kernel==0.5.8 \
  pytest yapf py-spy pyext pre-commit ruff packaging

pip install -e .
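
Optionally, as a quick sanity check (our suggestion, not part of the original setup), confirm that the CUDA-enabled torch build and flash-attn import cleanly:

python -c "import torch, flash_attn; print(torch.__version__, torch.version.cuda, flash_attn.__version__)"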

🚀 Quickstart: Sampling

from transformers import AutoTokenizer
from veomni.models.transformers.qwen2.modeling_qwen2 import Qwen2ForCausalLM
from veomni.models.transformers.qwen2.generation_utils import MDMGenerationConfig
import torch

model_id = "fredzzp/open-dcoder-0.5B"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer + model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = Qwen2ForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device).eval()

# Prompt
prompt = "Write a quick sort algorithm in python."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generation config
gen_cfg = MDMGenerationConfig(max_new_tokens=128, steps=200, temperature=0.7)

with torch.no_grad():
    outputs = model.diffusion_generate(inputs=input_ids, generation_config=gen_cfg)

print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))

👉 For full logging, history tracking, and file output:

python sample.py

📊 Benchmarking

We release a fully open-source evaluation suite for diffusion-based LLMs (dLLMs), covering both standard code generation tasks and code infilling tasks.

Benchmarks include: HumanEval / HumanEval+, MBPP / MBPP+, HumanEval-Infill, SantaCoder-FIM.
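
For reference, the Pass@1 / Pass@10 numbers below are typically computed with the unbiased pass@k estimator from the HumanEval paper; a minimal sketch of that estimator (shown here for illustration, not necessarily the exact scoring code in this suite):

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # n = samples generated per problem, c = number of correct samples, k = evaluation budget
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))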


Standard Code Generation

| Method | HumanEval Pass@1 | HumanEval Pass@10 | HumanEval+ Pass@1 | HumanEval+ Pass@10 | MBPP Pass@1 | MBPP Pass@10 | MBPP+ Pass@1 | MBPP+ Pass@10 |
|---|---|---|---|---|---|---|---|---|
| LLaDA (8B) | 35.4 | 50.0 | 30.5 | 43.3 | 38.8 | 53.4 | 52.6 | 69.1 |
| Dream (7B) | 56.7 | 59.2 | 50.0 | 53.7 | 55.4 | 56.2 | 71.5 | 72.5 |
| Mask DFM (1.3B) | 9.1 | 17.6 | 7.9 | 13.4 | 6.2 | 25.0 | – | – |
| Edit Flow (1.3B) | 12.8 | 24.3 | 10.4 | 20.7 | 10.0 | 36.4 | – | – |
| Open-dCoder (0.5B, Ours) | 20.8 | 38.4 | 17.6 | 35.2 | 16.7 | 38.4 | 23.9 | 53.6 |

Despite having only 0.5B parameters, Open-dCoder is competitive with much larger dLLMs on code completion tasks.


Code Infilling

| Method | HumanEval-Infill Pass@1 | SantaCoder-FIM Exact Match |
|---|---|---|
| LLaDA-8B | 48.3 | 35.1 |
| Dream-7B | 39.4 | 40.7 |
| DiffuCoder-7B | 54.8 | 38.8 |
| Dream-Coder-7B | 55.3 | 40.0 |
| Open-dCoder (0.5B, Ours) | 32.5 | 29.6 |
| Open-dCoder (0.5B, Ours), Oracle Length | 77.4 | 56.4 |

Results follow the average fixed-length evaluation setting from DreamOn.
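
For intuition, a diffusion LM infills by denoising a span of mask tokens placed between the prefix and suffix; the oracle-length setting sizes that span with the ground-truth middle length, while the fixed-length setting uses a dataset-average constant. A hypothetical sketch of how such an input can be built (the helper and its arguments are illustrative, not the repo's evaluation code):

def build_infill_input(tokenizer, prefix: str, suffix: str, middle_len: int, mask_token_id: int):
    # Hypothetical illustration; not the repo's evaluation code.
    prefix_ids = tokenizer(prefix, add_special_tokens=False).input_ids
    suffix_ids = tokenizer(suffix, add_special_tokens=False).input_ids
    # middle_len: ground-truth middle length ("oracle") or a fixed average length
    return prefix_ids + [mask_token_id] * middle_len + suffix_ids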


🧪 Evaluation

Install evaluation packages:

pip install -e lm-evaluation-harness human-eval-infilling

Code Completion (HumanEval, MBPP)

cd eval/eval_completion
bash run_eval.sh

Code Infilling

cd eval/eval_infill
bash run_eval.sh

πŸ‹οΈ Pretraining

  • Data: Concise, high-quality code corpus FineCode, hosted on Hugging Face.
  • Initialization: Following Dream, we continue pretraining from Qwen2.5-Coder and adapt it into the diffusion framework.
  • Loss: Masked Diffusion Model (MDM) objective, with masking ratios sampled uniformly from [0, 1] and masked tokens reconstructed with cross-entropy loss (see the sketch below).
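
A minimal sketch of this objective (illustrative only; it omits the per-ratio loss weighting and other details of the actual training code, and the function name is an assumption):

import torch
import torch.nn.functional as F

def mdm_loss(model, input_ids, mask_token_id):
    # Sample one masking ratio per sequence, uniformly from (0, 1).
    batch, seq_len = input_ids.shape
    ratio = torch.rand(batch, 1, device=input_ids.device)
    mask = torch.rand(batch, seq_len, device=input_ids.device) < ratio
    # Replace masked positions with the mask token; the model predicts the originals.
    noisy_ids = torch.where(mask, torch.full_like(input_ids, mask_token_id), input_ids)
    logits = model(input_ids=noisy_ids).logits
    # Cross-entropy on the masked positions only.
    return F.cross_entropy(logits[mask], input_ids[mask])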

Download Data

python3 scripts/download_hf_data.py --repo_id fredzzp/fine_code --local_dir ./data

Training

export TOKENIZERS_PARALLELISM=false
NNODES=${NNODES:=1}
NPROC_PER_NODE=4
NODE_RANK=${NODE_RANK:=0}
MASTER_ADDR=${MASTER_ADDR:=0.0.0.0}
MASTER_PORT=${MASTER_PORT:=12345}

torchrun --nnodes=$NNODES --nproc-per-node $NPROC_PER_NODE --node-rank $NODE_RANK \
  --master-addr=$MASTER_ADDR --master-port=$MASTER_PORT tasks/train_torch.py \
  configs/pretrain/qwen2_5_coder_500M.yaml \
  --data.train_path=data/data \
  --train.ckpt_manager=dcp \
  --train.micro_batch_size=16 \
  --train.global_batch_size=512 \
  --train.output_dir=logs/Qwen2.5-Coder-0.5B_mdm \
  --train.save_steps=10000
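
With these defaults, and assuming the trainer derives gradient accumulation from the batch-size settings, the accumulation factor works out to global_batch_size / (micro_batch_size × NPROC_PER_NODE × NNODES) = 512 / (16 × 4 × 1) = 8 micro-batches per optimizer step.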

Uploading Checkpoints to Hugging Face

from huggingface_hub import HfApi

REPO_ID = "fredzzp/open-dcoder-0.5B"
LOCAL_DIR = "logs/Qwen2.5-Coder-0.5B_mdm/checkpoints/global_step_370000/hf_ckpt"

api = HfApi()
api.create_repo(repo_id=REPO_ID, repo_type="model", exist_ok=True)
api.upload_folder(repo_id=REPO_ID, repo_type="model", folder_path=LOCAL_DIR)

πŸ™ Appreciation

This project builds on incredible prior work:

We stand on the shoulders of these projects, and hope Open-dLLM contributes back to the diffusion LLM community.

📚 Citation

If you use Open-dLLM or Open-dCoder in your research, please cite us:

@misc{opendllm2025,
  title        = {Open-dLLM: Open Diffusion Large Language Models},
  author       = {Fred Zhangzhi Peng and Shuibai Zhang and Alex Tong and contributors},
  year         = {2025},
  howpublished = {\url{https://github.com/pengzhangzhi/Open-dLLM}},
  note         = {Blog: \url{https://oval-shell-31c.notion.site/Open-Diffusion-Large-Language-Model-25e03bf6136480b7a4ebe3d53be9f68a?pvs=74}, 
                  Model: \url{https://huggingface.co/fredzzp/open-dcoder-0.5B}}
}
