auto-paper-digest

auto-paper-digest: An automated pipeline that tracks Hugging Face weekly AI papers, downloads PDFs, imports them into NotebookLM, generates video overviews, and archives everything into a searchable weekly digest.

Stars: 485


Auto Paper Digest (APD) is a tool designed to automatically fetch cutting-edge AI research papers, download PDFs, generate video explanations, and publish them on platforms like HuggingFace, Douyin, and portal websites. It provides functionalities such as fetching papers from Hugging Face, downloading PDFs from arXiv, generating videos using NotebookLM, automatic publishing to HuggingFace Dataset, automatic publishing to Douyin, and hosting videos on a Gradio portal website. The tool also supports resuming interrupted tasks, persistent login states for Google and Douyin, and a structured workflow divided into three phases: Upload, Download, and Publish.

README:

🚀 Auto Paper Digest (APD)

Automatically fetch cutting-edge AI papers → download PDFs → generate video explainers → publish to HuggingFace/Douyin → showcase on a portal website

🎥 Live demo: https://huggingface.co/spaces/brianxiadong0627/paper-digest

📱 Follow us on Douyin for the latest AI paper explainer videos!


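✨ Features

Feature	Description
📚 Paper fetching	Automatically scrapes the week's trending AI papers from Hugging Face (weekly URLs supported)
📄 PDF download	Downloads paper PDFs from arXiv (idempotent, with SHA256 verification)
🎬 Video generation	Generates video explainers for papers automatically through NotebookLM
📤 Automatic publishing	Uploads videos to a HuggingFace Dataset
📱 Douyin publishing	Publishes videos automatically to the Douyin creator platform
🌐 Portal website	Gradio portal website with online video playback
💾 Resumable runs	SQLite state tracking; interrupted runs can be continued
🔐 Login reuse	Google/Douyin login state is persisted, so one login lasts a long time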

📐 Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        Auto Paper Digest                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Phase 1: Upload            Phase 2: Download      Phase 3: Publish │
│   ┌─────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────────┐ │
│   │   HF    │───▶│  arXiv  │───▶│ NotebookLM  │───▶│  HuggingFace │ │
│   │ Papers  │    │  PDFs   │    │   Videos    │    │   Dataset    │ │
│   └─────────┘    └─────────┘    └─────────────┘    └──────────────┘ │
│        │               │               │                   │         │
│        ▼               ▼               ▼                   ▼         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                    SQLite Database                           │   │
│   │      (status: NEW → PDF_OK → NBLM_OK → VIDEO_OK)            │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                              │                                       │
│              ┌───────────────┼───────────────┐                       │
│              ▼               ▼               ▼                       │
│   ┌─────────────────┐ ┌─────────────┐ ┌─────────────┐               │
│   │ Portal Website  │ │   Douyin    │ │   Other     │               │
│   │  (HF Spaces)    │ │  Creator    │ │  Platforms  │               │
│   └─────────────────┘ └─────────────┘ └─────────────┘               │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

🚀 Quick Start

1. Install

# Clone the repository
git clone https://github.com/brianxiadong/auto-paper-digest.git
cd auto-paper-digest

# Install dependencies
pip install -e .

# Install the browser
playwright install chromium

2. Configure environment variables

# Copy the configuration template
cp .env.example .env

# Edit .env and fill in the HuggingFace settings
# HF_TOKEN=hf_xxx
# HF_USERNAME=your-username
# HF_DATASET_NAME=paper-digest-videos
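
Before running the pipeline it can help to confirm that the three settings above are actually present. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the `<username>/<dataset>` repo id is an assumption about how the two values are combined, not something the README states.

```python
import os

# Variable names come from the .env template in this section.
REQUIRED_VARS = ("HF_TOKEN", "HF_USERNAME", "HF_DATASET_NAME")


def missing_hf_settings(env=None):
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def dataset_repo_id(env=None):
    """Build a '<username>/<dataset>' repo id (hypothetical helper, assumed format)."""
    env = os.environ if env is None else env
    return f"{env['HF_USERNAME']}/{env['HF_DATASET_NAME']}"
```

Calling `missing_hf_settings()` with no argument checks the current process environment, so it reflects `.env` only after the file has been loaded.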

3. Log in to Google for the first time

apd login

The browser opens the NotebookLM login page; once you complete the Google login, the session is saved.

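📖 Three-Phase Workflow

Phase 1: Upload and trigger video generation

apd upload --week 2026-01 --headful --max 10

This command will:

✅ Fetch this week's papers from HuggingFace (using the /week/YYYY-WXX URL)
✅ Download the arXiv PDFs (cached; already-downloaded files are skipped)
✅ Upload them to NotebookLM
✅ Trigger video generation (without waiting for it to finish)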
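
The relationship between a week ID such as 2026-01 and the /week/YYYY-WXX URL can be sketched as below. This is an illustration under assumptions: the base URL and the use of ISO week numbering are not confirmed by the README, and both function names are hypothetical.

```python
from datetime import date

# Assumed base URL; the README only states the "/week/YYYY-WXX" suffix.
HF_PAPERS_BASE = "https://huggingface.co/papers"


def week_id_for(day):
    """Return the 'YYYY-WW' week ID for a date, assuming ISO week numbering."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-{iso_week:02d}"


def weekly_url(week_id):
    """Turn a week ID such as '2026-01' into a '/week/2026-W01' URL."""
    year, week = week_id.split("-")
    return f"{HF_PAPERS_BASE}/week/{year}-W{int(week):02d}"
```

Under ISO numbering, 8 January 2026 falls in week 2, so `week_id_for(date(2026, 1, 8))` gives `2026-02`.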

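Phase 2: Download the generated videos

Wait a few minutes (video generation takes time), then run:

apd download-video --week 2026-01 --headful

Caching is supported: already-downloaded videos are skipped automatically; use --force to force a re-download.

Phase 3: Publish to HuggingFace

apd publish --week 2026-01

This command will:

✅ Upload the videos to the HuggingFace Dataset
✅ Update metadata.json
✅ Generate a Markdown digest

Phase 3b: Publish to Douyin (optional)

Before first use, log in to Douyin:

apd douyin-login

The browser opens the Douyin Creator Center login page; scan the QR code with the Douyin app to log in, and the login state is saved.

Then publish the videos to Douyin:

apd publish-douyin --week 2026-01 --headful

This command will:

✅ Upload the videos to the Douyin creator platform automatically
✅ Fill in the video title (the paper title)
✅ Add topic tags (AI, paper explainers, and so on)
✅ Click Publish automatically

💡 Tip: on first use, add --headful to watch the publishing process; once you have confirmed it works, you can drop the flag.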

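📅 Daily Processing (optional)

Besides weekly processing, papers can also be processed by date:

# Fetch papers for a given date
apd fetch --date 2026-01-08 --max 10

# Upload and generate videos
apd upload --date 2026-01-08 --headful --max 10

# Download videos
apd download-video --date 2026-01-08 --headful

# Publish to Douyin
apd publish-douyin --date 2026-01-08 --headful

⚠️ Note: no papers are published on weekends and holidays; the system reports an error instead of continuing.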
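
A date can be checked for the weekend case before any command is run. The sketch below is an assumption about how such a guard might look, not APD's own check; the function names are hypothetical, and holidays are not covered because they depend on a calendar the README does not specify.

```python
from datetime import date


def is_weekend(day):
    """True for Saturday or Sunday (weekday() is 5 or 6)."""
    return day.weekday() >= 5


def require_publishing_day(day_text):
    """Parse 'YYYY-MM-DD' and raise ValueError if it falls on a weekend."""
    day = date.fromisoformat(day_text)
    if is_weekend(day):
        raise ValueError(f"{day_text} is a weekend; no papers are expected")
    return day
```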

Folder layout

Daily and weekly data are stored separately:

data/pdfs/weekly/2026-01/ - PDFs processed by week
data/pdfs/daily/2026-01-08/ - PDFs processed by date
data/videos/weekly/2026-01/ - videos processed by week
data/videos/daily/2026-01-08/ - videos processed by date
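
The layout above maps cleanly onto a small path helper. This is a sketch with a hypothetical function name; telling a week ID from a date by counting the dash-separated parts is an assumption made for illustration.

```python
from pathlib import Path


def data_dir(kind, period, root="data"):
    """Return the directory for 'pdfs' or 'videos' under the weekly/daily layout.

    A period like '2026-01' is treated as a week ID and '2026-01-08' as a date.
    """
    if kind not in ("pdfs", "videos"):
        raise ValueError(f"unknown kind: {kind}")
    parts = period.split("-")
    scope = "daily" if len(parts) == 3 else "weekly"
    return Path(root) / kind / scope / period
```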

🌐 Portal Website

Once published, the videos can be watched directly on the HuggingFace Spaces portal:

https://huggingface.co/spaces/your-username/paper-digest

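📖 Command Reference

Command	Description
`apd login`	Open a browser to complete the Google login (NotebookLM)
`apd douyin-login`	Open a browser to complete the Douyin login
`apd fetch`	Fetch the paper list only (no download)
`apd download`	Download PDFs only (cached)
`apd upload`	Phase 1: fetch + download + upload + trigger generation
`apd download-video`	Phase 2: download the generated videos (cached)
`apd publish`	Phase 3: publish to HuggingFace
`apd publish-douyin`	Phase 3b: publish to the Douyin creator platform
`apd digest`	Generate a local weekly digest
`apd run`	Full pipeline in one command (waits for video generation)
`apd status`	Show paper processing status

Common options

--week, -w     Week ID (e.g. 2026-01); defaults to the current week
--max, -m      Maximum number of papers
--headful      Show the browser window (for debugging)
--force, -f    Force reprocessing (ignore the cache)
--debug        Enable debug logging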

📁 Directory Structure

auto-paper-digest/
├── apd/                    # Main package
│   ├── cli.py              # Command-line entry point
│   ├── config.py           # Configuration constants
│   ├── db.py               # SQLite database
│   ├── hf_fetcher.py       # HF paper fetcher (weekly URLs supported)
│   ├── pdf_downloader.py   # PDF downloader
│   ├── nblm_bot.py         # NotebookLM automation
│   ├── douyin_bot.py       # Douyin creator platform automation
│   ├── publisher.py        # HuggingFace publishing
│   ├── digest.py           # Weekly digest generation
│   └── utils.py            # Utility functions
├── portal/                 # HuggingFace Spaces portal
│   ├── app.py              # Gradio app
│   ├── requirements.txt
│   └── README.md
├── data/
│   ├── apd.db              # SQLite database
│   ├── .douyin_auth.json   # Douyin login state
│   ├── pdfs/               # Downloaded PDFs (one directory per week)
│   ├── videos/             # Generated videos (one directory per week)
│   ├── digests/            # Weekly digest files
│   └── profiles/           # Browser profiles (including login state)
├── .env.example            # Environment variable template
└── pyproject.toml

Caching

PDF cache

Downloaded PDFs are verified with SHA256
Identical files are skipped automatically
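
A SHA256-based skip check of this kind can be sketched as follows. The function names are hypothetical, and where the expected hash is stored (for example in the SQLite database) is an assumption; the README only says that downloaded PDFs are verified with SHA256.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256):
    """Return False only when the file exists and matches the recorded hash."""
    path = Path(path)
    return not (path.is_file() and sha256_of(path) == expected_sha256)
```

Checking the hash rather than mere file existence means a truncated or corrupted PDF is fetched again instead of being treated as cached.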

Video cache

Matches on the filename prefix ({paper_id}_*.mp4)
Supports the new naming format: {paper_id}_{video_title}.mp4
Use --force to force a re-download
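
The prefix match can be illustrated with a glob over the video directory. This is a sketch with hypothetical function names; it assumes paper IDs contain no glob metacharacters, which holds for arXiv-style IDs such as 2601.00001.

```python
from pathlib import Path


def cached_videos(video_dir, paper_id):
    """List files matching '{paper_id}_*.mp4' in video_dir, sorted by name."""
    return sorted(Path(video_dir).glob(f"{paper_id}_*.mp4"))


def should_download_video(video_dir, paper_id, force=False):
    """Download when forced or when no cached file has the expected prefix."""
    return force or not cached_videos(video_dir, paper_id)
```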

Publish cache

Published papers are recorded in metadata.json
Repeat publishes are skipped automatically
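
One way such a publish cache could work is sketched below. The schema (a JSON list of objects with `paper_id` and `title` keys) and both function names are hypothetical; the README does not describe the actual structure of metadata.json.

```python
import json
from pathlib import Path


def load_published_ids(metadata_path):
    """Read the set of already-published paper IDs; empty if the file is absent."""
    path = Path(metadata_path)
    if not path.is_file():
        return set()
    entries = json.loads(path.read_text(encoding="utf-8"))
    return {entry["paper_id"] for entry in entries}


def record_published(metadata_path, paper_id, title):
    """Append a paper to metadata.json; return False if it was already listed."""
    if paper_id in load_published_ids(metadata_path):
        return False
    path = Path(metadata_path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else []
    entries.append({"paper_id": paper_id, "title": title})
    path.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    return True
```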

📊 Status Tracking

NEW → PDF_OK → NBLM_OK → VIDEO_OK
 │                          │
 └──────── ERROR ◄──────────┘

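Status	Meaning
`NEW`	Paper fetched, awaiting processing
`PDF_OK`	PDF downloaded
`NBLM_OK`	Uploaded to NotebookLM; video generation in progress
`VIDEO_OK`	Video downloaded
`ERROR`	Processing failed (retried automatically)

Check the status: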

apd status --week 2026-01
apd status --week 2026-01 --status ERROR
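
The status flow lends itself to a small SQLite-backed state machine. The sketch below uses an in-memory database and a hypothetical `papers` table; the real schema in `apd/db.py` is not shown in the README, and the automatic retry out of `ERROR` is deliberately left out.

```python
import sqlite3

# Forward order of the happy path; ERROR can be entered from any step.
ORDER = ["NEW", "PDF_OK", "NBLM_OK", "VIDEO_OK"]


def open_db(path=":memory:"):
    """Create the (hypothetical) papers table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "paper_id TEXT PRIMARY KEY, week_id TEXT, status TEXT DEFAULT 'NEW')"
    )
    return conn


def add_paper(conn, paper_id, week_id):
    """Insert a paper as NEW; re-running is a no-op thanks to OR IGNORE."""
    conn.execute(
        "INSERT OR IGNORE INTO papers (paper_id, week_id) VALUES (?, ?)",
        (paper_id, week_id),
    )


def advance(conn, paper_id, new_status):
    """Move a paper one step forward, or to ERROR; reject anything else."""
    row = conn.execute(
        "SELECT status FROM papers WHERE paper_id = ?", (paper_id,)
    ).fetchone()
    if row is None:
        raise KeyError(paper_id)
    current = row[0]
    is_next = (
        current in ORDER
        and new_status in ORDER
        and ORDER.index(new_status) == ORDER.index(current) + 1
    )
    if not (is_next or new_status == "ERROR"):
        raise ValueError(f"illegal transition {current} -> {new_status}")
    conn.execute(
        "UPDATE papers SET status = ? WHERE paper_id = ?", (new_status, paper_id)
    )


def papers_with_status(conn, week_id, status):
    """Rough analogue of filtering `apd status` by week and status."""
    rows = conn.execute(
        "SELECT paper_id FROM papers WHERE week_id = ? AND status = ? "
        "ORDER BY paper_id",
        (week_id, status),
    )
    return [row[0] for row in rows]
```

Because each paper's status is persisted, an interrupted run can select only the rows that have not yet reached `VIDEO_OK` and continue from there.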

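🔧 Troubleshooting

Login problems

apd login

NotebookLM UI changes

Check the screenshots:

ls data/profiles/screenshots/

Video not generated

Video generation takes a few minutes; try again later:

apd download-video --week 2026-01 --headful

HuggingFace token problems

Make sure the .env file is configured correctly:

cat .env
# Check HF_TOKEN and HF_USERNAME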

🤝 Tech Stack

Python 3.11+ - core language
Playwright - browser automation
SQLite - state persistence
Click - CLI framework
Requests + BeautifulSoup - web scraping
huggingface_hub - HF API
Gradio - portal website
python-dotenv - environment variable management

📄 License

For Tasks:


fetch papers, download pdfs, generate videos, publish content, host videos

For Jobs:

data scientist, machine learning engineer, ai researcher, content creator, research assistant

Alternative AI tools for auto-paper-digest

Similar Open Source Tools


lanhu-mcp

Lanhu MCP Server is a powerful Model Context Protocol (MCP) server designed for the AI programming era, perfectly supporting the Lanhu design collaboration platform. It offers features like intelligent requirement analysis, team knowledge base, UI design support, and performance optimization. The server is suitable for Cursor + Lanhu, Windsurf + Lanhu, Claude Code + Lanhu, Trae + Lanhu, and Cline + Lanhu integrations. It aims to break the isolation of AI IDEs and enable all AI assistants to share knowledge and context.

GitHub stars: 436

PaiAgent

PaiAgent is an enterprise-level AI workflow visualization orchestration platform that simplifies the combination and scheduling of AI capabilities. It allows developers and business users to quickly build complex AI processing flows through an intuitive drag-and-drop interface, without the need to write code, enabling collaboration of various large models.

GitHub stars: 78

bumpgen

bumpgen is a tool designed to automatically upgrade TypeScript / TSX dependencies and make necessary code changes to handle any breaking issues that may arise. It uses an abstract syntax tree to analyze code relationships, type definitions for external methods, and a plan graph DAG to execute changes in the correct order. The tool is currently limited to TypeScript and TSX but plans to support other strongly typed languages in the future. It aims to simplify the process of upgrading dependencies and handling code changes caused by updates.

GitHub stars: 67

AIxVuln

AIxVuln is an automated vulnerability discovery and verification system based on large models (LLM) + function calling + Docker sandbox. The system manages 'projects' through a web UI/desktop client, automatically organizing multiple 'digital humans' for environment setup, code auditing, vulnerability verification, and report generation. It utilizes an isolated Docker environment for dependency installation, service startup, PoC verification, and evidence collection, ultimately producing downloadable vulnerability reports. The system has already discovered dozens of vulnerabilities in real open-source projects.

GitHub stars: 78

observers

Observers is a lightweight library for AI observability that provides support for various generative AI APIs and storage backends. It allows users to track interactions with AI models and sync observations to different storage systems. The library supports OpenAI, Hugging Face transformers, AISuite, Litellm, and Docling for document parsing and export. Users can configure different stores such as Hugging Face Datasets, DuckDB, Argilla, and OpenTelemetry to manage and query their observations. Observers is designed to enhance AI model monitoring and observability in a user-friendly manner.

GitHub stars: 231

vibium

Vibium is a browser automation infrastructure designed for AI agents, providing a single binary that manages browser lifecycle, WebDriver BiDi protocol, and an MCP server. It offers zero configuration, AI-native capabilities, and is lightweight with no runtime dependencies. It is suitable for AI agents, test automation, and any tasks requiring browser interaction.

GitHub stars: 2.6k

z.ai2api_python

Z.AI2API Python is a lightweight OpenAI API proxy service that integrates seamlessly with existing applications. It supports the full functionality of GLM-4.5 series models and features high-performance streaming responses, enhanced tool invocation, support for thinking mode, integration with search models, Docker deployment, session isolation for privacy protection, flexible configuration via environment variables, and intelligent upstream model routing.

GitHub stars: 210

aiohomematic

AIO Homematic (hahomematic) is a lightweight Python 3 library for controlling and monitoring HomeMatic and HomematicIP devices, with support for third-party devices/gateways. It automatically creates entities for device parameters, offers custom entity classes for complex behavior, and includes features like caching paramsets for faster restarts. Designed to integrate with Home Assistant, it requires specific firmware versions for HomematicIP devices. The public API is defined in modules like central, client, model, exceptions, and const, with example usage provided. Useful links include changelog, data point definitions, troubleshooting, and developer resources for architecture, data flow, model extension, and Home Assistant lifecycle.

GitHub stars: 162

openakita

OpenAkita is a self-evolving AI Agent framework that autonomously learns new skills, performs daily self-checks and repairs, accumulates experience from task execution, and persists until the task is done. It auto-generates skills, installs dependencies, learns from mistakes, and remembers preferences. The framework is standards-based, multi-platform, and provides a Setup Center GUI for intuitive installation and configuration. It features self-learning and evolution mechanisms, a Ralph Wiggum Mode for persistent execution, multi-LLM endpoints, multi-platform IM support, desktop automation, multi-agent architecture, scheduled tasks, identity and memory management, a tool system, and a guided wizard for setup.

GitHub stars: 54

memsearch

Memsearch is a tool that allows users to give their AI agents persistent memory in a few lines of code. It enables users to write memories as markdown and search them semantically. Inspired by OpenClaw's markdown-first memory architecture, Memsearch is pluggable into any agent framework. The tool offers features like smart deduplication, live sync, and a ready-made Claude Code plugin for building agent memory.

GitHub stars: 188

Agentic-ADK

Agentic ADK is an Agent application development framework launched by Alibaba International AI Business, based on Google-ADK and Ali-LangEngine. It is used for developing, constructing, evaluating, and deploying powerful, flexible, and controllable complex AI Agents. ADK aims to make Agent development simpler and more user-friendly, enabling developers to more easily build, deploy, and orchestrate various Agent applications ranging from simple tasks to complex collaborations.

GitHub stars: 508

boxlite

BoxLite is an embedded, lightweight micro-VM runtime designed for AI agents running OCI containers with hardware-level isolation. It is built for high concurrency with no daemon required, offering features like lightweight VMs, high concurrency, hardware isolation, embeddability, and OCI compatibility. Users can spin up 'Boxes' to run containers for AI agent sandboxes and multi-tenant code execution scenarios where Docker alone is insufficient and full VM infrastructure is too heavy. BoxLite supports Python, Node.js, and Rust with quick start guides for each, along with features like CPU/memory limits, storage options, networking capabilities, security layers, and image registry configuration. The tool provides SDKs for Python and Node.js, with Go support coming soon. It offers detailed documentation, examples, and architecture insights for users to understand how BoxLite works under the hood.

GitHub stars: 1.1k

gin-vue-admin

Gin-vue-admin is a full-stack development platform based on Vue and Gin, integrating features like JWT authentication, dynamic routing, dynamic menus, Casbin authorization, form generator, code generator, etc. It provides various example files to help users focus more on business development. The project offers detailed documentation, video tutorials for setup and deployment, and a community for support and contributions. Users need a certain level of knowledge in Golang and Vue to work with this project. It is recommended to follow the Apache2.0 license if using the project for commercial purposes.

GitHub stars: 23.5k

kweaver

KWeaver is an open-source ecosystem for building, deploying, and running decision intelligence AI applications. It adopts ontology as the core methodology for business knowledge networks, with DIP as the core platform, aiming to provide elastic, agile, and reliable enterprise-grade decision intelligence to further unleash productivity. The DIP platform includes key subsystems such as ADP, Decision Agent, DIP Studio, and AI Store.

GitHub stars: 154

py-xiaozhi

py-xiaozhi is a Python-based XiaoZhi voice client designed for learning code and experiencing AI XiaoZhi's voice functions without hardware conditions. It features voice interaction, graphical interface, volume control, session management, encrypted audio transmission, CLI mode, and automatic copying of verification codes and opening browsers for first-time users. The project aims to optimize and add new features to zhh827's py-xiaozhi based on the original hardware project xiaozhi-esp32 and the Python implementation py-xiaozhi.

GitHub stars: 554

For similar tasks


SciPIP

SciPIP is a scientific paper idea generation tool powered by a large language model (LLM) designed to assist researchers in quickly generating novel research ideas. It conducts a literature review based on user-provided background information and generates fresh ideas for potential studies. The tool is designed to help researchers in various fields by providing a GUI environment for idea generation, supporting NLP, multimodal, and CV fields, and allowing users to interact with the tool through a web app or terminal. SciPIP uses Neo4j as its database and provides functionalities for generating new ideas, fetching papers, and constructing the database.

GitHub stars: 60

cog-comfyui

Cog-comfyui allows users to run ComfyUI workflows on Replicate. ComfyUI is a visual programming tool for creating and sharing generative art workflows. With cog-comfyui, users can access a variety of pre-trained models and custom nodes to create their own unique artworks. The tool is easy to use and does not require any coding experience. Users simply need to upload their API JSON file and any necessary input files, and then click the "Run" button. Cog-comfyui will then generate the output image or video file.

GitHub stars: 604

biniou

biniou is a self-hosted webui for various GenAI (generative artificial intelligence) tasks. It allows users to generate multimedia content using AI models and chatbots on their own computer, even without a dedicated GPU. The tool can work offline once deployed and required models are downloaded. It offers a wide range of features for text, image, audio, video, and 3D object generation and modification. Users can easily manage the tool through a control panel within the webui, with support for various operating systems and CUDA optimization. biniou is powered by Huggingface and Gradio, providing a cross-platform solution for AI content generation.

GitHub stars: 619

Awesome-Colorful-LLM

Awesome-Colorful-LLM is a meticulously assembled anthology of vibrant multimodal research focusing on advancements propelled by large language models (LLMs) in domains such as Vision, Audio, Agent, Robotics, and Fundamental Sciences like Mathematics. The repository contains curated collections of works, datasets, benchmarks, projects, and tools related to LLMs and multimodal learning. It serves as a comprehensive resource for researchers and practitioners interested in exploring the intersection of language models and various modalities for tasks like image understanding, video pretraining, 3D modeling, document understanding, audio analysis, agent learning, robotic applications, and mathematical research.

GitHub stars: 106

omniscient

Omniscient is an advanced AI Platform offered as a SaaS, empowering projects with cutting-edge artificial intelligence capabilities. Seamlessly integrating with Next.js 14, React, Typescript, and APIs like OpenAI and Replicate, it provides solutions for code generation, conversation simulation, image creation, music composition, and video generation.

GitHub stars: 82

so-vits-models

This repository collects various LLM, AI-related models, applications, and datasets, including LLM-Chat for dialogue models, LLMs for large models, so-vits-svc for sound-related models, stable-diffusion for image-related models, and virtual-digital-person for generating videos. It also provides resources for deep learning courses and overviews, AI competitions, and specific AI tasks such as text, image, voice, and video processing.

GitHub stars: 164

jimeng-free-api-all

Jimeng AI Free API is a reverse-engineered API server that encapsulates Jimeng AI's image and video generation capabilities into OpenAI-compatible API interfaces. It supports the latest jimeng-5.0-preview, jimeng-4.6 text-to-image models, Seedance 2.0 multi-image intelligent video generation, zero-configuration deployment, and multi-token support. The API is fully compatible with OpenAI API format, seamlessly integrating with existing clients and supporting multiple session IDs for polling usage.

GitHub stars: 263

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

GitHub stars: 1.1k

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

GitHub stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

GitHub stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

GitHub stars: 2.9k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. Its key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a cloud IDE); and it supports consumer-grade GPUs.

GitHub stars: 32.9k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

GitHub stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

GitHub stars: 675