auto-paper-digest
auto-paper-digest: An automated pipeline that tracks Hugging Face weekly AI papers, downloads PDFs, imports them into NotebookLM, generates video overviews, and archives everything into a searchable weekly digest.
Stars: 485
Auto Paper Digest (APD) is a tool designed to automatically fetch cutting-edge AI research papers, download PDFs, generate video explanations, and publish them on platforms like HuggingFace, Douyin, and portal websites. It provides functionalities such as fetching papers from Hugging Face, downloading PDFs from arXiv, generating videos using NotebookLM, automatic publishing to HuggingFace Dataset, automatic publishing to Douyin, and hosting videos on a Gradio portal website. The tool also supports resuming interrupted tasks, persistent login states for Google and Douyin, and a structured workflow divided into three phases: Upload, Download, and Publish.
README:
Automatically fetch cutting-edge AI papers → download PDFs → generate video explanations → publish to HuggingFace/Douyin → showcase on a portal website
Live demo: https://huggingface.co/spaces/brianxiadong0627/paper-digest
Latest AI papers, updated weekly.
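
| Feature | Description |
|---|---|
| Paper fetching | Automatically scrapes Hugging Face's weekly trending AI papers (supports the weekly URL) |
| PDF download | Downloads paper PDFs from arXiv (idempotent, SHA256 verification) |
| Video generation | Automatically generates video explanations of papers via NotebookLM |
| Auto publishing | Uploads videos to a HuggingFace Dataset |
| Douyin publishing | Automatically publishes videos to the Douyin creator platform |
| Portal website | Gradio portal site with online video playback |
| Resumable runs | SQLite status tracking; processing can resume after an interruption |
| Login reuse | Google/Douyin login state is persisted; log in once and reuse it |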

    ┌─────────────────────────────────────────────────────────────────────┐
    │                          Auto Paper Digest                          │
    ├─────────────────────────────────────────────────────────────────────┤
    │                                                                     │
    │  Phase 1: Upload               Phase 2: Download  Phase 3: Publish  │
    │  ┌─────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────────┐  │
    │  │   HF    │───▶│  arXiv  │───▶│ NotebookLM  │───▶│ HuggingFace  │  │
    │  │ Papers  │    │  PDFs   │    │   Videos    │    │   Dataset    │  │
    │  └─────────┘    └─────────┘    └─────────────┘    └──────────────┘  │
    │       │              │                │                  │          │
    │       ▼              ▼                ▼                  ▼          │
    │  ┌───────────────────────────────────────────────────────────────┐  │
    │  │                        SQLite Database                        │  │
    │  │          (status: NEW → PDF_OK → NBLM_OK → VIDEO_OK)          │  │
    │  └───────────────────────────────────────────────────────────────┘  │
    │                                  │                                  │
    │            ┌─────────────────────┼─────────────────────┐            │
    │            ▼                     ▼                     ▼            │
    │   ┌─────────────────┐     ┌─────────────┐       ┌─────────────┐     │
    │   │ Portal Website  │     │   Douyin    │       │    Other    │     │
    │   │  (HF Spaces)    │     │   Creator   │       │  Platforms  │     │
    │   └─────────────────┘     └─────────────┘       └─────────────┘     │
    │                                                                     │
    └─────────────────────────────────────────────────────────────────────┘
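
The resume-after-interruption behaviour shown in the diagram can be pictured with a small `sqlite3` sketch. Only the status names come from the diagram; the `papers` table, its columns, the helper functions, and the arXiv-style IDs are assumptions made for illustration, not the contents of `apd/db.py`.

```python
import sqlite3

# Status order taken from the diagram; table and column names are assumptions.
STATUSES = ["NEW", "PDF_OK", "NBLM_OK", "VIDEO_OK"]

def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) papers table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "paper_id TEXT PRIMARY KEY, week TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'NEW')"
    )
    return conn

def add_paper(conn, paper_id, week):
    """Insert a paper as NEW; re-running is a no-op, so existing progress is kept."""
    conn.execute(
        "INSERT OR IGNORE INTO papers (paper_id, week) VALUES (?, ?)",
        (paper_id, week),
    )
    conn.commit()

def set_status(conn, paper_id, status):
    """Record the latest status reached by a paper."""
    conn.execute("UPDATE papers SET status = ? WHERE paper_id = ?", (status, paper_id))
    conn.commit()

def pending(conn, week, done_status):
    """Return paper IDs in a week that have not yet reached done_status."""
    done_rank = STATUSES.index(done_status)
    rows = conn.execute(
        "SELECT paper_id, status FROM papers WHERE week = ? ORDER BY paper_id",
        (week,),
    ).fetchall()
    # Unknown statuses such as ERROR count as pending so they get retried.
    return [
        pid for pid, st in rows
        if st not in STATUSES or STATUSES.index(st) < done_rank
    ]

conn = open_db()
add_paper(conn, "2601.00001", "2026-01")
add_paper(conn, "2601.00002", "2026-01")
set_status(conn, "2601.00001", "PDF_OK")
add_paper(conn, "2601.00001", "2026-01")   # rerun after an interruption: status is kept
print(pending(conn, "2026-01", "PDF_OK"))  # ['2601.00002']
```

Because each step only queries for papers that have not reached its target status, rerunning a command after a crash picks up where the previous run stopped.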

    # Clone the repository
    git clone https://github.com/brianxiadong/auto-paper-digest.git
    cd auto-paper-digest
    # Install dependencies
    pip install -e .
    # Install the browser
    playwright install chromium

    # Copy the configuration template
    cp .env.example .env
    # Edit .env and fill in the HuggingFace settings
    # HF_TOKEN=hf_xxx
    # HF_USERNAME=your-username
    # HF_DATASET_NAME=paper-digest-videos

    apd login

The browser opens the NotebookLM login page. Once you complete the Google login, the session is saved.
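
As an illustration of how the three HuggingFace settings in `.env` might be read, here is a minimal sketch using only the standard library. The project lists python-dotenv for this job; the hand-written parser and the `load_settings` helper below are stand-ins, not the project's code.

```python
import os
from pathlib import Path

# Keys taken from the .env template; everything else here is hypothetical.
REQUIRED_KEYS = ("HF_TOKEN", "HF_USERNAME", "HF_DATASET_NAME")

def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def load_settings(env_path=".env"):
    """Merge .env values with the process environment and report missing keys."""
    path = Path(env_path)
    file_values = parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings = {k: os.environ.get(k) or file_values.get(k, "") for k in REQUIRED_KEYS}
    missing = [k for k, v in settings.items() if not v]
    if missing:
        raise RuntimeError(f"Missing settings in {env_path}: {', '.join(missing)}")
    return settings

example = (
    "# HuggingFace settings\n"
    "HF_TOKEN=hf_xxx\n"
    "HF_USERNAME=your-username\n"
    "HF_DATASET_NAME=paper-digest-videos\n"
)
print(parse_env(example)["HF_DATASET_NAME"])  # paper-digest-videos
```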

    apd upload --week 2026-01 --headful --max 10

This command:
- ✅ Fetches this week's HuggingFace papers (using the /week/YYYY-WXX URL)
- ✅ Downloads the arXiv PDFs (cached; already-downloaded files are skipped)
- ✅ Uploads them to NotebookLM
- ✅ Triggers video generation (without waiting for it to finish)
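
The relationship between a week ID such as `2026-01` and the `/week/YYYY-WXX` URL form can be sketched as follows. That week IDs follow ISO 8601 week numbering is an assumption, and both function names are hypothetical.

```python
import datetime

def current_week_id(today=None):
    """Return the ISO week as 'YYYY-WW', e.g. '2026-01' (ISO numbering is assumed)."""
    today = today or datetime.date.today()
    iso = today.isocalendar()
    return f"{iso[0]}-{iso[1]:02d}"

def week_url_path(week_id):
    """Convert a week ID like '2026-01' into the '/week/YYYY-WXX' path form."""
    year, week = week_id.split("-")
    if not (year.isdigit() and week.isdigit() and 1 <= int(week) <= 53):
        raise ValueError(f"invalid week ID: {week_id!r}")
    return f"/week/{int(year):04d}-W{int(week):02d}"

print(week_url_path("2026-01"))                    # /week/2026-W01
print(current_week_id(datetime.date(2026, 1, 8)))  # 2026-02
```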

After waiting a few minutes (video generation takes time), run:

    apd download-video --week 2026-01 --headful

Caching is supported: videos that are already downloaded are skipped automatically. Use --force to force a re-download.

    apd publish --week 2026-01

This command:
- ✅ Uploads the videos to the HuggingFace Dataset
- ✅ Updates metadata.json
- ✅ Generates a Markdown digest
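
A Markdown digest of the kind mentioned above might be assembled along these lines. The dictionary fields, file names, and output layout are invented for illustration; the actual format produced by `apd publish` is not documented here.

```python
def render_digest(week_id, papers):
    """Render a minimal Markdown digest; the field names are illustrative only."""
    lines = [f"# Paper digest for week {week_id}", ""]
    for i, paper in enumerate(papers, start=1):
        link = f"https://arxiv.org/abs/{paper['paper_id']}"
        lines.append(f"{i}. **{paper['title']}** ([arXiv]({link}))")
        if paper.get("video"):
            lines.append(f"   - Video: `{paper['video']}`")
    return "\n".join(lines) + "\n"

# Hypothetical sample records.
papers = [
    {"paper_id": "2601.00001", "title": "An Example Paper", "video": "2601.00001_overview.mp4"},
    {"paper_id": "2601.00002", "title": "Another Example Paper"},
]
print(render_digest("2026-01", papers))
```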
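
On first use, log in to Douyin first:

    apd douyin-login

The browser opens the Douyin Creator Center login page. Scan the QR code with the Douyin app to log in, and the login state is saved.

Then publish the videos to Douyin:

    apd publish-douyin --week 2026-01 --headful

This command:
- ✅ Automatically uploads the videos to the Douyin creator platform
- ✅ Fills in the video title (the paper title)
- ✅ Adds topic tags (AI, paper interpretation, etc.)
- ✅ Clicks publish automatically

💡 Tip: on first use, add the --headful flag to watch the publishing process. Once you have confirmed it works, you can drop the flag.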

Besides weekly processing, papers can also be processed by date:

    # Fetch the papers for a given date
    apd fetch --date 2026-01-08 --max 10
    # Upload and generate videos
    apd upload --date 2026-01-08 --headful --max 10
    # Download the videos
    apd download-video --date 2026-01-08 --headful
    # Publish to Douyin
    apd publish-douyin --date 2026-01-08 --headful

⚠️ Note: there are no papers on weekends and holidays. The system reports an error rather than continuing.
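
A date check of the sort implied by the note above could look like this. It only detects weekends; holidays would need a calendar source this sketch does not have, and `check_paper_date` is a hypothetical name.

```python
import datetime

def check_paper_date(date_str):
    """Parse 'YYYY-MM-DD' and reject weekends; holidays are not detected here."""
    day = datetime.date.fromisoformat(date_str)
    if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        raise ValueError(f"{date_str} falls on a weekend; no papers are expected")
    return day

print(check_paper_date("2026-01-08"))  # 2026-01-08 (a Thursday)
```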

Daily and weekly data are stored separately:
- data/pdfs/weekly/2026-01/ - PDFs processed by week
- data/pdfs/daily/2026-01-08/ - PDFs processed by date
- data/videos/weekly/2026-01/ - videos processed by week
- data/videos/daily/2026-01-08/ - videos processed by date
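
The layout above maps naturally onto a small path helper. This is a sketch under the assumption that week IDs have two dash-separated parts and dates have three; `data_dir` is a hypothetical function.

```python
from pathlib import Path

def data_dir(kind, period, root="data"):
    """Return the storage directory for kind ('pdfs' or 'videos') and a week ID or date."""
    if kind not in ("pdfs", "videos"):
        raise ValueError(f"unknown kind: {kind!r}")
    parts = period.split("-")
    if len(parts) == 2:      # week ID such as 2026-01
        mode = "weekly"
    elif len(parts) == 3:    # date such as 2026-01-08
        mode = "daily"
    else:
        raise ValueError(f"expected YYYY-WW or YYYY-MM-DD, got {period!r}")
    return Path(root) / kind / mode / period

print(data_dir("pdfs", "2026-01").as_posix())       # data/pdfs/weekly/2026-01
print(data_dir("videos", "2026-01-08").as_posix())  # data/videos/daily/2026-01-08
```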

Once published, the videos can be watched directly on the HuggingFace Spaces portal:
https://huggingface.co/spaces/your-username/paper-digest
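
| Command | Description |
|---|---|
| `apd login` | Opens a browser to complete the Google login (NotebookLM) |
| `apd douyin-login` | Opens a browser to complete the Douyin login |
| `apd fetch` | Fetches the paper list only (no download) |
| `apd download` | Downloads PDFs only (supports caching) |
| `apd upload` | Phase 1: fetch + download + upload + trigger generation |
| `apd download-video` | Phase 2: downloads the generated videos (supports caching) |
| `apd publish` | Phase 3: publishes to HuggingFace |
| `apd publish-douyin` | Phase 3b: publishes to the Douyin creator platform |
| `apd digest` | Generates a local weekly digest |
| `apd run` | Full pipeline in one command (waits for video generation) |
| `apd status` | Shows paper processing status |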

    --week, -w    Week ID (e.g. 2026-01); defaults to the current week
    --max, -m     Maximum number of papers
    --headful     Show the browser window (for debugging)
    --force, -f   Force reprocessing (ignore the cache)
    --debug       Enable debug logging

    auto-paper-digest/
    ├── apd/                      # Main package
    │   ├── cli.py                # Command-line entry point
    │   ├── config.py             # Configuration constants
    │   ├── db.py                 # SQLite database
    │   ├── hf_fetcher.py         # HF paper scraping (supports the weekly URL)
    │   ├── pdf_downloader.py     # PDF downloader
    │   ├── nblm_bot.py           # NotebookLM automation
    │   ├── douyin_bot.py         # Douyin creator platform automation
    │   ├── publisher.py          # HuggingFace publishing
    │   ├── digest.py             # Weekly digest generation
    │   └── utils.py              # Utility functions
    ├── portal/                   # HuggingFace Spaces portal
    │   ├── app.py                # Gradio app
    │   ├── requirements.txt
    │   └── README.md
    ├── data/
    │   ├── apd.db                # SQLite database
    │   ├── .douyin_auth.json     # Douyin login state
    │   ├── pdfs/                 # Downloaded PDFs (one directory per week)
    │   ├── videos/               # Generated videos (one directory per week)
    │   ├── digests/              # Weekly digest files
    │   └── profiles/             # Browser profiles (including login state)
    ├── .env.example              # Environment variable template
    └── pyproject.toml

- Downloaded PDFs are verified by SHA256 checksum
- Identical files are skipped automatically
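
SHA256-based skipping can be sketched as below. `sha256_of` and `save_if_changed` are hypothetical helpers, and the sketch compares against bytes already in memory; a real downloader would more likely store the hash alongside the paper record so it can avoid fetching the file at all.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def save_if_changed(path, content):
    """Write content unless an identical file already exists. Returns True if written."""
    path = Path(path)
    if path.exists() and sha256_of(path) == hashlib.sha256(content).hexdigest():
        return False  # same bytes already on disk: skip
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return True

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "2601.00001.pdf"  # hypothetical file name
    print(save_if_changed(target, b"%PDF-1.7 example"))  # True  (first write)
    print(save_if_changed(target, b"%PDF-1.7 example"))  # False (identical, skipped)
```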

- Cached videos are matched by filename prefix ({paper_id}_*.mp4)
- The new naming format is supported: {paper_id}_{video_title}.mp4
- Use --force to force a re-download
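
Prefix matching for cached videos could be approximated with a glob, as in this sketch. The function names and the sample file name are hypothetical; only the `{paper_id}_*.mp4` pattern and the meaning of `--force` come from the list above.

```python
import tempfile
from pathlib import Path

def find_cached_video(video_dir, paper_id):
    """Return the first file matching {paper_id}_*.mp4, or None if nothing is cached."""
    matches = sorted(Path(video_dir).glob(f"{paper_id}_*.mp4"))
    return matches[0] if matches else None

def should_download(video_dir, paper_id, force=False):
    """Download when forced or when no cached file matches the prefix."""
    return force or find_cached_video(video_dir, paper_id) is None

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "2601.00001_Example_Title.mp4").touch()
    print(should_download(tmp, "2601.00001"))              # False
    print(should_download(tmp, "2601.00001", force=True))  # True
    print(should_download(tmp, "2601.00002"))              # True
```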

- metadata.json records the papers that have been published
- Duplicate publishes are skipped automatically
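
Skipping duplicate publishes via `metadata.json` might work roughly as follows. The JSON schema (a `papers` mapping keyed by paper ID) and the `publish_once` helper are assumptions; the real file layout is not described in this README.

```python
import json
import tempfile
from pathlib import Path

def publish_once(metadata_path, paper_id, video_name):
    """Record a paper in metadata.json; return False if it was already published."""
    path = Path(metadata_path)
    if path.exists():
        metadata = json.loads(path.read_text(encoding="utf-8"))
    else:
        metadata = {"papers": {}}
    if paper_id in metadata["papers"]:
        return False  # already published: skip the upload
    # A real publisher would upload the video here before recording it.
    metadata["papers"][paper_id] = {"video": video_name}
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return True

with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "metadata.json"
    print(publish_once(meta, "2601.00001", "2601.00001_Example_Title.mp4"))  # True
    print(publish_once(meta, "2601.00001", "2601.00001_Example_Title.mp4"))  # False
```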

    NEW → PDF_OK → NBLM_OK → VIDEO_OK
     │                           │
     └───────── ERROR ───────────┘

| Status | Meaning |
|---|---|
| `NEW` | Paper scraped, awaiting processing |
| `PDF_OK` | PDF downloaded |
| `NBLM_OK` | Uploaded to NotebookLM; video generation in progress |
| `VIDEO_OK` | Video downloaded |
| `ERROR` | Processing failed (retried automatically) |
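
One way to reason about these statuses is as a small state machine. The sketch below encodes the forward path from the table and makes two guesses the README does not confirm: that any unfinished stage may fail into `ERROR`, and that a retry re-enters at one of the unfinished stages.

```python
from enum import Enum

class Status(Enum):
    NEW = "NEW"
    PDF_OK = "PDF_OK"
    NBLM_OK = "NBLM_OK"
    VIDEO_OK = "VIDEO_OK"
    ERROR = "ERROR"

# Forward order of the happy path, as listed in the table.
PIPELINE = [Status.NEW, Status.PDF_OK, Status.NBLM_OK, Status.VIDEO_OK]

def can_transition(current, target):
    """Allow one step forward, a failure from an unfinished state, or a retry from ERROR."""
    if target is Status.ERROR:
        return current in PIPELINE[:-1]
    if current is Status.ERROR:
        return target in PIPELINE[:-1]  # assumption: a retry re-enters an unfinished stage
    return PIPELINE.index(target) - PIPELINE.index(current) == 1

print(can_transition(Status.NEW, Status.PDF_OK))     # True
print(can_transition(Status.NEW, Status.VIDEO_OK))   # False
print(can_transition(Status.NBLM_OK, Status.ERROR))  # True
```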

View the status:

    apd status --week 2026-01
    apd status --week 2026-01 --status ERROR

    apd login

View the screenshots:

    ls data/profiles/screenshots/

Video generation takes a few minutes; please retry later:

    apd download-video --week 2026-01 --headful

Make sure the .env file is configured correctly:

    cat .env
    # Check HF_TOKEN and HF_USERNAME

- Python 3.11+ - core language
- Playwright - browser automation
- SQLite - state persistence
- Click - CLI framework
- Requests + BeautifulSoup - web scraping
- huggingface_hub - HF API
- Gradio - portal website
- python-dotenv - environment variable management

MIT License © 2026
