auto-news


A personal news aggregator that pulls information from multiple sources and uses an LLM (ChatGPT via LangChain) to help us read efficiently with less noise. Sources include Tweets, RSS, YouTube, web articles, Reddit, and personal journal notes.

Stars: 219


Auto-News is an automatic news aggregator tool that utilizes Large Language Models (LLM) to pull information from various sources such as Tweets, RSS feeds, YouTube videos, web articles, Reddit, and journal notes. The tool aims to help users efficiently read and filter content based on personal interests, providing a unified reading experience and organizing information effectively. It features feed aggregation with summarization, transcript generation for videos and articles, noise reduction, task organization, and deep dive topic exploration. The tool supports multiple LLM backends, offers weekly top-k aggregations, and can be deployed on Linux/MacOS using docker-compose or Kubernetes.

README:

Auto-News: An Automatic News Aggregator with LLM


A personal news aggregator that pulls information from multiple sources and uses an LLM (ChatGPT) to help us read efficiently with less noise. Sources include Tweets, RSS, YouTube, web articles, Reddit, and random journal notes.

Why do we need it?

In this world of information explosion, we live with noise every day, and it has become even worse since generative AI was born. Time is a precious resource for each of us, so how do we use it more efficiently? That is more challenging than ever. Think about how much time we spend pulling, searching, and filtering content from different sources; how many times we leave an article, paper, or long video open in a side tab but never get a chance to look at it; and how much effort it takes to organize the information we have read. We need a better way to get rid of the noise, focus on reading the information that matches our interests, and stay on track with the goals we have defined.

See this Blog post and these videos Introduction, Data flows for more details.

https://github.com/finaldie/auto-news/assets/1088543/4387f688-61d3-4270-b5a6-105aa8ee0ea9

Features

  • Aggregate feed sources (including RSS, Reddit, Tweets, etc) with summarization
  • Summarize YouTube videos (generate transcript if needed)
  • Summarize Web Articles (generate transcript if needed)
  • Filter content based on personal interests and remove 80%+ of the noise
  • A unified/central reading experience (e.g., RSS reader-like style, Notion based)
  • [LLM] Generate TODO list from Takeaways/Journal-notes
  • [LLM] Organize Journal notes with summarization and insights
  • [LLM] Experimental deep dive into a topic via a web search agent and AutoGen
  • Multi-LLM backend: OpenAI ChatGPT, Google Gemini
  • Weekly top-k aggregations


Documentation

https://github.com/finaldie/auto-news/wiki

Architecture

  • UI: Notion-based, cross-platform (Web browser, iOS/Android app)
  • Backend: Runs on Linux/MacOS


Backend System Requirements

| Component | Minimum Requirements | Recommended  |
|-----------|----------------------|--------------|
| OS        | Linux, MacOS         | Linux, MacOS |
| CPU       | 2 cores              | 8 cores      |
| Memory    | 6GB                  | 16GB         |
| Disk      | 20GB                 | 100GB        |

Kubernetes Deployment

See the installation guide from:

Quick Start Guide (Docker-compose)

Preparation

[UI] Create Notion Entry Page

Go to Notion, create a page as the main entry (for example, a Readings page), and enable a Notion Integration for this page.

[Backend] Create Environment File

Check out the repo and copy .env.template to build/.env, then fill in the environment variables:

  • NOTION_TOKEN
  • NOTION_ENTRY_PAGE_ID
  • OPENAI_API_KEY
  • [Optional] REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET
  • [Optional] Vars with TWITTER_ prefix
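Under the variable names listed above, a filled-in build/.env might look like the following sketch. All values are placeholders, not real credentials:

```shell
# build/.env -- illustrative values only; replace each with your own credentials
NOTION_TOKEN=secret_xxxxxxxxxxxxxxxx
NOTION_ENTRY_PAGE_ID=0123456789abcdef0123456789abcdef
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx

# Optional: only needed when pulling from Reddit
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
```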

[Backend] Build Services

make deps && make build && make deploy && make init

[Backend] Start Services

make start

Now the services are up and running; they will pull from the sources every hour.
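Before (re)starting the services, a quick sanity check can confirm that the required variables from the preparation step are present in build/.env. The `check_env` helper below is our own sketch, not part of the repo:

```shell
# check_env FILE -- print any required variable that is missing from FILE.
# The variable names come from the preparation step; the helper is a sketch.
check_env() {
  file="$1"
  for var in NOTION_TOKEN NOTION_ENTRY_PAGE_ID OPENAI_API_KEY; do
    if ! grep -q "^${var}=" "$file" 2>/dev/null; then
      echo "missing: $var"
    fi
  done
}
```

Usage: `check_env build/.env` prints nothing when all required variables are set, and one `missing: NAME` line per absent variable otherwise.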

[UI] Set up Notion Tweet/RSS/Reddit list

Go to the Notion entry page we created before, and we will see the following folder structure has been created automatically:

Readings
├── Inbox
│   ├── Inbox - Article
│   ├── Inbox - YouTube
│   └── Inbox - Journal
├── Index
│   ├── Index - Inbox
│   ├── Index - ToRead
│   ├── RSS_List
│   ├── Tweet_List
│   └── Reddit_List
└── ToRead
    └── ToRead
  • Go to the RSS_List page and fill in the RSS names and URLs
  • Go to the Reddit_List page and fill in the subreddit names
  • Go to the Tweet_List page and fill in the Tweet screen names (Tip: paid accounts only)

[UI] Set up Notion database views

Go to the Notion ToRead database page; all the data will flow into this database later on. Create database views for the different sources (e.g., Tweets, Articles, YouTube, RSS) to help organize the flows more easily.

Now, enjoy and have fun.

Operations

[Monitoring] Control Panel

For troubleshooting, we can use the URLs below to access the services and check the logs and data.

| Service | Role            | Panel URL             |
|---------|-----------------|-----------------------|
| Airflow | Orchestration   | http://localhost:8080 |
| Milvus  | Vector database | http://localhost:9100 |
| Adminer | DB accessor     | http://localhost:8070 |
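As a small convenience, the table above can be wrapped in a shell helper that maps a service name to its panel URL. The `panel_url` name is our own; the URLs are the ones from the table:

```shell
# panel_url SERVICE -- echo the local panel URL for a service from the table above.
panel_url() {
  case "$1" in
    airflow) echo "http://localhost:8080" ;;
    milvus)  echo "http://localhost:9100" ;;
    adminer) echo "http://localhost:8070" ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}
```

For example, `open "$(panel_url airflow)"` on MacOS (or `xdg-open` on Linux) jumps straight to the Airflow panel.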

Stop/Restart Services

When needed, run the following commands from the codebase folder.

# stop
make stop

# restart
make stop && make start

Redeploy .env and DAGs

make stop && make init && make start

Upgrade to the latest code

make upgrade && make stop && make init && make start

Rebuild Docker Images

make stop && make build && make init && make start
