SurfSense

A Knowledge Graph Brain 🧠 for World Wide Web Surfers. Never forget anything you see on the Internet

Stars: 152

Visit

SurfSense is a tool designed to help users save and organize content from the internet into a personal Knowledge Graph. It allows users to capture web browsing sessions and webpage content using a Chrome extension, enabling easy retrieval and recall of saved information. SurfSense offers features like powerful search capabilities, natural language interaction with saved content, self-hosting options, and integration with GraphRAG for meaningful content relations. The tool eliminates the need for web scraping by directly reading data from the DOM, making it a convenient solution for managing online information.

README:

SurfSense

Well when I’m browsing the internet, I tend to save a ton of content—but remembering when and what you saved? Total brain freeze! That’s where SurfSense comes in. SurfSense is like a Knowledge Graph Brain 🧠 for anything you see (Social Media Chats, Calender Invites, Important Mails, Tutorials, Recipies and anything ) on the World Wide Web. Now, you’ll never forget any browsing session. Easily capture your web browsing session and desired webpage content using an easy-to-use cross browser extension. Then, ask your personal knowledge base anything about your saved content, and voilà—instant recall!

Video

https://github.com/user-attachments/assets/37985a8b-acbd-4fff-b276-512bbf0bf6aa

Key Features

💡 Idea: Save any content you see on the internet in your own Knowledge Graph.
⚙️ Cross Browser Extension: Save content from your favourite browser.
🔍 Powerful Search: Quickly find anything in your Web Browsing Sessions.
💬 Chat with your Web History: Interact in Natural Language with your saved Web Browsing Sessions.
🏠 Self Hostable: Open source and easy to deploy locally.
📊 Use GraphRAG: Utilize the power of GraphRAG to find meaningful relations in your saved content.
🔟% Cheap On Wallet: Works Flawlessly with OpenAI gpt-4o-mini model.
🕸️ No WebScraping: Extension directly reads the data from DOM.
🔔 Automatic Important Notifications: Get critical notifications such as important meetings, invites etc.

How to get started?

Before we begin, we need to set up our Neo4j Graph Database. This is where SurfSense stores all your saved information. For a quick setup, I suggest getting your free Neo4j Aura DB from https://neo4j.com/cloud/platform/aura-graph-database/ or setting it up locally.

After obtaining your Neo4j credentials, make sure to get your OpenAI API Key from https://platform.openai.com/.

UPDATE 24 AUGUST 2024: Extension code is now migrated to Plasmo. You can use extension in any webbrowser. All Webstore links will be updated soon.

Register Your SurfSense account at https://www.surfsense.net/signup
Download SurfSense Extension from https://chromewebstore.google.com/detail/surfsense/jihmihbdpfjhppdlifphccgefjhifblf

Now you are ready to use SurfSense. Start by first logging into the Extension.

When you start the extension you should see a Login page like this

After logging in you will need to fill your Neo4j Credentials & OpenAPI Key.

After Saving you should be able to use extension now.

Options	Explanations
Clear Inactive History Sessions	It clears the saved content for Inactive Tab Sessions.
Save Current Webpage Snapshot	Stores the current webpage session info into SurfSense history store
Save to SurfSense	Processes the SurfSense History Store & Initiates a Save Job

Now just start browsing the Internet. Whatever you want to save any content take its Snapshot and save it to SurfSense. After Save Job is completed you are ready to ask anything about it to your Knowledge Graph Brain 🧠.
Critical Notifications are automatically generated. Check them out at https://www.surfsense.net/notifications .

Now go to SurfSense Chat Options at https://www.surfsense.net/chat & fill the Neo4j Credentials & OpenAPI Key if asked.

OPTIONS	DESCRIPTION
Precision Chat	Used for detailed search and chatting with your saved web sessions and their content.
General Chat	Used for general questions about your content. Doesn't work well with Dates & Time.

Chat Screenshots

PRECISION

Search

Results

Multi Webpage Chat

GENERAL

As an example lets visit : https://myanimelist.net/anime/season (Summer 2024 Anime atm) and save it to SurfSense.

Now lets ask SurfSense "Give list of summer 2024 animes with images."

Sample Response:

Now Let's ask it more information about our related session.

Sample More Description Response:

Local Setup Guide

Backend

For authentication purposes, you’ll also need a PostgreSQL instance running on your machine.

Now lets setup the SurfSense BackEnd

Clone this repo.
Go to ./backend subdirectory.
Setup Python Virtual Enviroment
Run pip install -r requirements.txt to install all required dependencies.
Update the required Environment variables in envs.py

ENV VARIABLE	Description
POSTGRES_DATABASE_URL	postgresql+psycopg2://user:pass@host:5432/database
API_SECRET_KEY	Can be any Random String value. Make Sure to remember it for as you need to send it in request to Backend for security purposes.

Backend is a FastAPI Backend so now just run the server on unicorn using command uvicorn server:app --host 0.0.0.0 --port 8000
If everything worked fine you should see screen like this.

Extension

UPDATE: Extension code is now migrated to Plasmo. Follow this guide to build for your target browser now : https://docs.plasmo.com/framework/workflows/build

env eg in .env.local

Now resister a quick user through Swagger API > Try it Out: http://127.0.0.1:8000/docs#/default/register_user_register_post

Make Sure in request body "apisecretkey" value is same value as API_SECRET_KEY we been assigning.

FrontEnd

For local frontend setup just fill out the .env file of frontend.

ENV VARIABLE	DESCRIPTION
NEXT_PUBLIC_API_SECRET_KEY	Same String value your set for Backend & Extension
NEXT_PUBLIC_BACKEND_URL	Give hosted backend url here. Eg. `http://127.0.0.1:8000`
NEXT_PUBLIC_RECAPTCHA_SITE_KEY	Google Recaptcha v2 Client Key
RECAPTCHA_SECRET_KEY	Google Recaptcha v2 Server Key

and run it using pnpm run dev

You should see your Next.js frontend running at localhost:3000

Tech Stack

Extenstion : Chrome Manifest v3
BackEnd : FastAPI with LangChain
FrontEnd: Next.js with Aceternity.

Architecture:

In Progress...........

Future Work

Based on feedback, I will work on making it compatible with local models.
Cross Browser Extension [Done]
Generalize the way SurfSense uses Graphs. Will soon make an integration with FalkorDB soon.
Critical Notifications [Done]
Saving Chats [Done]
Basic keyword search page for saved sessions [Done]
Multi & Single Document Chat [Done]
Implement some tricks from GraphRAG papers to optimize current GraphRAG logic.

Contribute

Contributions are very welcome! A contribution can be as small as a ⭐ or even finding and creating issues. Fine-tuning the Backend is always desired.

For Tasks:

Click tags to check more tools for each tasks

save webpage content search web browsing sessions chat with saved content host personal knowledge graph analyze content relations

For Jobs:

content curator research analyst knowledge manager digital marketer data scientist

Alternative AI tools for SurfSense

Similar Open Source Tools

SurfSense

github

: 152

obsidian-smart-composer

Smart Composer is an Obsidian plugin that enhances note-taking and content creation by integrating AI capabilities. It allows users to efficiently write by referencing their vault content, providing contextual chat with precise context selection, multimedia context support for website links and images, document edit suggestions, and vault search for relevant notes. The plugin also offers features like custom model selection, local model support, custom system prompts, and prompt templates. Users can set up the plugin by installing it through the Obsidian community plugins, enabling it, and configuring API keys for supported providers like OpenAI, Anthropic, and Gemini. Smart Composer aims to streamline the writing process by leveraging AI technology within the Obsidian platform.

github

: 1.1k

zero-true

Zero-True is a Python and SQL reactive computational notebook designed for building and collaborating on data-driven applications. It offers an integrated and simple environment with transparent updates, dynamic and interactive UI rendering, fast prototyping capabilities, and open-source community contributions. Users can create rich, reactive apps with ease and publish them confidently. Zero-True aims to improve data accessibility and foster collaboration among users.

github

: 58

open-webui

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs. For more information, be sure to check out our Open WebUI Documentation.

github

: 111.1k

ai-driven-dev-community

AI Driven Dev Community is a repository aimed at helping developers become more efficient by utilizing AI tools in their daily coding tasks. It provides a collection of tools, prompts, snippets, and agents for developers to integrate AI into their workflow. The repository is regularly updated with new resources and focuses on best practices for using AI in development work. Users can find tools like Espanso, ChatGPT, GitHub Copilot, and VSCode recommended for enhancing their coding experience. Additionally, the repository offers guidance on customizing AI for developers, installing AI toolbox for software engineers, and contributing to the community through easy steps.

github

: 69

workbench-example-hybrid-rag

This NVIDIA AI Workbench project is designed for developing a Retrieval Augmented Generation application with a customizable Gradio Chat app. It allows users to embed documents into a locally running vector database and run inference locally on a Hugging Face TGI server, in the cloud using NVIDIA inference endpoints, or using microservices via NVIDIA Inference Microservices (NIMs). The project supports various models with different quantization options and provides tutorials for using different inference modes. Users can troubleshoot issues, customize the Gradio app, and access advanced tutorials for specific tasks.

github

: 106

bagel

Bagel is a tool that allows users to chat with their robotics and drone data similar to using ChatGPT. It generates deterministic and auditable DuckDB SQL queries to analyze data, supporting various robotics and sensor log formats. Users can interact with Bagel through a Discord server, and it can be integrated with different language models. Bagel provides tutorials, Docker images for easy deployment, and a roadmap for upcoming features like Computer Vision Module, Anomaly Detection, and more.

github

: 335

extensionOS

Extension | OS is an open-source browser extension that brings AI directly to users' web browsers, allowing them to access powerful models like LLMs seamlessly. Users can create prompts, fix grammar, and access intelligent assistance without switching tabs. The extension aims to revolutionize online information interaction by integrating AI into everyday browsing experiences. It offers features like Prompt Factory for tailored prompts, seamless LLM model access, secure API key storage, and a Mixture of Agents feature. The extension was developed to empower users to unleash their creativity with custom prompts and enhance their browsing experience with intelligent assistance.

github

: 73

BMAD-METHOD

BMAD-METHOD™ is a universal AI agent framework that revolutionizes Agile AI-Driven Development. It offers specialized AI expertise across various domains, including software development, entertainment, creative writing, business strategy, and personal wellness. The framework introduces two key innovations: Agentic Planning, where dedicated agents collaborate to create detailed specifications, and Context-Engineered Development, which ensures complete understanding and guidance for developers. BMAD-METHOD™ simplifies the development process by eliminating planning inconsistency and context loss, providing a seamless workflow for creating AI agents and expanding functionality through expansion packs.

github

: 16.4k

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675

WeClone

WeClone is an all-in-one solution for creating your digital twin from chat records. It allows users to fine-tune large language models using their chat history, capturing their unique style and personality to integrate into a chatbot, effectively creating a digital avatar. The tool offers digital cloning, chatbot integration, user-friendly interface for managing chat records, fine-tuning with LoRA, and cross-platform compatibility.

github

: 85

nanobrowser

Nanobrowser is an open-source AI web automation tool that runs in your browser. It is a free alternative to OpenAI Operator with flexible LLM options and a multi-agent system. Nanobrowser offers premium web automation capabilities while keeping users in complete control, with features like a multi-agent system, interactive side panel, task automation, follow-up questions, and multiple LLM support. Users can easily download and install Nanobrowser as a Chrome extension, configure agent models, and accomplish tasks such as news summary, GitHub research, and shopping research with just a sentence. The tool uses a specialized multi-agent system powered by large language models to understand and execute complex web tasks. Nanobrowser is actively developed with plans to expand LLM support, implement security measures, optimize memory usage, enable session replay, and develop specialized agents for domain-specific tasks. Contributions from the community are welcome to improve Nanobrowser and build the future of web automation.

github

: 9.8k

AgentGenesis

AgentGenesis is an open-source web app that provides customizable Gen AI code snippets for building custom RAG flows and AI agents. Users can easily copy and paste components into their applications, modify the code as needed, and build on top of it. The tool aims to simplify the development process by offering pre-built code snippets that can be integrated seamlessly into projects without the need for npm installation. AgentGenesis encourages contributions, positive feedback, and support from the community to enhance and grow the project.

github

: 112

swark

Swark is a VS Code extension that automatically generates architecture diagrams from code using large language models (LLMs). It is directly integrated with GitHub Copilot, requires no authentication or API key, and supports all languages. Swark helps users learn new codebases, review AI-generated code, improve documentation, understand legacy code, spot design flaws, and gain test coverage insights. It saves output in a 'swark-output' folder with diagram and log files. Source code is only shared with GitHub Copilot for privacy. The extension settings allow customization for file reading, file extensions, exclusion patterns, and language model selection. Swark is open source under the GNU Affero General Public License v3.0.

github

: 274

mattermost-plugin-agents

The Mattermost Agents Plugin integrates AI capabilities directly into your Mattermost workspace, allowing users to run local LLMs on their infrastructure or connect to cloud providers. It offers multiple AI assistants with specialized personalities, thread and channel summarization, action item extraction, meeting transcription, semantic search, smart reactions, direct conversations with AI assistants, and flexible LLM support. The plugin comes with comprehensive documentation, installation instructions, system requirements, and development guidelines for users to interact with AI features and configure LLM providers.

github

: 183

mobile-use

Mobile-use is an open-source AI agent that controls Android or IOS devices using natural language. It understands commands to perform tasks like sending messages and navigating apps. Features include natural language control, UI-aware automation, data scraping, and extensibility. Users can automate their mobile experience by setting up environment variables, customizing LLM configurations, and launching the tool via Docker or manually for development. The tool supports physical Android phones, Android simulators, and iOS simulators. Contributions are welcome, and the project is licensed under MIT.

github

: 1.7k

For similar tasks

SurfSense

github

: 152

For similar jobs

book

Podwise is an AI knowledge management app designed specifically for podcast listeners. With the Podwise platform, you only need to follow your favorite podcasts, such as "Hardcore Hackers". When a program is released, Podwise will use AI to transcribe, extract, summarize, and analyze the podcast content, helping you to break down the hard-core podcast knowledge. At the same time, it is connected to platforms such as Notion, Obsidian, Logseq, and Readwise, embedded in your knowledge management workflow, and integrated with content from other channels including news, newsletters, and blogs, helping you to improve your second brain 🧠.

github

: 1.0k

extractor

Extractor is an AI-powered data extraction library for Laravel that leverages OpenAI's capabilities to effortlessly extract structured data from various sources, including images, PDFs, and emails. It features a convenient wrapper around OpenAI Chat and Completion endpoints, supports multiple input formats, includes a flexible Field Extractor for arbitrary data extraction, and integrates with Textract for OCR functionality. Extractor utilizes JSON Mode from the latest GPT-3.5 and GPT-4 models, providing accurate and efficient data extraction.

github

: 86

Scrapegraph-ai

ScrapeGraphAI is a Python library that uses Large Language Models (LLMs) and direct graph logic to create web scraping pipelines for websites, documents, and XML files. It allows users to extract specific information from web pages by providing a prompt describing the desired data. ScrapeGraphAI supports various LLMs, including Ollama, OpenAI, Gemini, and Docker, enabling users to choose the most suitable model for their needs. The library provides a user-friendly interface through its `SmartScraper` class, which simplifies the process of building and executing scraping pipelines. ScrapeGraphAI is open-source and available on GitHub, with extensive documentation and examples to guide users. It is particularly useful for researchers and data scientists who need to extract structured data from web pages for analysis and exploration.

github

: 12.8k

databerry

Chaindesk is a no-code platform that allows users to easily set up a semantic search system for personal data without technical knowledge. It supports loading data from various sources such as raw text, web pages, files (Word, Excel, PowerPoint, PDF, Markdown, Plain Text), and upcoming support for web sites, Notion, and Airtable. The platform offers a user-friendly interface for managing datastores, querying data via a secure API endpoint, and auto-generating ChatGPT Plugins for each datastore. Chaindesk utilizes a Vector Database (Qdrant), Openai's text-embedding-ada-002 for embeddings, and has a chunk size of 1024 tokens. The technology stack includes Next.js, Joy UI, LangchainJS, PostgreSQL, Prisma, and Qdrant, inspired by the ChatGPT Retrieval Plugin.

github

: 2.9k

auto-news

Auto-News is an automatic news aggregator tool that utilizes Large Language Models (LLM) to pull information from various sources such as Tweets, RSS feeds, YouTube videos, web articles, Reddit, and journal notes. The tool aims to help users efficiently read and filter content based on personal interests, providing a unified reading experience and organizing information effectively. It features feed aggregation with summarization, transcript generation for videos and articles, noise reduction, task organization, and deep dive topic exploration. The tool supports multiple LLM backends, offers weekly top-k aggregations, and can be deployed on Linux/MacOS using docker-compose or Kubernetes.

github

: 465

SemanticFinder

SemanticFinder is a frontend-only live semantic search tool that calculates embeddings and cosine similarity client-side using transformers.js and SOTA embedding models from Huggingface. It allows users to search through large texts like books with pre-indexed examples, customize search parameters, and offers data privacy by keeping input text in the browser. The tool can be used for basic search tasks, analyzing texts for recurring themes, and has potential integrations with various applications like wikis, chat apps, and personal history search. It also provides options for building browser extensions and future ideas for further enhancements and integrations.

github

: 204

1filellm

1filellm is a command-line data aggregation tool designed for LLM ingestion. It aggregates and preprocesses data from various sources into a single text file, facilitating the creation of information-dense prompts for large language models. The tool supports automatic source type detection, handling of multiple file formats, web crawling functionality, integration with Sci-Hub for research paper downloads, text preprocessing, and token count reporting. Users can input local files, directories, GitHub repositories, pull requests, issues, ArXiv papers, YouTube transcripts, web pages, Sci-Hub papers via DOI or PMID. The tool provides uncompressed and compressed text outputs, with the uncompressed text automatically copied to the clipboard for easy pasting into LLMs.

github

: 292

Agently-Daily-News-Collector

Agently Daily News Collector is an open-source project showcasing a workflow powered by the Agent ly AI application development framework. It allows users to generate news collections on various topics by inputting the field topic. The AI agents automatically perform the necessary tasks to generate a high-quality news collection saved in a markdown file. Users can edit settings in the YAML file, install Python and required packages, input their topic idea, and wait for the news collection to be generated. The process involves tasks like outlining, searching, summarizing, and preparing column data. The project dependencies include Agently AI Development Framework, duckduckgo-search, BeautifulSoup4, and PyYAM.

github

: 338