SurfSense
A Knowledge Graph Brain 🧠 for World Wide Web Surfers. Never forget anything you see on the Internet
Stars: 152
SurfSense is a tool designed to help users save and organize content from the internet into a personal Knowledge Graph. It allows users to capture web browsing sessions and webpage content using a Chrome extension, enabling easy retrieval and recall of saved information. SurfSense offers features like powerful search capabilities, natural language interaction with saved content, self-hosting options, and integration with GraphRAG for meaningful content relations. The tool eliminates the need for web scraping by directly reading data from the DOM, making it a convenient solution for managing online information.
README:
Well when I’m browsing the internet, I tend to save a ton of content—but remembering when and what you saved? Total brain freeze! That’s where SurfSense comes in. SurfSense is like a Knowledge Graph Brain 🧠 for anything you see (Social Media Chats, Calender Invites, Important Mails, Tutorials, Recipies and anything ) on the World Wide Web. Now, you’ll never forget any browsing session. Easily capture your web browsing session and desired webpage content using an easy-to-use cross browser extension. Then, ask your personal knowledge base anything about your saved content, and voilà—instant recall!
https://github.com/user-attachments/assets/37985a8b-acbd-4fff-b276-512bbf0bf6aa
- 💡 Idea: Save any content you see on the internet in your own Knowledge Graph.
- ⚙️ Cross Browser Extension: Save content from your favourite browser.
- 🔍 Powerful Search: Quickly find anything in your Web Browsing Sessions.
- 💬 Chat with your Web History: Interact in Natural Language with your saved Web Browsing Sessions.
- 🏠 Self Hostable: Open source and easy to deploy locally.
- 📊 Use GraphRAG: Utilize the power of GraphRAG to find meaningful relations in your saved content.
- 🔟% Cheap On Wallet: Works Flawlessly with OpenAI gpt-4o-mini model.
- 🕸️ No WebScraping: Extension directly reads the data from DOM.
- 🔔 Automatic Important Notifications: Get critical notifications such as important meetings, invites etc.
Before we begin, we need to set up our Neo4j Graph Database. This is where SurfSense stores all your saved information. For a quick setup, I suggest getting your free Neo4j Aura DB from https://neo4j.com/cloud/platform/aura-graph-database/ or setting it up locally.
After obtaining your Neo4j credentials, make sure to get your OpenAI API Key from https://platform.openai.com/.
UPDATE 24 AUGUST 2024: Extension code is now migrated to Plasmo. You can use extension in any webbrowser. All Webstore links will be updated soon.
- Register Your SurfSense account at https://www.surfsense.net/signup
- Download SurfSense Extension from https://chromewebstore.google.com/detail/surfsense/jihmihbdpfjhppdlifphccgefjhifblf
Now you are ready to use SurfSense. Start by first logging into the Extension.
When you start the extension you should see a Login page like this
After logging in you will need to fill your Neo4j Credentials & OpenAPI Key.
After Saving you should be able to use extension now.
Options | Explanations |
---|---|
Clear Inactive History Sessions | It clears the saved content for Inactive Tab Sessions. |
Save Current Webpage Snapshot | Stores the current webpage session info into SurfSense history store |
Save to SurfSense | Processes the SurfSense History Store & Initiates a Save Job |
- Now just start browsing the Internet. Whatever you want to save any content take its Snapshot and save it to SurfSense. After Save Job is completed you are ready to ask anything about it to your Knowledge Graph Brain 🧠.
- Critical Notifications are automatically generated. Check them out at https://www.surfsense.net/notifications .
- Now go to SurfSense Chat Options at https://www.surfsense.net/chat & fill the Neo4j Credentials & OpenAPI Key if asked.
OPTIONS | DESCRIPTION |
---|---|
Precision Chat | Used for detailed search and chatting with your saved web sessions and their content. |
General Chat | Used for general questions about your content. Doesn't work well with Dates & Time. |
As an example lets visit : https://myanimelist.net/anime/season (Summer 2024 Anime atm) and save it to SurfSense.
Now lets ask SurfSense "Give list of summer 2024 animes with images."
Sample Response:
Now Let's ask it more information about our related session.
Sample More Description Response:
For authentication purposes, you’ll also need a PostgreSQL instance running on your machine.
Now lets setup the SurfSense BackEnd
- Clone this repo.
- Go to ./backend subdirectory.
- Setup Python Virtual Enviroment
- Run
pip install -r requirements.txt
to install all required dependencies. - Update the required Environment variables in envs.py
ENV VARIABLE | Description |
---|---|
POSTGRES_DATABASE_URL | postgresql+psycopg2://user:pass@host:5432/database |
API_SECRET_KEY | Can be any Random String value. Make Sure to remember it for as you need to send it in request to Backend for security purposes. |
- Backend is a FastAPI Backend so now just run the server on unicorn using command
uvicorn server:app --host 0.0.0.0 --port 8000
- If everything worked fine you should see screen like this.
UPDATE: Extension code is now migrated to Plasmo. Follow this guide to build for your target browser now : https://docs.plasmo.com/framework/workflows/build
env eg in .env.local
Now resister a quick user through Swagger API > Try it Out: http://127.0.0.1:8000/docs#/default/register_user_register_post
Make Sure in request body "apisecretkey"
value is same value as API_SECRET_KEY
we been assigning.
For local frontend setup just fill out the .env
file of frontend.
ENV VARIABLE | DESCRIPTION |
---|---|
NEXT_PUBLIC_API_SECRET_KEY | Same String value your set for Backend & Extension |
NEXT_PUBLIC_BACKEND_URL | Give hosted backend url here. Eg. http://127.0.0.1:8000
|
NEXT_PUBLIC_RECAPTCHA_SITE_KEY | Google Recaptcha v2 Client Key |
RECAPTCHA_SECRET_KEY | Google Recaptcha v2 Server Key |
and run it using pnpm run dev
You should see your Next.js frontend running at localhost:3000
- Extenstion : Chrome Manifest v3
- BackEnd : FastAPI with LangChain
- FrontEnd: Next.js with Aceternity.
In Progress...........
- Based on feedback, I will work on making it compatible with local models.
- Cross Browser Extension [Done]
- Generalize the way SurfSense uses Graphs. Will soon make an integration with FalkorDB soon.
- Critical Notifications [Done]
- Saving Chats [Done]
- Basic keyword search page for saved sessions [Done]
- Multi & Single Document Chat [Done]
- Implement some tricks from GraphRAG papers to optimize current GraphRAG logic.
Contributions are very welcome! A contribution can be as small as a ⭐ or even finding and creating issues. Fine-tuning the Backend is always desired.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for SurfSense
Similar Open Source Tools
SurfSense
SurfSense is a tool designed to help users save and organize content from the internet into a personal Knowledge Graph. It allows users to capture web browsing sessions and webpage content using a Chrome extension, enabling easy retrieval and recall of saved information. SurfSense offers features like powerful search capabilities, natural language interaction with saved content, self-hosting options, and integration with GraphRAG for meaningful content relations. The tool eliminates the need for web scraping by directly reading data from the DOM, making it a convenient solution for managing online information.
phospho
Phospho is a text analytics platform for LLM apps. It helps you detect issues and extract insights from text messages of your users or your app. You can gather user feedback, measure success, and iterate on your app to create the best conversational experience for your users.
ai-driven-dev-community
AI Driven Dev Community is a repository aimed at helping developers become more efficient by utilizing AI tools in their daily coding tasks. It provides a collection of tools, prompts, snippets, and agents for developers to integrate AI into their workflow. The repository is regularly updated with new resources and focuses on best practices for using AI in development work. Users can find tools like Espanso, ChatGPT, GitHub Copilot, and VSCode recommended for enhancing their coding experience. Additionally, the repository offers guidance on customizing AI for developers, installing AI toolbox for software engineers, and contributing to the community through easy steps.
workbench-example-hybrid-rag
This NVIDIA AI Workbench project is designed for developing a Retrieval Augmented Generation application with a customizable Gradio Chat app. It allows users to embed documents into a locally running vector database and run inference locally on a Hugging Face TGI server, in the cloud using NVIDIA inference endpoints, or using microservices via NVIDIA Inference Microservices (NIMs). The project supports various models with different quantization options and provides tutorials for using different inference modes. Users can troubleshoot issues, customize the Gradio app, and access advanced tutorials for specific tasks.
quivr
Quivr is a personal assistant powered by Generative AI, designed to be a second brain for users. It offers fast and efficient access to data, ensuring security and compatibility with various file formats. Quivr is open source and free to use, allowing users to share their brains publicly or keep them private. The marketplace feature enables users to share and utilize brains created by others, boosting productivity. Quivr's offline mode provides anytime, anywhere access to data. Key features include speed, security, OS compatibility, file compatibility, open source nature, public/private sharing options, a marketplace, and offline mode.
open-webui
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs. For more information, be sure to check out our Open WebUI Documentation.
extensionOS
Extension | OS is an open-source browser extension that brings AI directly to users' web browsers, allowing them to access powerful models like LLMs seamlessly. Users can create prompts, fix grammar, and access intelligent assistance without switching tabs. The extension aims to revolutionize online information interaction by integrating AI into everyday browsing experiences. It offers features like Prompt Factory for tailored prompts, seamless LLM model access, secure API key storage, and a Mixture of Agents feature. The extension was developed to empower users to unleash their creativity with custom prompts and enhance their browsing experience with intelligent assistance.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
Easy-Voice-Toolkit
Easy Voice Toolkit is a toolkit based on open source voice projects, providing automated audio tools including speech model training. Users can seamlessly integrate functions like audio processing, voice recognition, voice transcription, dataset creation, model training, and voice conversion to transform raw audio files into ideal speech models. The toolkit supports multiple languages and is currently only compatible with Windows systems. It acknowledges the contributions of various projects and offers local deployment options for both users and developers. Additionally, cloud deployment on Google Colab is available. The toolkit has been tested on Windows OS devices and includes a FAQ section and terms of use for academic exchange purposes.
CursorLens
Cursor Lens is an open-source tool that acts as a proxy between Cursor and various AI providers, logging interactions and providing detailed analytics to help developers optimize their use of AI in their coding workflow. It supports multiple AI providers, captures and logs all requests, provides visual analytics on AI usage, allows users to set up and switch between different AI configurations, offers real-time monitoring of AI interactions, tracks token usage, estimates costs based on token usage and model pricing. Built with Next.js, React, PostgreSQL, Prisma ORM, Vercel AI SDK, Tailwind CSS, and shadcn/ui components.
upscayl
Upscayl is a free and open-source AI image upscaler that uses advanced AI algorithms to enlarge and enhance low-resolution images without losing quality. It is a cross-platform application built with the Linux-first philosophy, available on all major desktop operating systems. Upscayl utilizes Real-ESRGAN and Vulkan architecture for image enhancement, and its backend is fully open-source under the AGPLv3 license. It is important to note that a Vulkan compatible GPU is required for Upscayl to function effectively.
Bobble-AI
AmbuFlow is a mobile application developed using HTML, CSS, JavaScript, and Google API to notify patients of nearby hospitals and provide estimated ambulance arrival times. It offers critical details like patient's location and enhances GPS route management with real-time traffic data for efficient navigation. The app helps users find nearby hospitals, track ambulances in real-time, and manage ambulance routes based on traffic and distance. It ensures quick emergency response, real-time tracking, enhanced communication, resource management, and a user-friendly interface for seamless navigation in high-stress situations.
DevoxxGenieIDEAPlugin
Devoxx Genie is a Java-based IntelliJ IDEA plugin that integrates with local and cloud-based LLM providers to aid in reviewing, testing, and explaining project code. It supports features like code highlighting, chat conversations, and adding files/code snippets to context. Users can modify REST endpoints and LLM parameters in settings, including support for cloud-based LLMs. The plugin requires IntelliJ version 2023.3.4 and JDK 17. Building and publishing the plugin is done using Gradle tasks. Users can select an LLM provider, choose code, and use commands like review, explain, or generate unit tests for code analysis.
miniperplx
MiniPerplx is a minimalistic AI-powered search engine designed to help users find information on the internet. It utilizes AI technologies from providers like OpenAI, Anthropic, and Tavily to deliver accurate and relevant search results. Users can deploy their own instance of MiniPerplx by obtaining API keys, setting up environment variables, and running the development server. The tool aims to streamline the process of information retrieval by leveraging advanced AI capabilities in a user-friendly interface.
obsidian-ai-assistant
Obsidian AI Assistant is a simple plugin that enables interactions with various AI models such as OpenAI ChatGPT, Anthropic Claude, OpenAI DALL·E, and OpenAI Whisper directly from Obsidian notes. The plugin offers features like text assistance, image generation, and speech-to-text functionality. Users can chat with the AI assistant, generate images for notes, and dictate notes using speech-to-text. The plugin allows customization of text models, image generation options, and language settings for speech-to-text. It requires official API keys for using OpenAI and Anthropic Claude models.
For similar tasks
SurfSense
SurfSense is a tool designed to help users save and organize content from the internet into a personal Knowledge Graph. It allows users to capture web browsing sessions and webpage content using a Chrome extension, enabling easy retrieval and recall of saved information. SurfSense offers features like powerful search capabilities, natural language interaction with saved content, self-hosting options, and integration with GraphRAG for meaningful content relations. The tool eliminates the need for web scraping by directly reading data from the DOM, making it a convenient solution for managing online information.
For similar jobs
book
Podwise is an AI knowledge management app designed specifically for podcast listeners. With the Podwise platform, you only need to follow your favorite podcasts, such as "Hardcore Hackers". When a program is released, Podwise will use AI to transcribe, extract, summarize, and analyze the podcast content, helping you to break down the hard-core podcast knowledge. At the same time, it is connected to platforms such as Notion, Obsidian, Logseq, and Readwise, embedded in your knowledge management workflow, and integrated with content from other channels including news, newsletters, and blogs, helping you to improve your second brain 🧠.
extractor
Extractor is an AI-powered data extraction library for Laravel that leverages OpenAI's capabilities to effortlessly extract structured data from various sources, including images, PDFs, and emails. It features a convenient wrapper around OpenAI Chat and Completion endpoints, supports multiple input formats, includes a flexible Field Extractor for arbitrary data extraction, and integrates with Textract for OCR functionality. Extractor utilizes JSON Mode from the latest GPT-3.5 and GPT-4 models, providing accurate and efficient data extraction.
Scrapegraph-ai
ScrapeGraphAI is a Python library that uses Large Language Models (LLMs) and direct graph logic to create web scraping pipelines for websites, documents, and XML files. It allows users to extract specific information from web pages by providing a prompt describing the desired data. ScrapeGraphAI supports various LLMs, including Ollama, OpenAI, Gemini, and Docker, enabling users to choose the most suitable model for their needs. The library provides a user-friendly interface through its `SmartScraper` class, which simplifies the process of building and executing scraping pipelines. ScrapeGraphAI is open-source and available on GitHub, with extensive documentation and examples to guide users. It is particularly useful for researchers and data scientists who need to extract structured data from web pages for analysis and exploration.
databerry
Chaindesk is a no-code platform that allows users to easily set up a semantic search system for personal data without technical knowledge. It supports loading data from various sources such as raw text, web pages, files (Word, Excel, PowerPoint, PDF, Markdown, Plain Text), and upcoming support for web sites, Notion, and Airtable. The platform offers a user-friendly interface for managing datastores, querying data via a secure API endpoint, and auto-generating ChatGPT Plugins for each datastore. Chaindesk utilizes a Vector Database (Qdrant), Openai's text-embedding-ada-002 for embeddings, and has a chunk size of 1024 tokens. The technology stack includes Next.js, Joy UI, LangchainJS, PostgreSQL, Prisma, and Qdrant, inspired by the ChatGPT Retrieval Plugin.
auto-news
Auto-News is an automatic news aggregator tool that utilizes Large Language Models (LLM) to pull information from various sources such as Tweets, RSS feeds, YouTube videos, web articles, Reddit, and journal notes. The tool aims to help users efficiently read and filter content based on personal interests, providing a unified reading experience and organizing information effectively. It features feed aggregation with summarization, transcript generation for videos and articles, noise reduction, task organization, and deep dive topic exploration. The tool supports multiple LLM backends, offers weekly top-k aggregations, and can be deployed on Linux/MacOS using docker-compose or Kubernetes.
SemanticFinder
SemanticFinder is a frontend-only live semantic search tool that calculates embeddings and cosine similarity client-side using transformers.js and SOTA embedding models from Huggingface. It allows users to search through large texts like books with pre-indexed examples, customize search parameters, and offers data privacy by keeping input text in the browser. The tool can be used for basic search tasks, analyzing texts for recurring themes, and has potential integrations with various applications like wikis, chat apps, and personal history search. It also provides options for building browser extensions and future ideas for further enhancements and integrations.
1filellm
1filellm is a command-line data aggregation tool designed for LLM ingestion. It aggregates and preprocesses data from various sources into a single text file, facilitating the creation of information-dense prompts for large language models. The tool supports automatic source type detection, handling of multiple file formats, web crawling functionality, integration with Sci-Hub for research paper downloads, text preprocessing, and token count reporting. Users can input local files, directories, GitHub repositories, pull requests, issues, ArXiv papers, YouTube transcripts, web pages, Sci-Hub papers via DOI or PMID. The tool provides uncompressed and compressed text outputs, with the uncompressed text automatically copied to the clipboard for easy pasting into LLMs.
Agently-Daily-News-Collector
Agently Daily News Collector is an open-source project showcasing a workflow powered by the Agent ly AI application development framework. It allows users to generate news collections on various topics by inputting the field topic. The AI agents automatically perform the necessary tasks to generate a high-quality news collection saved in a markdown file. Users can edit settings in the YAML file, install Python and required packages, input their topic idea, and wait for the news collection to be generated. The process involves tasks like outlining, searching, summarizing, and preparing column data. The project dependencies include Agently AI Development Framework, duckduckgo-search, BeautifulSoup4, and PyYAM.