Lumos
A RAG LLM co-pilot for browsing the web, powered by local LLMs
Stars: 1257
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.
README:
A RAG LLM co-pilot for browsing the web, powered by local LLMs.
This Chrome extension is powered by Ollama. Inference is done on your local machine without any remote server support. However, due to security constraints in the Chrome extension platform, the app does rely on local server support to run the LLM. This app is inspired by the Chrome extension example provided by the Web LLM project and the local LLM examples provided by LangChain.
Lumos. Nox. Lumos. Nox.
- Summarize long threads on issue tracking sites, forums, and social media sites.
- Summarize news articles.
- Ask questions about reviews on business and product pages.
- Ask questions about long, technical documentation.
- ... what else?
A local Ollama server is needed for the embedding database and LLM inference. Download and install Ollama and the CLI here.
Example:
ollama pull llama2
Example:
OLLAMA_ORIGINS=chrome-extension://* ollama serve
Terminal output:
2023/11/19 20:55:16 images.go:799: total blobs: 6
2023/11/19 20:55:16 images.go:806: total unused blobs removed: 0
2023/11/19 20:55:16 routes.go:777: Listening on 127.0.0.1:11434 (version 0.1.10)
Note: The environment variable OLLAMA_ORIGINS must be set to chrome-extension://* to allow requests originating from the Chrome extension. The following error will occur in the Chrome extension if OLLAMA_ORIGINS is not set properly.
Access to fetch at 'http://localhost:11434/api/tags' from origin 'chrome-extension://<extension_id>' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
Run launchctl setenv to set OLLAMA_ORIGINS.
launchctl setenv OLLAMA_ORIGINS "chrome-extension://*"
Setting environment variables on Mac (Ollama)
The Ollama server can also be run in a Docker container. The container should have the OLLAMA_ORIGINS environment variable set to chrome-extension://*.
Run docker run with the -e flag to set the OLLAMA_ORIGINS environment variable:
docker run -e OLLAMA_ORIGINS="chrome-extension://*" -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
In the project directory, you can run:
Launches the test runner in the interactive watch mode.
See the section about running tests for more information.
Runs eslint and prettier on src and __tests__ files.
Builds the app for production to the dist folder.
It correctly bundles React in production mode and optimizes the build for the best performance.
The build is minified and the filenames include the hashes.
Your app is ready to be deployed!
See the section about deployment for more information.
https://developer.chrome.com/docs/extensions/mv3/getstarted/development-basics/#load-unpacked
Create a keyboard shortcut to make Lumos easily accessible.
- Navigate to
chrome://extensions/shortcuts. - Configure a shortcut for Lumos (
Activate the extension). For example,⌘L(command key+L).
If you don't have npm installed, you can download the pre-built extension package from the Releases page.
Right-click on the extension icon and select Options to access the extension's Options page.
-
Ollama Model: Select desired model (e.g.
llama2) -
Ollama Embedding Model: Select desired embedding model (e.g.
nomic-embed-text). Caution: Using a different embedding model requires Ollama to swap models, which may incur undesired latency in the app. This is a known limitation in Ollama and may be improved in the future. -
Ollama Host: Select desired host (defaults to
http://localhost:11434) - Vector Store TTL (minutes): Number of minutes to store a URL's content in the vector store cache.
-
Content Parser Config: Lumos's default content parser will extract all text content between a page's
<body></body>tag. To customize the content parser, add an entry to the configuration. - Enable/Disable Tools: Enable or disable individual tools. If a tool is enabled, a custom prefix trigger (e.g. "calc:") can be specified to override the app's internal prompt classification mechanism.
- Enable/Disable Dark Arts: 😈
Each URL path can have its own content parser. The content parser config for the longest URL path will be matched.
- chunkSize: Number of characters to chunk page content into for indexing into RAG vectorstore
- chunkOverlap: Number of characters to overlap in chunks for indexing into RAG vectorstore
-
selectors:
document.querySelector()queries to perform to retrieve page content -
selectorsAll:
document.querySelectorAll()queries to perform to retrieve page content
For example, given the following config, if the URL path of the current tab is domain.com/path1/subpath1/subsubpath1, then the config for domain.com/path1/subpath1 will be used (i.e. chunkSize=600).
{
"domain.com/path1/subpath1": {
"chunkSize": 600,
"chunkOverlap": 200,
"selectors": [
"#id"
],
"selectorsAll": []
},
"domain.com/path1": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
".className"
],
"selectorsAll": []
}
}See docs for How to Create a Custom Content Parser. See documentation for querySelector() and querySelectorAll() to confirm all querying capabilities.
Example:
{
"default": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"body"
],
"selectorsAll": []
},
"medium.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"article"
],
"selectorsAll": []
},
"reddit.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [],
"selectorsAll": [
"shreddit-comment"
]
},
"stackoverflow.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"#question-header",
"#mainbar"
],
"selectorsAll": []
},
"wikipedia.org": {
"chunkSize": 2000,
"chunkOverlap": 500,
"selectors": [
"#bodyContent"
],
"selectorsAll": []
},
"yelp.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"#location-and-hours",
"#reviews"
],
"selectorsAll": []
}
}Alternatively, if content is highlighted on a page (e.g. highlighted text), that content will be parsed instead of the content produced from the content parser configuration.
Note: Content that is highlighted will not be cached in the vector store cache. Each subsequent prompt containing highlighted content will generate new embeddings.
-
cmd + c: Copy last message to clipboard. -
cmd + j: ToggleDisable content parsingcheckbox. -
cmd + k: Clear all messages. -
cmd + ;: Open/close Chat History panel. -
ctrl + c: Cancel request (LLM request/streaming or embeddings generation) -
ctrl + x: Remove file attachment. -
ctrl + r: Regenerate last LLM response.
Lumos supports multimodal models! Images that are present on the current page will be downloaded and bound to the model for prompting. See documentation and examples here.
File attachments can be uploaded to Lumos. The contents of a file will be parsed and processed through Lumos's RAG workflow (similar to processing page content). By default, the text content of a file will be parsed if the extension type is not explicitly listed below.
Supported extension types:
.csv.json.pdf- any plain text file format (
.txt,.md,.py, etc)
Note: If an attachment is present, page content will not be parsed. Remove the file attachment to resume parsing page content.
Image files will be processed through Lumos's multimodal workflow (requires multimodal model).
Supported image types:
-
.jpeg,.jpg .png
Lumos invokes Tools automatically based on the provided prompt. See documentation and examples here.
- Local LLM in the Browser Powered by Ollama
- Local LLM in the Browser Powered by Ollama (Part 2)
- Let’s Normalize Online, In-Memory RAG! (Part 3)
- Supercharging If-Statements With Prompt Classification Using Ollama and LangChain (Part 4)
- Bolstering LangChain’s MemoryVectorStore With Keyword Search (Part 5)
- A Guide to Gotchas with LangChain Document Loaders in a Chrome Extension (Part 6)
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Lumos
Similar Open Source Tools
Lumos
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.
langserve
LangServe helps developers deploy `LangChain` runnables and chains as a REST API. This library is integrated with FastAPI and uses pydantic for data validation. In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in LangChain.js.
langchainrb
Langchain.rb is a Ruby library that makes it easy to build LLM-powered applications. It provides a unified interface to a variety of LLMs, vector search databases, and other tools, making it easy to build and deploy RAG (Retrieval Augmented Generation) systems and assistants. Langchain.rb is open source and available under the MIT License.
bot-on-anything
The 'bot-on-anything' repository allows developers to integrate various AI models into messaging applications, enabling the creation of intelligent chatbots. By configuring the connections between models and applications, developers can easily switch between multiple channels within a project. The architecture is highly scalable, allowing the reuse of algorithmic capabilities for each new application and model integration. Supported models include ChatGPT, GPT-3.0, New Bing, and Google Bard, while supported applications range from terminals and web platforms to messaging apps like WeChat, Telegram, QQ, and more. The repository provides detailed instructions for setting up the environment, configuring the models and channels, and running the chatbot for various tasks across different messaging platforms.
aiavatarkit
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.
swarmzero
SwarmZero SDK is a library that simplifies the creation and execution of AI Agents and Swarms of Agents. It supports various LLM Providers such as OpenAI, Azure OpenAI, Anthropic, MistralAI, Gemini, Nebius, and Ollama. Users can easily install the library using pip or poetry, set up the environment and configuration, create and run Agents, collaborate with Swarms, add tools for complex tasks, and utilize retriever tools for semantic information retrieval. Sample prompts are provided to help users explore the capabilities of the agents and swarms. The SDK also includes detailed examples and documentation for reference.
deep-searcher
DeepSearcher is a tool that combines reasoning LLMs and Vector Databases to perform search, evaluation, and reasoning based on private data. It is suitable for enterprise knowledge management, intelligent Q&A systems, and information retrieval scenarios. The tool maximizes the utilization of enterprise internal data while ensuring data security, supports multiple embedding models, and provides support for multiple LLMs for intelligent Q&A and content generation. It also includes features like private data search, vector database management, and document loading with web crawling capabilities under development.
llm-vscode
llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.
aiohttp-cors
The aiohttp_cors library provides Cross Origin Resource Sharing (CORS) support for aiohttp, an asyncio-powered asynchronous HTTP server. CORS allows overriding the Same-origin policy for specific resources, enabling web pages to access resources from different origins. The library helps configure CORS settings for resources and routes in aiohttp applications, allowing control over origins, credentials passing, headers, and preflight requests.
agentpress
AgentPress is a collection of simple but powerful utilities that serve as building blocks for creating AI agents. It includes core components for managing threads, registering tools, processing responses, state management, and utilizing LLMs. The tool provides a modular architecture for handling messages, LLM API calls, response processing, tool execution, and results management. Users can easily set up the environment, create custom tools with OpenAPI or XML schema, and manage conversation threads with real-time interaction. AgentPress aims to be agnostic, simple, and flexible, allowing users to customize and extend functionalities as needed.
langchain-extract
LangChain Extract is a simple web server that allows you to extract information from text and files using LLMs. It is built using FastAPI, LangChain, and Postgresql. The backend closely follows the extraction use-case documentation and provides a reference implementation of an app that helps to do extraction over data using LLMs. This repository is meant to be a starting point for building your own extraction application which may have slightly different requirements or use cases.
sparkle
Sparkle is a tool that streamlines the process of building AI-driven features in applications using Large Language Models (LLMs). It guides users through creating and managing agents, defining tools, and interacting with LLM providers like OpenAI. Sparkle allows customization of LLM provider settings, model configurations, and provides a seamless integration with Sparkle Server for exposing agents via an OpenAI-compatible chat API endpoint.
instructor
Instructor is a popular Python library for managing structured outputs from large language models (LLMs). It offers a user-friendly API for validation, retries, and streaming responses. With support for various LLM providers and multiple languages, Instructor simplifies working with LLM outputs. The library includes features like response models, retry management, validation, streaming support, and flexible backends. It also provides hooks for logging and monitoring LLM interactions, and supports integration with Anthropic, Cohere, Gemini, Litellm, and Google AI models. Instructor facilitates tasks such as extracting user data from natural language, creating fine-tuned models, managing uploaded files, and monitoring usage of OpenAI models.
awadb
AwaDB is an AI native database designed for embedding vectors. It simplifies database usage by eliminating the need for schema definition and manual indexing. The system ensures real-time search capabilities with millisecond-level latency. Built on 5 years of production experience with Vearch, AwaDB incorporates best practices from the community to offer stability and efficiency. Users can easily add and search for embedded sentences using the provided client libraries or RESTful API.
banks
Banks is a linguist professor tool that helps generate meaningful LLM prompts using a template language. It provides a user-friendly way to create prompts for various tasks such as blog writing, summarizing documents, lemmatizing text, and generating text using a LLM. The tool supports async operations and comes with predefined filters for data processing. Banks leverages Jinja's macro system to create prompts and interact with OpenAI API for text generation. It also offers a cache mechanism to avoid regenerating text for the same template and context.
For similar tasks
Lumos
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.
open-source-slack-ai
This repository provides a ready-to-run basic Slack AI solution that allows users to summarize threads and channels using OpenAI. Users can generate thread summaries, channel overviews, channel summaries since a specific time, and full channel summaries. The tool is powered by GPT-3.5-Turbo and an ensemble of NLP models. It requires Python 3.8 or higher, an OpenAI API key, Slack App with associated API tokens, Poetry package manager, and ngrok for local development. Users can customize channel and thread summaries, run tests with coverage using pytest, and contribute to the project for future enhancements.
For similar jobs
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
ChatGPT-On-CS
This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.
