Lumos
A RAG LLM co-pilot for browsing the web, powered by local LLMs
Stars: 1257
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.
README:
A RAG LLM co-pilot for browsing the web, powered by local LLMs.
This Chrome extension is powered by Ollama. Inference is done on your local machine without any remote server support. However, due to security constraints in the Chrome extension platform, the app does rely on local server support to run the LLM. This app is inspired by the Chrome extension example provided by the Web LLM project and the local LLM examples provided by LangChain.
Lumos. Nox. Lumos. Nox.
- Summarize long threads on issue tracking sites, forums, and social media sites.
- Summarize news articles.
- Ask questions about reviews on business and product pages.
- Ask questions about long, technical documentation.
- ... what else?
A local Ollama server is needed for the embedding database and LLM inference. Download and install Ollama and the CLI here.
Example:
ollama pull llama2
Example:
OLLAMA_ORIGINS=chrome-extension://* ollama serve
Terminal output:
2023/11/19 20:55:16 images.go:799: total blobs: 6
2023/11/19 20:55:16 images.go:806: total unused blobs removed: 0
2023/11/19 20:55:16 routes.go:777: Listening on 127.0.0.1:11434 (version 0.1.10)
Note: The environment variable OLLAMA_ORIGINS
must be set to chrome-extension://*
to allow requests originating from the Chrome extension. The following error will occur in the Chrome extension if OLLAMA_ORIGINS
is not set properly.
Access to fetch at 'http://localhost:11434/api/tags' from origin 'chrome-extension://<extension_id>' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
Run launchctl setenv
to set OLLAMA_ORIGINS
.
launchctl setenv OLLAMA_ORIGINS "chrome-extension://*"
Setting environment variables on Mac (Ollama)
The Ollama server can also be run in a Docker container. The container should have the OLLAMA_ORIGINS
environment variable set to chrome-extension://*
.
Run docker run
with the -e
flag to set the OLLAMA_ORIGINS
environment variable:
docker run -e OLLAMA_ORIGINS="chrome-extension://*" -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
In the project directory, you can run:
Launches the test runner in the interactive watch mode.
See the section about running tests for more information.
Runs eslint
and prettier
on src
and __tests__
files.
Builds the app for production to the dist
folder.
It correctly bundles React in production mode and optimizes the build for the best performance.
The build is minified and the filenames include the hashes.
Your app is ready to be deployed!
See the section about deployment for more information.
https://developer.chrome.com/docs/extensions/mv3/getstarted/development-basics/#load-unpacked
Create a keyboard shortcut to make Lumos easily accessible.
- Navigate to
chrome://extensions/shortcuts
. - Configure a shortcut for Lumos (
Activate the extension
). For example,⌘L
(command key
+L
).
If you don't have npm
installed, you can download the pre-built extension package from the Releases page.
Right-click on the extension icon and select Options
to access the extension's Options page.
-
Ollama Model: Select desired model (e.g.
llama2
) -
Ollama Embedding Model: Select desired embedding model (e.g.
nomic-embed-text
). Caution: Using a different embedding model requires Ollama to swap models, which may incur undesired latency in the app. This is a known limitation in Ollama and may be improved in the future. -
Ollama Host: Select desired host (defaults to
http://localhost:11434
) - Vector Store TTL (minutes): Number of minutes to store a URL's content in the vector store cache.
-
Content Parser Config: Lumos's default content parser will extract all text content between a page's
<body></body>
tag. To customize the content parser, add an entry to the configuration. - Enable/Disable Tools: Enable or disable individual tools. If a tool is enabled, a custom prefix trigger (e.g. "calc:") can be specified to override the app's internal prompt classification mechanism.
- Enable/Disable Dark Arts: 😈
Each URL path can have its own content parser. The content parser config for the longest URL path will be matched.
- chunkSize: Number of characters to chunk page content into for indexing into RAG vectorstore
- chunkOverlap: Number of characters to overlap in chunks for indexing into RAG vectorstore
-
selectors:
document.querySelector()
queries to perform to retrieve page content -
selectorsAll:
document.querySelectorAll()
queries to perform to retrieve page content
For example, given the following config, if the URL path of the current tab is domain.com/path1/subpath1/subsubpath1
, then the config for domain.com/path1/subpath1
will be used (i.e. chunkSize=600
).
{
"domain.com/path1/subpath1": {
"chunkSize": 600,
"chunkOverlap": 200,
"selectors": [
"#id"
],
"selectorsAll": []
},
"domain.com/path1": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
".className"
],
"selectorsAll": []
}
}
See docs for How to Create a Custom Content Parser. See documentation for querySelector()
and querySelectorAll()
to confirm all querying capabilities.
Example:
{
"default": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"body"
],
"selectorsAll": []
},
"medium.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"article"
],
"selectorsAll": []
},
"reddit.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [],
"selectorsAll": [
"shreddit-comment"
]
},
"stackoverflow.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"#question-header",
"#mainbar"
],
"selectorsAll": []
},
"wikipedia.org": {
"chunkSize": 2000,
"chunkOverlap": 500,
"selectors": [
"#bodyContent"
],
"selectorsAll": []
},
"yelp.com": {
"chunkSize": 500,
"chunkOverlap": 0,
"selectors": [
"#location-and-hours",
"#reviews"
],
"selectorsAll": []
}
}
Alternatively, if content is highlighted on a page (e.g. highlighted text), that content will be parsed instead of the content produced from the content parser configuration.
Note: Content that is highlighted will not be cached in the vector store cache. Each subsequent prompt containing highlighted content will generate new embeddings.
-
cmd + c
: Copy last message to clipboard. -
cmd + j
: ToggleDisable content parsing
checkbox. -
cmd + k
: Clear all messages. -
cmd + ;
: Open/close Chat History panel. -
ctrl + c
: Cancel request (LLM request/streaming or embeddings generation) -
ctrl + x
: Remove file attachment. -
ctrl + r
: Regenerate last LLM response.
Lumos supports multimodal models! Images that are present on the current page will be downloaded and bound to the model for prompting. See documentation and examples here.
File attachments can be uploaded to Lumos. The contents of a file will be parsed and processed through Lumos's RAG workflow (similar to processing page content). By default, the text content of a file will be parsed if the extension type is not explicitly listed below.
Supported extension types:
.csv
.json
.pdf
- any plain text file format (
.txt
,.md
,.py
, etc)
Note: If an attachment is present, page content will not be parsed. Remove the file attachment to resume parsing page content.
Image files will be processed through Lumos's multimodal workflow (requires multimodal model).
Supported image types:
-
.jpeg
,.jpg
.png
Lumos invokes Tools automatically based on the provided prompt. See documentation and examples here.
- Local LLM in the Browser Powered by Ollama
- Local LLM in the Browser Powered by Ollama (Part 2)
- Let’s Normalize Online, In-Memory RAG! (Part 3)
- Supercharging If-Statements With Prompt Classification Using Ollama and LangChain (Part 4)
- Bolstering LangChain’s MemoryVectorStore With Keyword Search (Part 5)
- A Guide to Gotchas with LangChain Document Loaders in a Chrome Extension (Part 6)
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Lumos
Similar Open Source Tools
Lumos
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.
call-center-ai
Call Center AI is an AI-powered call center solution leveraging Azure and OpenAI GPT. It allows for AI agent-initiated phone calls or direct calls to the bot from a configured phone number. The bot is customizable for various industries like insurance, IT support, and customer service, with features such as accessing claim information, conversation history, language change, SMS sending, and more. The project is a proof of concept showcasing the integration of Azure Communication Services, Azure Cognitive Services, and Azure OpenAI for an automated call center solution.
call-center-ai
Call Center AI is an AI-powered call center solution that leverages Azure and OpenAI GPT. It is a proof of concept demonstrating the integration of Azure Communication Services, Azure Cognitive Services, and Azure OpenAI to build an automated call center solution. The project showcases features like accessing claims on a public website, customer conversation history, language change during conversation, bot interaction via phone number, multiple voice tones, lexicon understanding, todo list creation, customizable prompts, content filtering, GPT-4 Turbo for customer requests, specific data schema for claims, documentation database access, SMS report sending, conversation resumption, and more. The system architecture includes components like RAG AI Search, SMS gateway, call gateway, moderation, Cosmos DB, event broker, GPT-4 Turbo, Redis cache, translation service, and more. The tool can be deployed remotely using GitHub Actions and locally with prerequisites like Azure environment setup, configuration file creation, and resource hosting. Advanced usage includes custom training data with AI Search, prompt customization, language customization, moderation level customization, claim data schema customization, OpenAI compatible model usage for the LLM, and Twilio integration for SMS.
client-python
The Mistral Python Client is a tool inspired by cohere-python that allows users to interact with the Mistral AI API. It provides functionalities to access and utilize the AI capabilities offered by Mistral. Users can easily install the client using pip and manage dependencies using poetry. The client includes examples demonstrating how to use the API for various tasks, such as chat interactions. To get started, users need to obtain a Mistral API Key and set it as an environment variable. Overall, the Mistral Python Client simplifies the integration of Mistral AI services into Python applications.
langchainrb
Langchain.rb is a Ruby library that makes it easy to build LLM-powered applications. It provides a unified interface to a variety of LLMs, vector search databases, and other tools, making it easy to build and deploy RAG (Retrieval Augmented Generation) systems and assistants. Langchain.rb is open source and available under the MIT License.
json-repair
JSON Repair is a toolkit designed to address JSON anomalies that can arise from Large Language Models (LLMs). It offers a comprehensive solution for repairing JSON strings, ensuring accuracy and reliability in your data processing. With its user-friendly interface and extensive capabilities, JSON Repair empowers developers to seamlessly integrate JSON repair into their workflows.
hf-waitress
HF-Waitress is a powerful server application for deploying and interacting with HuggingFace Transformer models. It simplifies running open-source Large Language Models (LLMs) locally on-device, providing on-the-fly quantization via BitsAndBytes, HQQ, and Quanto. It requires no manual model downloads, offers concurrency, streaming responses, and supports various hardware and platforms. The server uses a `config.json` file for easy configuration management and provides detailed error handling and logging.
claim-ai-phone-bot
AI-powered call center solution with Azure and OpenAI GPT. The bot can answer calls, understand the customer's request, and provide relevant information or assistance. It can also create a todo list of tasks to complete the claim, and send a report after the call. The bot is customizable, and can be used in multiple languages.
ai-artifacts
AI Artifacts is an open source tool that replicates Anthropic's Artifacts UI in the Claude chat app. It utilizes E2B's Code Interpreter SDK and Core SDK for secure AI code execution in a cloud sandbox environment. Users can run AI-generated code in various languages such as Python, JavaScript, R, and Nextjs apps. The tool also supports running AI-generated Python in Jupyter notebook, Next.js apps, and Streamlit apps. Additionally, it offers integration with Vercel AI SDK for tool calling and streaming responses from the model.
langserve
LangServe helps developers deploy `LangChain` runnables and chains as a REST API. This library is integrated with FastAPI and uses pydantic for data validation. In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in LangChain.js.
aiocsv
aiocsv is a Python module that provides asynchronous CSV reading and writing. It is designed to be a drop-in replacement for the Python's builtin csv module, but with the added benefit of being able to read and write CSV files asynchronously. This makes it ideal for use in applications that need to process large CSV files efficiently.
fragments
Fragments is an open-source tool that leverages Anthropic's Claude Artifacts, Vercel v0, and GPT Engineer. It is powered by E2B Sandbox SDK and Code Interpreter SDK, allowing secure execution of AI-generated code. The tool is based on Next.js 14, shadcn/ui, TailwindCSS, and Vercel AI SDK. Users can stream in the UI, install packages from npm and pip, and add custom stacks and LLM providers. Fragments enables users to build web apps with Python interpreter, Next.js, Vue.js, Streamlit, and Gradio, utilizing providers like OpenAI, Anthropic, Google AI, and more.
SimplerLLM
SimplerLLM is an open-source Python library that simplifies interactions with Large Language Models (LLMs) for researchers and beginners. It provides a unified interface for different LLM providers, tools for enhancing language model capabilities, and easy development of AI-powered tools and apps. The library offers features like unified LLM interface, generic text loader, RapidAPI connector, SERP integration, prompt template builder, and more. Users can easily set up environment variables, create LLM instances, use tools like SERP, generic text loader, calling RapidAPI APIs, and prompt template builder. Additionally, the library includes chunking functions to split texts into manageable chunks based on different criteria. Future updates will bring more tools, interactions with local LLMs, prompt optimization, response evaluation, GPT Trainer, document chunker, advanced document loader, integration with more providers, Simple RAG with SimplerVectors, integration with vector databases, agent builder, and LLM server.
syncode
SynCode is a novel framework for the grammar-guided generation of Large Language Models (LLMs) that ensures syntactically valid output with respect to defined Context-Free Grammar (CFG) rules. It supports general-purpose programming languages like Python, Go, SQL, JSON, and more, allowing users to define custom grammars using EBNF syntax. The tool compares favorably to other constrained decoders and offers features like fast grammar-guided generation, compatibility with HuggingFace Language Models, and the ability to work with various decoding strategies.
redis-vl-python
The Python Redis Vector Library (RedisVL) is a tailor-made client for AI applications leveraging Redis. It enhances applications with Redis' speed, flexibility, and reliability, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. The library bridges the gap between the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. It abstracts the features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists.
redisvl
Redis Vector Library (RedisVL) is a Python client library for building AI applications on top of Redis. It provides a high-level interface for managing vector indexes, performing vector search, and integrating with popular embedding models and providers. RedisVL is designed to make it easy for developers to build and deploy AI applications that leverage the speed, flexibility, and reliability of Redis.
For similar tasks
Lumos
Lumos is a Chrome extension powered by a local LLM co-pilot for browsing the web. It allows users to summarize long threads, news articles, and technical documentation. Users can ask questions about reviews and product pages. The tool requires a local Ollama server for LLM inference and embedding database. Lumos supports multimodal models and file attachments for processing text and image content. It also provides options to customize models, hosts, and content parsers. The extension can be easily accessed through keyboard shortcuts and offers tools for automatic invocation based on prompts.
open-source-slack-ai
This repository provides a ready-to-run basic Slack AI solution that allows users to summarize threads and channels using OpenAI. Users can generate thread summaries, channel overviews, channel summaries since a specific time, and full channel summaries. The tool is powered by GPT-3.5-Turbo and an ensemble of NLP models. It requires Python 3.8 or higher, an OpenAI API key, Slack App with associated API tokens, Poetry package manager, and ngrok for local development. Users can customize channel and thread summaries, run tests with coverage using pytest, and contribute to the project for future enhancements.
For similar jobs
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
ChatGPT-On-CS
This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.