cellm
Use LLMs in Excel formulas
Stars: 675
Cellm is an Excel extension that lets users call Large Language Models (LLMs) like ChatGPT from within cell formulas. Its =PROMPT() function returns an AI response for a range of text, which makes it useful for automating repetitive data processing and analysis tasks. Cellm supports models from Anthropic, Mistral, OpenAI, and Google, as well as locally hosted models via Llamafiles, Ollama, or vLLM, and is designed to bring AI capabilities into Excel for tasks such as text classification, data cleaning, content summarization, and entity extraction.
README:
Cellm is an Excel extension that lets you use Large Language Models (LLMs) like ChatGPT in cell formulas.
Similar to Excel's =SUM() function that outputs the sum of a range of numbers, Cellm's =PROMPT() function outputs the AI response to a range of text. For example, you can write =PROMPT(A1, "Extract all person names mentioned in the text.") in a cell's formula and drag the cell to apply the prompt to many rows. Cellm is useful when you want to use AI for repetitive tasks that would normally require copy-pasting data in and out of a chat window many times.
This extension does one thing and does it well:
- Calls LLMs in formulas and returns short answers suitable for cells.
- Supports models from Anthropic, Mistral, OpenAI, and Google as well as locally hosted models via Llamafiles, Ollama, or vLLM.
Say you're reviewing medical studies and need to quickly identify papers relevant to your research. Here's how Cellm can help with this task:
Demo video: https://github.com/user-attachments/assets/c93f7da9-aabd-4c13-a4f5-3e12332c5794
In this example, we copy the papers' titles and abstracts into Excel and write this prompt: "If the paper studies diabetic neuropathy and stroke, return 'Include'; otherwise, return 'Exclude'."
We then use autofill to apply the prompt to many papers. Simple and powerful.
Note that a paper is misclassified. Models will make mistakes; it is your responsibility to cross-validate whether a model is accurate enough for your use case, and to switch to a better model or another approach if it is not.
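A minimal sketch of the screening formula, assuming titles are in column A and abstracts in column B (the column layout is illustrative, not from the video; single quotes are used inside the string to avoid Excel's double-quote escaping):

```
=PROMPT(A2:B2, "If the paper studies diabetic neuropathy and stroke, return 'Include', otherwise, return 'Exclude'.")
```

Drag the fill handle down from row 2 to apply the same prompt to every paper.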
Cellm must be built from source and installed via Excel. Follow the steps below.
1. Clone this repository:

   ```
   git clone https://github.com/getcellm/cellm.git
   ```

2. In your terminal, go into the root of the project directory:

   ```
   cd cellm
   ```

3. Install dependencies:

   ```
   dotnet restore
   ```

4. Build the project:

   ```
   dotnet build --configuration Release
   ```

5. Cellm uses Ollama and the Gemma 2 2B model by default. Download and install Ollama and run the following command in your Windows terminal to download the model:

   ```
   ollama pull gemma2:2b
   ```

   To use other models, see the Models section below.
These steps will build Cellm on your computer. Continue with the steps below to install Cellm in Excel.
- In Excel, go to File > Options > Add-Ins.
- In the Manage drop-down menu, select Excel Add-ins and click Go...
- Click Browse... and select the Cellm-AddIn64.xll file in bin/Release/net9.0-windows. This folder is located in the root of the git repository that you cloned.
- Check the box next to Cellm and click OK.
Cellm provides the following functions:
PROMPT(cells: range, [instruction: range | instruction: string | temperature: double], [temperature: double]): string

- cells (Required): A cell or a range of cells.
  - Context and (optionally) instructions. The model will use the cells as context and follow any instructions as long as they are present somewhere in the cells.
- instructions (Optional): A cell, a range of cells, or a string.
  - The model will follow these instructions and ignore instructions in the cells of the first argument.
  - Default: Empty.
- temperature (Optional): double.
  - A value between 0 and 1 that controls the balance between deterministic outputs and creative exploration. Lower values make the output more deterministic, higher values make it more random.
  - Default: 0. The model will almost always give you the same result.
- Returns: string: The AI model's response.
Example usage:

- =PROMPT(A1:D10, "Extract keywords") will use the selected range of cells as context and follow the instruction to extract keywords.
- =PROMPT(A1:D10, "Extract keywords", 0.7) will use the selected range of cells as context, follow the instruction to extract keywords, and use a temperature of 0.7.
- =PROMPT(A1:D10) will use the range of cells as context and follow any instructions as long as they are present somewhere in the cells.
- =PROMPT(A1:D10, 0.7) will use the selected range of cells as context, follow any instruction within the cells, and use a temperature of 0.7.
PROMPTWITH(providerAndModel: string or cell, cells: range, [instruction: range | instruction: string | temperature: double], [temperature: double]): string
Allows you to specify the model as the first argument.
- providerAndModel (Required): A string of the form "provider/model".
  - Default: anthropic/claude-3-5-sonnet-20240620
Example usage:
- =PROMPTWITH("openai/gpt-4o-mini", A1:D10, "Extract keywords") will extract keywords using OpenAI's GPT-4o mini model instead of the default model.
Cellm is useful for repetitive tasks on both structured and unstructured data. Here are some practical applications:
- Text Classification
  =PROMPT(B2, "Analyze the survey response. Categorize as 'Product', 'Service', 'Pricing', or 'Other'.")
  Use classification prompts to quickly categorize large volumes of text, such as open-ended survey responses.
- Model Comparison
  Make a sheet with user queries in the first column and provider/model pairs in the first row. Write this prompt in cell B2:
  =PROMPTWITH(B$1,$A2,"Answer the question in column A")
  Drag the cell across the entire table to apply all models to all queries, as sketched below.
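A hypothetical layout for the comparison sheet (the queries and model names are only examples):

```
        A                B                                          C
1                        openai/gpt-4o-mini                         anthropic/claude-3-5-sonnet-20240620
2   <user query 1>       =PROMPTWITH(B$1,$A2,"Answer the question in column A")
3   <user query 2>       ...
```

The mixed references matter: B$1 keeps the model pinned to row 1 while $A2 keeps the query pinned to column A, so a single formula fills the whole grid when dragged.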
- Data Cleaning
  =PROMPT(E2, "Standardize the company name by removing any legal entity identifiers (e.g., Inc., LLC) and correcting common misspellings.")
  Useful for cleaning and standardizing messy datasets.
- Content Summarization
  =PROMPT(F2, "Provide a 2-sentence summary of the article in the context.")
  Great for quickly digesting large amounts of text data, such as news articles or research papers.
- Entity Extraction
  =PROMPT(G2, "Extract all person names mentioned in the text.")
  Useful for analyzing unstructured text data in fields like journalism, research, or customer relationship management.
- When Built-in Excel Functions Are Insufficient
  =PROMPT(A1, "Fix email formatting")
  Useful when an "auditor" inserts random spaces into a column with thousands of email addresses. Use a local model if you are worried about sending sensitive data to hosted models.
These use cases are starting points. Experiment with different instructions to find what works best for your data. Cellm works best when combined with human judgment and expertise in your specific domain.
Cellm supports both hosted and local models. These are configured via appsettings files. You can use appsettings.Local.OpenAiCompatible.json as a starting point for configuring any model provider that is compatible with OpenAI's API; just rename it to appsettings.Local.json and edit the values. In general, you should leave appsettings.json alone and add your own configuration to appsettings.Local.json only. Any settings in this file will override the default settings in appsettings.json.
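As an illustration only, an appsettings.Local.json for an OpenAI-compatible endpoint might look roughly like the sketch below. The exact key names are defined in appsettings.Local.OpenAiCompatible.json, so treat the ones here as placeholders (the example BaseAddress is Ollama's OpenAI-compatible endpoint):

```
{
  "OpenAiCompatibleConfiguration": {
    "BaseAddress": "http://localhost:11434/v1",
    "ApiKey": "ADD_YOUR_API_KEY_HERE",
    "DefaultModel": "gemma2:2b"
  },
  "ProviderConfiguration": {
    "DefaultProvider": "OpenAiCompatible"
  }
}
```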
The following sections show you how to configure appsettings.Local.json for commonly used hosted and local models.
Cellm supports hosted models from Anthropic, Google, OpenAI, and any OpenAI-compatible provider. To use e.g. Claude 3.5 Sonnet from Anthropic:
1. Rename src/Cellm/appsettings.Anthropic.json to src/Cellm/appsettings.Local.json.

2. Add your Anthropic API key to src/Cellm/appsettings.Local.json:

   ```
   {
     "AnthropicConfiguration": {
       "ApiKey": "ADD_YOUR_ANTHROPIC_API_KEY_HERE"
     },
     "ProviderConfiguration": {
       "DefaultProvider": "Anthropic"
     }
   }
   ```
See the appsettings.Local.*.json files for examples on other providers.
Cellm supports local models that run on your computer via Llamafiles, Ollama, or vLLM. This ensures none of your data ever leaves your machine. And it's free.
Cellm uses Ollama with the Gemma 2 2B model by default. This clever little model runs fine on a CPU; for any model larger than 3B you will need a GPU. Ollama will automatically use your GPU if you have one. To get started:
- Download and install Ollama from https://ollama.com/.
- Download the model by running the following command in your Windows terminal:

  ```
  ollama pull gemma2:2b
  ```
See https://ollama.com/search for a complete list of supported models.
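For example, if you have a GPU, you can pull a larger variant instead (gemma2:9b is one of the Gemma 2 tags published on the Ollama site):

```
ollama pull gemma2:9b
```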
Llamafile is a stand-alone executable that is very easy to set up. To get started:

1. Download a llamafile from https://github.com/Mozilla-Ocho/llamafile (e.g. Gemma 2 2B).

2. Run the following command in your Windows terminal:

   ```
   .\gemma-2-2b-it.Q6_K.llamafile.exe --server --v2
   ```

   To offload inference to your NVIDIA or AMD GPU, run:

   ```
   .\gemma-2-2b-it.Q6_K.llamafile.exe --server --v2 -ngl 999
   ```

3. Rename appsettings.Llamafile.json to appsettings.Local.json.

4. Build and install Cellm.
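For reference, a rough sketch of what the renamed appsettings.Local.json might contain, following the same pattern as the Anthropic example above. The key names and the llamafile server port (8080 is llama.cpp's default) are assumptions; the real values ship in appsettings.Llamafile.json:

```
{
  "LlamafileConfiguration": {
    "BaseAddress": "http://localhost:8080"
  },
  "ProviderConfiguration": {
    "DefaultProvider": "Llamafile"
  }
}
```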
If you prefer to run models via Docker, both Ollama and vLLM are packaged up with docker compose files in the docker/ folder. Ollama is designed for ease of use, while vLLM is designed to run many requests in parallel; vLLM is particularly useful if you need to process a lot of data locally. Open WebUI is included in both the Ollama and vLLM docker compose files so you can test the local model outside of Cellm. It is available at http://localhost:3000.
To get started, we recommend using Ollama with the Gemma 2 2B model:
- Build and install Cellm.
- Run the following commands in the docker/ directory:

  ```
  docker compose -f docker-compose.Ollama.yml up --detach
  docker compose -f docker-compose.Ollama.yml exec backend ollama pull gemma2:2b
  docker compose -f docker-compose.Ollama.yml down  # when you want to shut it down
  ```
To use other Ollama models, pull another of the supported models. If you want to speed up inference, you can use your GPU as well:
```
docker compose -f docker-compose.Ollama.yml -f docker-compose.Ollama.GPU.yml up --detach
```
If you want to further speed up running many requests in parallel, you can use vLLM instead of Ollama. You must supply the docker compose file with a Huggingface API key, either via an environment variable or by editing the docker compose file directly. Look at the vLLM docker compose file for details. If you don't know what a Huggingface API key is, just use Ollama.
To start vLLM:
```
docker compose -f docker-compose.vLLM.GPU.yml up --detach
```
To use other vLLM models, change the "--model" argument in the docker compose file to another Huggingface model.
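For example, to pass the key via an environment variable when starting the stack (HF_TOKEN is the conventional Hugging Face variable name; check the compose file for the name it actually reads):

```
HF_TOKEN=your_huggingface_api_key docker compose -f docker-compose.vLLM.GPU.yml up --detach
```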
Do:
- Experiment with different prompts to find the most effective instructions for your data.
- Use cell references to dynamically change your prompts based on other data in your spreadsheet (see the sketch after this list).
- Use local models for sensitive and confidential data.
- Refer to the cell data as "context" in your instructions.
- Verify at least a subset of a model's responses. Models will make errors and rely entirely on your input, which may also contain errors.
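A minimal sketch of a dynamic prompt, assuming the instruction text lives in cell B1 (the layout is illustrative). Because the instruction argument accepts a cell as well as a string, editing B1 changes the behavior of every formula that references it:

```
=PROMPT(A2, $B$1)
```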
Don't:
- Don't use Cellm to compute sums, averages, and other numerical calculations. The current generation of LLMs is not designed for mathematical operations. Use Excel's existing functions instead.
- Don't use cloud model providers to process sensitive or confidential data.
- Don't use extremely long prompts or give Cellm complex tasks. A normal chat UI lets you have a back-and-forth conversation, which is better for exploring complex topics.
- Don't use Cellm for tasks that require up-to-date information beyond the AI model's knowledge cutoff date unless you provide the information as context.
My girlfriend was writing a systematic review paper. She had to compare 7,500 papers against inclusion and exclusion criteria. I told her this was a great use case for LLMs but quickly realized that individually copying 7,500 papers in and out of chat windows was a total pain. This sparked the idea to make an AI tool to automate repetitive tasks for people like her who would rather avoid programming.
I think Cellm is really cool because it enables everyone to automate repetitive tasks with AI to a level that was previously available only to programmers.
Fair Core License, Version 1.0, Apache 2.0 Future License
Alternative AI tools for cellm
Similar Open Source Tools
BentoDiffusion
BentoDiffusion is a BentoML example project that demonstrates how to serve and deploy diffusion models in the Stable Diffusion (SD) family. These models are specialized in generating and manipulating images based on text prompts. The project provides a guide on using SDXL Turbo as an example, along with instructions on prerequisites, installing dependencies, running the BentoML service, and deploying to BentoCloud. Users can interact with the deployed service using Swagger UI or other methods. Additionally, the project offers the option to choose from various diffusion models available in the repository for deployment.
MARS5-TTS
MARS5 is a novel English speech model (TTS) developed by CAMB.AI, featuring a two-stage AR-NAR pipeline with a unique NAR component. The model can generate speech for various scenarios like sports commentary and anime with just 5 seconds of audio and a text snippet. It allows steering prosody using punctuation and capitalization in the transcript. Speaker identity is specified using an audio reference file, enabling 'deep clone' for improved quality. The model can be used via torch.hub or HuggingFace, supporting both shallow and deep cloning for inference. Checkpoints are provided for AR and NAR models, with hardware requirements of 750M+450M params on GPU. Contributions to improve model stability, performance, and reference audio selection are welcome.
airbroke
Airbroke is an open-source error catcher tool designed for modern web applications. It provides a PostgreSQL-based backend with an Airbrake-compatible HTTP collector endpoint and a React-based frontend for error management. The tool focuses on simplicity, maintaining a small database footprint even under heavy data ingestion. Users can ask AI about issues, replay HTTP exceptions, and save/manage bookmarks for important occurrences. Airbroke supports multiple OAuth providers for secure user authentication and offers occurrence charts for better insights into error occurrences. The tool can be deployed in various ways, including building from source, using Docker images, deploying on Vercel, Render.com, Kubernetes with Helm, or Docker Compose. It requires Node.js, PostgreSQL, and specific system resources for deployment.
AnkiAIUtils
Anki AI Utils is a powerful suite of AI-powered tools designed to enhance your Anki flashcard learning experience by automatically improving cards you struggle with. The tools include features such as adaptive learning, personalized memory hooks, automation readiness, universal compatibility, provider agnosticism, and infinite extensibility. The toolkit consists of tools like Illustrator for creating custom mnemonic images, Reformulator for rephrasing flashcards, Mnemonics Creator for generating memorable mnemonics, Explainer for providing detailed explanations, and Mnemonics Helper for quick mnemonic generation. The project aims to motivate others to package the tools into addons for wider accessibility.
llamabot
LlamaBot is a Pythonic bot interface to Large Language Models (LLMs), providing an easy way to experiment with LLMs in Jupyter notebooks and build Python apps utilizing LLMs. It supports all models available in LiteLLM. Users can access LLMs either through local models with Ollama or by using API providers like OpenAI and Mistral. LlamaBot offers different bot interfaces like SimpleBot, ChatBot, QueryBot, and ImageBot for various tasks such as rephrasing text, maintaining chat history, querying documents, and generating images. The tool also includes CLI demos showcasing its capabilities and supports contributions for new features and bug reports from the community.
aisuite
Aisuite is a simple, unified interface to multiple Generative AI providers. It allows developers to easily interact with various Language Model (LLM) providers like OpenAI, Anthropic, Azure, Google, AWS, and more through a standardized interface. The library focuses on chat completions and provides a thin wrapper around python client libraries, enabling creators to test responses from different LLM providers without changing their code. Aisuite maximizes stability by using HTTP endpoints or SDKs for making calls to the providers. Users can install the base package or specific provider packages, set up API keys, and utilize the library to generate chat completion responses from different models.
serverless-pdf-chat
The serverless-pdf-chat repository contains a sample application that allows users to ask natural language questions of any PDF document they upload. It leverages serverless services like Amazon Bedrock, AWS Lambda, and Amazon DynamoDB to provide text generation and analysis capabilities. The application architecture involves uploading a PDF document to an S3 bucket, extracting metadata, converting text to vectors, and using LangChain to search for information related to user prompts. The application is not intended for production use and serves as a demonstration and educational tool.
gpt-subtrans
GPT-Subtrans is an open-source subtitle translator that utilizes large language models (LLMs) as translation services. It supports translation between any language pairs that the language model supports. Note that GPT-Subtrans requires an active internet connection, as subtitles are sent to the provider's servers for translation, and their privacy policy applies.
dockershrink
Dockershrink is an AI-powered Commandline Tool designed to help reduce the size of Docker images. It combines traditional Rule-based analysis with Generative AI techniques to optimize Image configurations. The tool supports NodeJS applications and aims to save costs on storage, data transfer, and build times while increasing developer productivity. By automatically applying advanced optimization techniques, Dockershrink simplifies the process for engineers and organizations, resulting in significant savings and efficiency improvements.
HackBot
HackBot is an AI-powered cybersecurity chatbot designed to provide accurate answers to cybersecurity-related queries, conduct code analysis, and scan analysis. It utilizes the Meta-LLama2 AI model through the 'LlamaCpp' library to respond coherently. The chatbot offers features like local AI/Runpod deployment support, cybersecurity chat assistance, interactive interface, clear output presentation, static code analysis, and vulnerability analysis. Users can interact with HackBot through a command-line interface and utilize it for various cybersecurity tasks.
civitai
Civitai is a platform where people can share their stable diffusion models (textual inversions, hypernetworks, aesthetic gradients, VAEs, and any other crazy stuff people do to customize their AI generations), collaborate with others to improve them, and learn from each other's work. The platform allows users to create an account, upload their models, and browse models that have been shared by others. Users can also leave comments and feedback on each other's models to facilitate collaboration and knowledge sharing.
MultiPL-E
MultiPL-E is a system for translating unit test-driven neural code generation benchmarks to new languages. It is part of the BigCode Code Generation LM Harness and allows for evaluating Code LLMs using various benchmarks. The tool supports multiple versions with improvements and new language additions, providing a scalable and polyglot approach to benchmarking neural code generation. Users can access a tutorial for direct usage and explore the dataset of translated prompts on the Hugging Face Hub.
DAILA
DAILA is a unified interface for AI systems in decompilers, supporting various decompilers and AI systems. It allows users to utilize local and remote LLMs, like ChatGPT and Claude, and local models such as VarBERT. DAILA can be used as a decompiler plugin with GUI or as a scripting library. It also provides a Docker container for offline installations and supports tasks like summarizing functions and renaming variables in decompilation.
ezkl
EZKL is a library and command-line tool for doing inference for deep learning models and other computational graphs in a zk-snark (ZKML). It enables the following workflow: 1. Define a computational graph, for instance a neural network (but really any arbitrary set of operations), as you would normally in pytorch or tensorflow. 2. Export the final graph of operations as an .onnx file and some sample inputs to a .json file. 3. Point ezkl to the .onnx and .json files to generate a ZK-SNARK circuit with which you can prove statements such as: > "I ran this publicly available neural network on some private data and it produced this output" > "I ran my private neural network on some public data and it produced this output" > "I correctly ran this publicly available neural network on some public data and it produced this output" In the backend we use the collaboratively-developed Halo2 as a proof system. The generated proofs can then be verified with much less computational resources, including on-chain (with the Ethereum Virtual Machine), in a browser, or on a device.
vigenair
ViGenAiR is a tool that harnesses the power of Generative AI models on Google Cloud Platform to automatically transform long-form Video Ads into shorter variants, targeting different audiences. It generates video, image, and text assets for Demand Gen and YouTube video campaigns. Users can steer the model towards generating desired videos, conduct A/B testing, and benefit from various creative features. The tool offers benefits like diverse inventory, compelling video ads, creative excellence, user control, and performance insights. ViGenAiR works by analyzing video content, splitting it into coherent segments, and generating variants following Google's best practices for effective ads.
For similar tasks
opendataeditor
The Open Data Editor (ODE) is a no-code application to explore, validate and publish data in a simple way. It is an open source project powered by the Frictionless Framework. The ODE is currently available for download and testing in beta.
data-juicer
Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs. It is a systematic & reusable library of 80+ core OPs, 20+ reusable config recipes, and 20+ feature-rich dedicated toolkits, designed to function independently of specific LLM datasets and processing pipelines. Data-Juicer allows detailed data analyses with an automated report generation feature for a deeper understanding of your dataset. Coupled with multi-dimension automatic evaluation capabilities, it supports a timely feedback loop at multiple stages in the LLM development process. Data-Juicer offers tens of pre-built data processing recipes for pre-training, fine-tuning, en, zh, and more scenarios. It provides a speedy data processing pipeline requiring less memory and CPU usage, optimized for maximum productivity. Data-Juicer is flexible & extensible, accommodating most types of data formats and allowing flexible combinations of OPs. It is designed for simplicity, with comprehensive documentation, easy start guides and demo configs, and intuitive configuration with simple adding/removing OPs from existing configs.
OAD
OAD is a powerful open-source tool for analyzing and visualizing data. It provides a user-friendly interface for exploring datasets, generating insights, and creating interactive visualizations. With OAD, users can easily import data from various sources, clean and preprocess data, perform statistical analysis, and create customizable visualizations to communicate findings effectively. Whether you are a data scientist, analyst, or researcher, OAD can help you streamline your data analysis workflow and uncover valuable insights from your data.
Streamline-Analyst
Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates tasks such as data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless. It aims to expedite the data analysis process, making it accessible to all, regardless of their expertise in data analysis. The tool is built to empower users to process data and achieve high-quality visualizations with unparalleled efficiency, and to execute high-performance modeling with the best strategies. Future enhancements include Natural Language Processing (NLP), neural networks, and object detection utilizing YOLO, broadening its capabilities to meet diverse data analysis needs.
2021-13th-ironman
This repository is a part of the 13th iT Help Ironman competition, focusing on exploring explainable artificial intelligence (XAI) in machine learning and deep learning. The content covers the basics of XAI, its applications, cases, challenges, and future directions. It also includes practical machine learning algorithms, model deployment, and integration concepts. The author aims to provide detailed resources on AI and share knowledge with the audience through this competition.
crazyai-ml
The 'crazyai-ml' repository is a collection of resources related to machine learning, specifically focusing on explaining artificial intelligence models. It includes articles, code snippets, and tutorials covering various machine learning algorithms, data analysis, model training, and deployment. The content aims to provide a comprehensive guide for beginners in the field of AI, offering practical implementations and insights into popular machine learning packages and model tuning techniques. The repository also addresses the integration of AI models and frontend-backend concepts, making it a valuable resource for individuals interested in AI applications.
ProX
ProX is a lm-based data refinement framework that automates the process of cleaning and improving data used in pre-training large language models. It offers better performance, domain flexibility, efficiency, and cost-effectiveness compared to traditional methods. The framework has been shown to improve model performance by over 2% and boost accuracy by up to 20% in tasks like math. ProX is designed to refine data at scale without the need for manual adjustments, making it a valuable tool for data preprocessing in natural language processing tasks.
For similar jobs
book
Podwise is an AI knowledge management app designed specifically for podcast listeners. With the Podwise platform, you only need to follow your favorite podcasts, such as "Hardcore Hackers". When a program is released, Podwise will use AI to transcribe, extract, summarize, and analyze the podcast content, helping you to break down the hard-core podcast knowledge. At the same time, it is connected to platforms such as Notion, Obsidian, Logseq, and Readwise, embedded in your knowledge management workflow, and integrated with content from other channels including news, newsletters, and blogs, helping you to improve your second brain 🧠.
extractor
Extractor is an AI-powered data extraction library for Laravel that leverages OpenAI's capabilities to effortlessly extract structured data from various sources, including images, PDFs, and emails. It features a convenient wrapper around OpenAI Chat and Completion endpoints, supports multiple input formats, includes a flexible Field Extractor for arbitrary data extraction, and integrates with Textract for OCR functionality. Extractor utilizes JSON Mode from the latest GPT-3.5 and GPT-4 models, providing accurate and efficient data extraction.
Scrapegraph-ai
ScrapeGraphAI is a Python library that uses Large Language Models (LLMs) and direct graph logic to create web scraping pipelines for websites, documents, and XML files. It allows users to extract specific information from web pages by providing a prompt describing the desired data. ScrapeGraphAI supports various LLMs, including Ollama, OpenAI, Gemini, and Docker, enabling users to choose the most suitable model for their needs. The library provides a user-friendly interface through its `SmartScraper` class, which simplifies the process of building and executing scraping pipelines. ScrapeGraphAI is open-source and available on GitHub, with extensive documentation and examples to guide users. It is particularly useful for researchers and data scientists who need to extract structured data from web pages for analysis and exploration.
databerry
Chaindesk is a no-code platform that allows users to easily set up a semantic search system for personal data without technical knowledge. It supports loading data from various sources such as raw text, web pages, files (Word, Excel, PowerPoint, PDF, Markdown, Plain Text), and upcoming support for web sites, Notion, and Airtable. The platform offers a user-friendly interface for managing datastores, querying data via a secure API endpoint, and auto-generating ChatGPT Plugins for each datastore. Chaindesk utilizes a Vector Database (Qdrant), Openai's text-embedding-ada-002 for embeddings, and has a chunk size of 1024 tokens. The technology stack includes Next.js, Joy UI, LangchainJS, PostgreSQL, Prisma, and Qdrant, inspired by the ChatGPT Retrieval Plugin.
auto-news
Auto-News is an automatic news aggregator tool that utilizes Large Language Models (LLM) to pull information from various sources such as Tweets, RSS feeds, YouTube videos, web articles, Reddit, and journal notes. The tool aims to help users efficiently read and filter content based on personal interests, providing a unified reading experience and organizing information effectively. It features feed aggregation with summarization, transcript generation for videos and articles, noise reduction, task organization, and deep dive topic exploration. The tool supports multiple LLM backends, offers weekly top-k aggregations, and can be deployed on Linux/MacOS using docker-compose or Kubernetes.
SemanticFinder
SemanticFinder is a frontend-only live semantic search tool that calculates embeddings and cosine similarity client-side using transformers.js and SOTA embedding models from Huggingface. It allows users to search through large texts like books with pre-indexed examples, customize search parameters, and offers data privacy by keeping input text in the browser. The tool can be used for basic search tasks, analyzing texts for recurring themes, and has potential integrations with various applications like wikis, chat apps, and personal history search. It also provides options for building browser extensions and future ideas for further enhancements and integrations.
1filellm
1filellm is a command-line data aggregation tool designed for LLM ingestion. It aggregates and preprocesses data from various sources into a single text file, facilitating the creation of information-dense prompts for large language models. The tool supports automatic source type detection, handling of multiple file formats, web crawling functionality, integration with Sci-Hub for research paper downloads, text preprocessing, and token count reporting. Users can input local files, directories, GitHub repositories, pull requests, issues, ArXiv papers, YouTube transcripts, web pages, Sci-Hub papers via DOI or PMID. The tool provides uncompressed and compressed text outputs, with the uncompressed text automatically copied to the clipboard for easy pasting into LLMs.
Agently-Daily-News-Collector
Agently Daily News Collector is an open-source project showcasing a workflow powered by the Agently AI application development framework. It allows users to generate news collections on various topics by inputting the field topic. The AI agents automatically perform the necessary tasks to generate a high-quality news collection saved in a markdown file. Users can edit settings in the YAML file, install Python and required packages, input their topic idea, and wait for the news collection to be generated. The process involves tasks like outlining, searching, summarizing, and preparing column data. The project dependencies include the Agently AI Development Framework, duckduckgo-search, BeautifulSoup4, and PyYAML.