SheetCopilot
We release a general framework for prompting LLMs to manipulate software in a closed-loop manner.
SheetCopilot is an assistant agent that manipulates spreadsheets by following user commands. It leverages Large Language Models (LLMs) to interact with spreadsheets like a human expert, enabling non-expert users to complete tasks on complex software such as Google Sheets and Excel via a language interface. The tool observes spreadsheet states, polishes generated solutions based on external action documents and error feedback, and aims to improve success rate and efficiency. SheetCopilot offers a dataset with diverse task categories and operations, supporting operations like entry & manipulation, management, formatting, charts, and pivot tables. Users can interact with SheetCopilot in Excel or Google Sheets, executing tasks like calculating revenue, creating pivot tables, and plotting charts. The tool's evaluation includes performance comparisons with leading LLMs and VBA-based methods on specific datasets, showcasing its capabilities in controlling various aspects of a spreadsheet.
We release the SheetCopilot agent as well as the evaluation environment in this repository.
SheetCopilot is an assistant agent that manipulates spreadsheets by following user commands. It breaks new ground in human-computer interaction, opening up possibilities for enabling non-expert users to complete their mundane work on complex software (e.g. Google Sheets and Excel) via a language interface.
- [2024/02/24] 🛠 Full SheetCopilot was released.
- [2023/12/26] 🛠 SheetCopilot equipped with Chain-of-Thoughts and external document retrieval was released.
- [2023/11/15] ✨ SheetCopilot for Google Sheets was released! You can now use SheetCopilot directly on Google Sheets. Check out our Google Sheets plugin store page and watch this tutorial for installation and usage guide.
- [2023/10/27] 🛠 More ground truths! We added more reference solutions to our benchmark (`dataset/task_sheet_answers_v2`) to obtain more accurate evaluation results.
- [2023/10/25] SheetCopilot benchmark was open-sourced.
- [2023/9/22] 🎉 Our paper was accepted to NeurIPS 2023.
- [2023/5/19] 👷🏻‍♂️ SheetCopilot was completed.
- Update the function call parsing code to fix the quote parsing errors
- Update API implementations
- Update the evaluation script to improve the checking accuracy
SheetCopilot employs a novel way of directing Large Language Models (LLMs) to manipulate spreadsheets like a human expert. To achieve elegant closed-loop control, SheetCopilot observes the spreadsheet state and polishes generated solutions according to external action documents and error feedback, thereby improving its success rate and efficiency.
SheetCopilot is only available on Windows. Python 3.10 is required to support the asynchronous implementation of SheetCopilot.
conda create -n sheetcopilot python=3.10
conda activate sheetcopilot
pip install -r requirements.txt
We released a spreadsheet task dataset containing 28 workbooks and 221 tasks applied to these workbooks. Each task is given one or more hand-made solutions.
Here is the overview of the dataset:
Our dataset contains diverse task categories and involves a wide range of operations:
Our dataset provides tasks with diverse complexity:
44 operations are supported and more will be added:
- Entry & manipulation: Write, CopyPaste, CutPaste, SetHyperlink, RemoveHyperlink, AutoFill, InsertRow, InsertColumn, Delete, Clear
- Management: Sort, Filter, DeleteFilter, MoveRow, MoveColumn, RemoveDuplicate
- Formatting: SetFormat, DeleteFormat, SetDataType, SetCellMerge, AutoFit, ResizeRowColumn, SetConditionalFormat, SetDataValidation, SetCellLock, FreezePanes, UnfreezePanes
- Chart: CreateChart, SetChartTrendline, SetChartTitle, SetChartHasAxis, SetChartAxis, SetChartHasLegend, SetChartLegend, SetChartType, AddChartErrorBars, RemoveChartErrorBars, AddDataLabels, RemoveDataLabels, SetChartMarker
- Pivot Table: CreatePivotTable, CreateChartFromPivotTable, CreateSheet, RemoveSheet
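The operation list above can be kept as a small lookup table, for example to check that an agent only emits action names from this action space. This is a minimal sketch: the names `ACTION_SPACE`, `ALL_ACTIONS`, and `category_of` are hypothetical helpers, not part of the repository.

```python
from typing import Optional

# Action names copied from the operation list above, grouped by category.
ACTION_SPACE = {
    "Entry & manipulation": [
        "Write", "CopyPaste", "CutPaste", "SetHyperlink", "RemoveHyperlink",
        "AutoFill", "InsertRow", "InsertColumn", "Delete", "Clear",
    ],
    "Management": [
        "Sort", "Filter", "DeleteFilter", "MoveRow", "MoveColumn",
        "RemoveDuplicate",
    ],
    "Formatting": [
        "SetFormat", "DeleteFormat", "SetDataType", "SetCellMerge", "AutoFit",
        "ResizeRowColumn", "SetConditionalFormat", "SetDataValidation",
        "SetCellLock", "FreezePanes", "UnfreezePanes",
    ],
    "Chart": [
        "CreateChart", "SetChartTrendline", "SetChartTitle", "SetChartHasAxis",
        "SetChartAxis", "SetChartHasLegend", "SetChartLegend", "SetChartType",
        "AddChartErrorBars", "RemoveChartErrorBars", "AddDataLabels",
        "RemoveDataLabels", "SetChartMarker",
    ],
    "Pivot Table": [
        "CreatePivotTable", "CreateChartFromPivotTable", "CreateSheet",
        "RemoveSheet",
    ],
}

# Flat set of every supported action name.
ALL_ACTIONS = {name for names in ACTION_SPACE.values() for name in names}


def category_of(action_name: str) -> Optional[str]:
    """Return the category an action belongs to, or None if it is unsupported."""
    for category, names in ACTION_SPACE.items():
        if action_name in names:
            return category
    return None


print(len(ALL_ACTIONS))         # number of distinct action names
print(category_of("AutoFill"))  # category lookup for one action
```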
This dataset can be used to evaluate any spreadsheet agent including RL, LLM-based, or rule-based methods.
In the `dataset` folder, `dataset.xlsx` lists the 221 tasks, containing the target workbook name, task number, instruction, task categories, and involved atomic actions.
The fields are explained one by one as follows:
- `Sheet Name`: The name of the sheet this task is applied to.
- `No.`: The number of this task.
- `Context`: The brief description of the sheet this task is applied to. This context will be added to the prompt to inform the LLM of the spreadsheet usage.
- `Instructions`: The task content.
- `Categories`: Each task is classified into multiple categories according to the atomic actions involved in the task.
- `Atomic actions`: The atomic actions used to solve the task.
- `Seed task`: The number of the seed task (stored in `dataset/seed_tasks.xlsx`) this task originates from. Our 221 tasks were produced by adapting the 67 seed tasks to apply them to the task sheets (the `task_sheets` folder).
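As a rough illustration of the fields above, one row of `dataset.xlsx` could be mapped onto a typed record like this. The sketch makes several assumptions: the `Task` class and `row_to_task` helper are hypothetical, multi-valued cells are assumed to be comma-separated, and every value in the example row except the instruction text is a placeholder. Reading the workbook itself (for instance with a spreadsheet library) is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Column headers as listed in the field descriptions above.
COLUMNS = [
    "Sheet Name", "No.", "Context", "Instructions",
    "Categories", "Atomic actions", "Seed task",
]


@dataclass
class Task:
    sheet_name: str
    no: int
    context: str
    instructions: str
    categories: List[str]
    atomic_actions: List[str]
    seed_task: int


def split_cell(cell: object) -> List[str]:
    """Split a multi-valued cell. Assumption: values are comma-separated."""
    return [part.strip() for part in str(cell).split(",") if part.strip()]


def row_to_task(header: Sequence[str], row: Sequence[object]) -> Task:
    """Build a Task from one row, looking fields up by header name."""
    record = dict(zip(header, row))
    return Task(
        sheet_name=str(record["Sheet Name"]),
        no=int(record["No."]),
        context=str(record["Context"]),
        instructions=str(record["Instructions"]),
        categories=split_cell(record["Categories"]),
        atomic_actions=split_cell(record["Atomic actions"]),
        seed_task=int(record["Seed task"]),
    )


# Placeholder row: only the instruction text comes from this README.
example_row = [
    "BoomerangSales", 1, "<context placeholder>",
    "Calculate the revenue for each transaction considering the "
    "corresponding retail price and discount.",
    "Entry & manipulation", "Write, AutoFill", 1,
]
task = row_to_task(COLUMNS, example_row)
print(task.sheet_name, task.no, task.atomic_actions)
```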
The `task_sheets` folder contains the 28 evaluation workbooks these tasks are applied to.
The `task_sheet_answers` folder contains the reference solutions of the tasks. Each solution consists of a reference workbook showing the expected outcome of the corresponding task and a *.yaml file listing the necessary sheet states to compare. If the necessary states of the result match those of one of the references, the result is seen as correct. (The v1 version is used in our paper, while the v2 version contains more reference solutions collected after our paper was submitted.)
Each solution folder (e.g. `1_BoomerangSales`) contains at least 1 reference, which comprises a final spreadsheet (`1_BoomerangSales_gt1.xlsx`) and a checking list (`1_BoomerangSales_gt1_check.yaml`). Different tasks need different atomic actions, so the checking lists are tailored to the corresponding tasks.
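The matching rule described above (a result is correct if its necessary states match those of at least one reference) can be sketched as follows. This is not the logic of `evaluation.py`: the state dictionaries, the key names (`values`, `chart_count`, `font_color`), and the helper functions are hypothetical stand-ins for whatever the real checking lists contain.

```python
from typing import Any, Dict, List, Tuple

SheetState = Dict[str, Any]
# A reference pairs an expected state with the list of state keys to compare.
Reference = Tuple[SheetState, List[str]]


def checked_states(state: SheetState, check_list: List[str]) -> SheetState:
    """Keep only the states named in the checking list."""
    return {key: state.get(key) for key in check_list}


def is_correct(result: SheetState, references: List[Reference]) -> bool:
    """True if the result matches any reference on that reference's checked states."""
    return any(
        checked_states(result, check_list) == checked_states(expected, check_list)
        for expected, check_list in references
    )


# Hypothetical example: two references with different checking lists.
reference_a = ({"values": {"D2": 42}, "chart_count": 0, "font_color": "black"},
               ["values"])
reference_b = ({"values": {"D2": 40}, "chart_count": 1, "font_color": "black"},
               ["values", "chart_count"])

result = {"values": {"D2": 42}, "chart_count": 3, "font_color": "blue"}
print(is_correct(result, [reference_a, reference_b]))
```

Here the result differs from both references in states that are not checked, yet agrees with the first reference on the only state its checking list names, so it counts as correct under this sketch.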
The `dataset_20Samples.xlsx` file lists the 20 selected tasks used to compare the representative LLMs in our experiments (Table 1).
To dive deeper into the dataset collection details, refer to this tutorial.
This repo releases a simplified version of the SheetCopilot agent, whose state machine can do CoT reasoning and retrieve external documents.
SheetCopilot calls customized atomic actions to execute its generated solutions. We implement each atomic action using the `pywin32` library. Please refer to the API definitions for details. To compare with SheetCopilot, your own agents should also adopt this action space.
Before running an experiment, please set max tokens, temperature, model_name, and API keys in `config/config.yaml`. (As launching multiple Excel instances still runs into unresolved issues, we recommend `worker=1`. This can finish the evaluation in 1-2 hours.)
You can see two ChatGPT configs in this file: ChatGPT_1 is used for planning, while ChatGPT_2 is used to revise the format of the planning results. You can set `use_same_LLM: true` to use ChatGPT_1 for both jobs.
The underlying implementation of SheetCopilot is a state machine that implements planning by transitioning among 4 states (see the figure below). `max_cycle_times` is used to limit the number of times the agent visits the states.
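A bounded state machine of this kind can be sketched as below. The four state names are placeholders (the real states appear only in the figure), the handlers are dummies, and `max_cycle_times` is interpreted here as a cap on the total number of state visits, which is an assumption about its exact meaning.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class State(Enum):
    # Placeholder names; not the state names used by SheetCopilot.
    OBSERVE = auto()
    PROPOSE = auto()
    REVISE = auto()
    EXECUTE = auto()


# A handler does the work of one state and returns the next state (None = done).
Handler = Callable[[], Optional[State]]


def run(handlers: Dict[State, Handler], start: State,
        max_cycle_times: int) -> List[State]:
    """Visit states until a handler returns None or the visit budget runs out."""
    trace: List[State] = []
    state: Optional[State] = start
    while state is not None and len(trace) < max_cycle_times:
        trace.append(state)
        state = handlers[state]()
    return trace


# Dummy handlers that cycle forever, so only the budget stops the loop.
handlers: Dict[State, Handler] = {
    State.OBSERVE: lambda: State.PROPOSE,
    State.PROPOSE: lambda: State.REVISE,
    State.REVISE: lambda: State.EXECUTE,
    State.EXECUTE: lambda: State.OBSERVE,
}

trace = run(handlers, State.OBSERVE, max_cycle_times=6)
print([s.name for s in trace])
```

The cap guarantees termination even when the planner keeps looping, which is the role the text assigns to `max_cycle_times`.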
Open an Excel workbook before running this command:
python interaction.py -c config/config.yaml
Now you can enter instructions and wait for SheetCopilot to finish them without human intervention.
To try SheetCopilot quickly, please open `dataset/task_sheets/BoomerangSales.xlsx` and then enter these instructions in order:
- Calculate the revenue for each transaction considering the corresponding retail price and discount.
- Highlight the Revenue cells greater than 500 in blue text.
- Create a pivot table in a new sheet to show the counts of the websites on which boomerangs were sold.
- Plot a bar chart for the pivot table in the same sheet.
- Set the y-axis title as "Count" and turn off legends.
- Create another pivot table in a new sheet to show the revenue sums of each product.
- Plot a pie chart for the pivot table with the chart title "Revenue by Product" in this sheet.
You can also try more vague instructions like: Analyze the data and plot charts for the results.
Afterward, you may see SheetCopilot create pivot tables and plot proper charts for you (see the figure below).
[Caution] Any operation executed by SheetCopilot cannot be undone by clicking the "Undo" button! We strongly recommend that our users use SheetCopilot on Google Sheets to automate their spreadsheet tasks.
Open a Google Sheets spreadsheet and install SheetCopilot from the Google Workspace Marketplace like this:
Install SheetCopilot for GoogleSheets
Then you can hack SheetCopilot happily via chatting ...
Let SheetCopilot solve complex tasks for you
You can undo any operations executed by SheetCopilot by just using Ctrl + Z.
The results generated by any method should be organized like this:
results
  └── ([Order]_[Sheet Name])
  └── 1_BoomerangSales
  |   └── ([Order]_[Sheet Name]_[Repeat_No.].xlsx)
  |   └── 1_BoomerangSales_log.yaml
  ...
  └── 9_BoomerangSales
  └── 10_DemographicProfile
  ...
  └── 17_Dragging
  ...
  └── 24_Dragging
  ...
  └── 221_XYScatterPlot
[Order] is the row index of the task minus 1 and [Sheet Name] is the entry in column A of `dataset.xlsx`. [Repeat_No.] is used to differentiate multiple repeats of the same task. If you run each task only once (controlled by `repeat` in the config file), [Repeat_No.] is 1.
`1_BoomerangSales_log.yaml` is the running log of the task, which saves the content of the planning process. Likewise, your method should also record a log for each task.
You can also use the "[No.]_[Sheet Name]" naming convention as follows ([No.] is the entry in column B of `dataset.xlsx`):
results
  └── ([No.]_[Sheet Name])
  └── 1_BoomerangSales
  |   └── ([No.]_[Sheet Name]_[Repeat_No.].xlsx)
  |   └── 1_BoomerangSales_log.yaml
  ...
  └── 9_BoomerangSales
  ...
  └── 1_Dragging
  ...
  └── 8_Dragging
  ...
You should set the global variable `USE_NO_AND_SHEETNAME` in `evaluation.py` to True to use this naming convention.
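Both naming conventions follow the same pattern, so the expected paths can be derived with a small helper. This is a sketch only: `order_from_row` and `result_paths` are hypothetical names, and `PurePosixPath` is used purely to keep the printed paths identical on every platform.

```python
from pathlib import PurePosixPath
from typing import Tuple


def order_from_row(row_index: int) -> int:
    """[Order] is the row index of the task in dataset.xlsx minus 1."""
    return row_index - 1


def result_paths(root: str, index: int, sheet_name: str,
                 repeat_no: int = 1) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (workbook path, log path) for one task.

    `index` is [Order] or [No.], depending on the naming convention in use.
    """
    folder = PurePosixPath(root) / f"{index}_{sheet_name}"
    workbook = folder / f"{index}_{sheet_name}_{repeat_no}.xlsx"
    log = folder / f"{index}_{sheet_name}_log.yaml"
    return workbook, log


# The task in row 2 of dataset.xlsx (the first row after the header) has order 1.
workbook, log = result_paths("results", order_from_row(2), "BoomerangSales")
print(workbook)
print(log)
```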
As different agents may present plans in various formats, we recommend that each method outputs each step using this Chain-of-Thoughts (CoT) format:
Step X. [Thought]
Action API: @[Action call]@
For example,
Step 3. Fill the formula to other cells.
Action API: @AutoFill(source="Sheet1!A2", destination="Sheet1!A2:A36")@
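A log written in this format can be split back into steps with a regular expression. The sketch below assumes each thought fits on one line and that the action call itself contains no `@` character; `parse_steps` is a hypothetical helper, not the parser used by `evaluation.py`.

```python
import re
from typing import List, Tuple

# Matches "Step X. [Thought]" followed by "Action API: @[Action call]@".
STEP_PATTERN = re.compile(
    r"Step\s+(\d+)\.\s*(.*?)\s*\nAction API:\s*@(.+?)@\s*$",
    re.MULTILINE,
)


def parse_steps(log_text: str) -> List[Tuple[int, str, str]]:
    """Return (step number, thought, action call) for every step in the log."""
    return [
        (int(number), thought, call)
        for number, thought, call in STEP_PATTERN.findall(log_text)
    ]


def action_name(action_call: str) -> str:
    """Extract the action name, i.e. the text before the first parenthesis."""
    return action_call.partition("(")[0].strip()


example = (
    "Step 3. Fill the formula to other cells.\n"
    'Action API: @AutoFill(source="Sheet1!A2", destination="Sheet1!A2:A36")@'
)
steps = parse_steps(example)
print(steps[0][0], steps[0][1])
print(action_name(steps[0][2]))
```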
`agent/SheetCopilot_example_logs` shows examples of the required log format (using the "[Order]_[Sheet Name]" naming convention).
Specify the correct paths in `agent/config/config.yaml` and then run this command within the `agent` folder to evaluate your results:
python evaluation.py
The evaluation results will be recorded in a file named `eval_result.yaml` under the result folder.
The evaluation can restart from a checkpoint if it has been aborted. If you want to re-evaluate, just delete the `eval_result.yaml` in the result folder.
Important notes:
- Every new sheet must be created to the left of the very first sheet for correct matching with the references, since sheet names are not checked.
- The sheet content must start from cell A1 and each sheet is required to contain contiguous tables.
The performance of SheetCopilot with 3 leading LLMs as its back-end on `dataset/dataset_20Samples.xlsx`:
Models | Exec@1 | Pass@1 | A50 | A90 |
---|---|---|---|---|
GPT-3.5-Turbo | 85.0% | 45.0% | 2.00 | 4.50 |
GPT-4 | 65.0% | 55.0% | 1.33 | 2.00 |
Claude | 80.0% | 40.0% | 1.50 | 4.40 |
The performance of SheetCopilot and a VBA-based method was evaluated on `dataset/dataset.xlsx` using `dataset/task_sheet_answers` as the ground truths. (Note: as we also added the functionally correct results generated by GPT-3.5-Turbo to `dataset/task_sheet_answers_v2`, the evaluation results for this model remain the same whether you use v1 or v2 ground truths.)
Methods | Exec@1 | Pass@1 |
---|---|---|
GPT-3.5-Turbo | 87.3% | 44.3% |
VBA-based | 77.8% | 16.3% |
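The two rates in these tables can be computed from per-task outcomes as sketched below. The README does not define the metrics, so this sketch assumes that Exec@1 is the share of tasks whose solution runs without raising an error and Pass@1 is the share whose result also matches a reference. The outcome list is hypothetical, chosen only to show the arithmetic.

```python
from typing import List, Tuple

# One (executed_without_error, matched_a_reference) pair per task.
Outcome = Tuple[bool, bool]


def exec_and_pass_rates(outcomes: List[Outcome]) -> Tuple[float, float]:
    """Return (Exec@1, Pass@1) in percent under the assumptions stated above."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("at least one task outcome is required")
    executed = sum(1 for ran, _ in outcomes if ran)
    # A task can only pass if its solution executed.
    passed = sum(1 for ran, matched in outcomes if ran and matched)
    return round(100.0 * executed / total, 1), round(100.0 * passed / total, 1)


# Hypothetical run over 20 tasks: 9 pass, 8 execute but fail, 3 raise errors.
outcomes = [(True, True)] * 9 + [(True, False)] * 8 + [(False, False)] * 3
exec_rate, pass_rate = exec_and_pass_rates(outcomes)
print(f"Exec@1 = {exec_rate}%, Pass@1 = {pass_rate}%")
```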
(1) Manipulation: Writing values and formulas, deleting cells, inserting a row/column, auto-filling, copy-pasting values, find-and-replacing, setting hyperlinks, removing duplicates, creating sheets, clearing formats.
(2) Management: Sorting, filtering, and freezing panes.
(3) Formatting: Setting format and conditional format (font, bold, italic, underline, text color, and fill color), setting data type (date, text, number, currency, time, general, percentage), and merging.
(4) Charts: Creating charts, creating charts from pivot tables, setting chart title/axis title/legends/chart type/marker/trendline/data labels.
(5) Pivot table: Creating pivot tables.
(More operations will be added once the developers finish testing them. Besides, you can raise issues to ask for more supported operations or pull requests to upload your implementations.)
This video shows that SheetCopilot conducts GDP data analysis successfully.
The video below shows SheetCopilot deployed on Google Sheets.
You can upload `task_sheets/BoomerangSales.xlsx` and type in these instructions to reproduce the results in the demo:
- Calculate the revenue for each transaction in the sales table considering the corresponding retail price and discount.
- Highlight the Revenue cells greater than 500 in blue text.
- Create a pivot table in a new sheet to show the counts of the websites on which boomerangs were sold.
- Plot a bar chart for the pivot table in the same sheet.
- Set the y-axis title as "Count" and turn off legends.
- Create another pivot table in a new sheet to show the revenue sums of each product.
- Plot a pie chart for the pivot table with chart title "Revenue by Product" in this sheet.
SheetCopilot and the dataset can only be used for non-commercial purposes.
If you use the SheetCopilot agent and benchmark, feel free to cite us.
@inproceedings{li_sheetcopilot_2023,
title = {{SheetCopilot}: {Bringing} {Software} {Productivity} to the {Next} {Level} through {Large} {Language} {Models}},
volume = {36},
url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/0ff30c4bf31db0119a6219e0d250e037-Paper-Conference.pdf},
booktitle = {Advances in {Neural} {Information} {Processing} {Systems}},
publisher = {Curran Associates, Inc.},
author = {Li, Hongxin and Su, Jingran and Chen, Yuntao and Li, Qing and ZHANG, ZHAO-XIANG},
editor = {Oh, A. and Neumann, T. and Globerson, A. and Saenko, K. and Hardt, M. and Levine, S.},
year = {2023},
pages = {4952--4984},
}