ShortcutsBench

ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents

ShortcutsBench is a project focused on collecting and analyzing workflows created in the Shortcuts app, providing a dataset of shortcut metadata, source files, and API information. It aims to study the integration of large language models with Apple devices, particularly focusing on the role of shortcuts in enhancing user experience. The project offers insights for Shortcuts users, enthusiasts, and researchers to explore, customize workflows, and study automated workflows, low-code programming, and API-based agents.

README:

🔧ShortcutsBench📱

Read this in Chinese (中文).

What are Shortcuts?

Shortcuts are workflows built by developers in the Shortcuts app using a user-friendly graphical interface 🖼️ with the provided basic actions. Apple describes them as "a quick way to get one or more tasks done with your apps." 📱

Project Task List (Continuously Updated) 📋

All data, data acquisition processes, data generated during cleaning, cleaning scripts, experiment scripts, results, and related files can be found in the following documents: deves_dataset/dataset_src/README.md (English) or Chinese, deves_dataset/dataset_src_valid_apis/README.md (English) or Chinese, and experiments/README.md (English) or Chinese.

  • [x] ShortcutsBench Paper Main Text
  • [x] ShortcutsBench Paper Appendix
  • [x] Scripts for Data Acquisition, Data Cleaning and Processing, Experiment Code, and Experiment Results
  • [x] We provide shortcuts with bilingual explanations for regular users: listed in users_dataset/${website name}/${category name}/README.md (English) or users_dataset/${website name}/${category name}/README_ZH.md (Chinese). Regular users can find suitable shortcuts for their work or life in our repository, which they can import into the Shortcuts app on Apple devices. Each shortcut includes:
    1. The iCloud link for the shortcut
    2. A description of the shortcut's functionality
    3. The source of the shortcut
  • For Shortcut Researchers, ShortcutsBench provides: (1) Shortcuts (i.e., the golden action sequences); (2) Queries (i.e., tasks assigned to the agent); (3) APIs (i.e., tools available to the agent).
    • [x] Shortcuts

    • [x] Queries. The generated queries are shown in generated_success_queries.json, which can be obtained from Google Drive or Baidu Cloud (password: shortcutsbench).

      The queries are generated based on 1_final_detailed_records_filter_apis_leq_30.json.

    • [x] APIs. The obtained APIs are shown in 4_api_json_filter.json, which can be obtained from Google Drive or Baidu Cloud (password: shortcutsbench).

      4_api_json_filter.json has been manually deduplicated, but a few duplicates remain. The raw unprocessed files extracted directly from the app are in 4_api_json.json, which can be obtained from Google Drive or Baidu Cloud (password: shortcutsbench).

How can this project help you?

Apple's Worldwide Developers Conference, WWDC'24, introduced many AI features on Apple devices 🤖. We are very interested in how Apple combines large language models like ChatGPT with its devices to provide users with a smarter experience 💡. In this process, shortcuts will play a significant role! 🚀

As a Shortcut User and Enthusiast 📱

You can find your favorite shortcuts in this dataset 📱 to help you complete various complex tasks with one click!

As a Researcher 🔬

  • Research on building automated workflows: Shortcuts are essentially workflows composed of a series of API calls (actions) provided by Apple and third-party apps.
  • Research on low-code programming: Shortcuts include features like branches, loops, and variable assignments, while having a user-friendly graphical interface 🖥️.
  • Research on API-based agents: Enabling large language models to autonomously decide whether, when, and how to use APIs based on user queries (tasks) 🔧.
  • Research on fine-tuning large language models using shortcuts to closely integrate language models with phones, computers, and smartwatches, achieving the vision of an "operating system based on large language models" 📈.
  • ......

🌟Advantages of ShortcutsBench Over Existing API-Based Agent Datasets🌟

ShortcutsBench has significant advantages in terms of the authenticity, richness, and complexity of APIs, the validity of queries and corresponding action sequences, the accurate filling of parameter values, the awareness of obtaining information from the system or users, and the overall scale.

To our knowledge, ShortcutsBench is the first large-scale agent benchmark based on real APIs, considering APIs, queries, and corresponding action sequences. ShortcutsBench provides a rich set of real APIs, queries of varying difficulty and task types, high-quality human-annotated action sequences (provided by shortcut developers), and queries from real user needs. Additionally, it offers precise parameter value filling, including raw data types, enumeration types, and using outputs from previous actions as parameter values, and evaluates the agent's awareness of requesting necessary information from the system or users. Moreover, the scale of APIs, queries, and corresponding action sequences in ShortcutsBench rivals or even surpasses benchmarks and datasets created by LLMs or modified from existing datasets. A comprehensive comparison between ShortcutsBench and existing benchmarks/datasets is shown in the table below.

[Table image: comparison between ShortcutsBench and existing benchmarks/datasets]

If you find this project helpful, please give us a Star ⭐️! Thank you for your support! 🙏

Keywords: Shortcuts, Apple, WWDC'24, Siri, iOS, macOS, watchOS, Workflow, API Calls, Low-Code Programming, Agent, Large Language Model

User Guide for Shortcuts (For Users) 📱

Search for the Shortcut You Want

In this repository, the users_dataset/${website name}/${category name}/README.md file records the metadata of all shortcuts in the category, including name, description, iCloud download link, etc. Each README.md file follows this structure:

### Name: Wine Shops # Shortcut Name
- URL: https://www.icloud.com/shortcuts/78ffd18288fd4da286bfd570993ea46e # iCloud Link
- Source: https://shortcutsgallery.com # Source
- Description: Look for Wine shops near you # Description

Press Ctrl + F (Cmd + F on macOS) in your browser to search for a keyword in the shortcut names 🔎. You can also visit Shortcut Collection Sites to search for the shortcuts you want.
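For searching outside the browser, the same lookup can be scripted. The following is a minimal sketch, assuming each category README follows the entry layout shown above and that the trailing `# ...` annotations in that sample are explanatory rather than part of the real files. The helper names `parse_entries` and `search` are hypothetical and not part of the repository, and unlike a name-only Ctrl + F search this sketch also matches against the description.

```python
import re

# Entry layout assumed from the README structure shown above; the trailing
# "# ..." annotations in that sample are treated as explanatory, not as data.
ENTRY_HEADER = re.compile(r"^### Name:\s*(.+)$")
FIELD = re.compile(r"^- (URL|Source|Description):\s*(.+)$")


def parse_entries(readme_text):
    """Split a category README into one dict per shortcut."""
    entries = []
    for line in readme_text.splitlines():
        line = line.strip()
        header = ENTRY_HEADER.match(line)
        if header:
            entries.append({"Name": header.group(1)})
            continue
        field = FIELD.match(line)
        if field and entries:
            entries[-1][field.group(1)] = field.group(2)
    return entries


def search(entries, keyword):
    """Case-insensitive keyword match against name and description."""
    needle = keyword.lower()
    return [
        e for e in entries
        if needle in e.get("Name", "").lower()
        or needle in e.get("Description", "").lower()
    ]


sample = """\
### Name: Wine Shops
- URL: https://www.icloud.com/shortcuts/78ffd18288fd4da286bfd570993ea46e
- Source: https://shortcutsgallery.com
- Description: Look for Wine shops near you
"""

for entry in search(parse_entries(sample), "wine"):
    print(entry["Name"], entry["URL"])
```

To search a real category, read the corresponding `README.md` into a string and pass it to `parse_entries` in place of `sample`.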

Import the Found Shortcut 📥

On your Apple device, click the iCloud link listed under URL, and the shortcut will automatically open and be imported into your Shortcuts app 📲.

Download Shortcut Source Files

Besides downloading shortcuts one by one using the iCloud links, you can directly get the complete data from the following links:

Data Sources and Links

| Data Source | Metadata Location | Cloud Link |
| --- | --- | --- |
| Matthewcassinelli | Location in this repository | Google Drive Link / Baidu Cloud Link |
| Routinehub | Location in this repository | Google Drive Link / Baidu Cloud Link |
| MacStories | Location in this repository | Google Drive Link / Baidu Cloud Link |
| ShareShortcuts | Location in this repository | Google Drive Link / Baidu Cloud Link |
| ShortcutsGallery | Location in this repository | Google Drive Link / Baidu Cloud Link |
| iSpazio | Location in this repository | Google Drive Link / Baidu Cloud Link |
| Jiejingku | Location in this repository | Google Drive Link / Baidu Cloud Link |
| SSPai | Location in this repository | Google Drive Link / Baidu Cloud Link |
| Jiejing.fun | Location in this repository | Google Drive Link / Baidu Cloud Link |
| Kejicut | Location in this repository | Google Drive Link / Baidu Cloud Link |
| RCuts | Location in this repository | Google Drive Link / Baidu Cloud Link |

Introduction to Shortcut Source Files

The shortcut source data in the cloud drive is organized in the following directory structure:

users_dataset/
├── matthewcassinelli.com_sirishortcuts_library_free # Website Name
│   ├── file1
│   ├── file2
│   └── file3

or

users_dataset/
├── jiejingku.net # Website Name
│   ├── category1 # Category
│   │   ├── file1 # Each specific shortcut
│   │   └── file2
│   ├── category2
│   │   └── file3

Each file represents a shortcut. The file name is generated by simply processing the shortcut name, using the following code:

file_name = re.sub(r'[^a-zA-Z0-9]', '_', name)
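As a quick illustration of what this substitution does to real names, here is a small runnable sketch. The wrapper function `to_file_name` and the second and third example names are hypothetical and chosen only for demonstration.

```python
import re


def to_file_name(name):
    """Replace every character that is not an ASCII letter or digit with '_'."""
    return re.sub(r'[^a-zA-Z0-9]', '_', name)


print(to_file_name("Wine Shops"))      # Wine_Shops
print(to_file_name("Wi-Fi QR (v2)"))   # Wi_Fi_QR__v2_
print(to_file_name("快捷指令"))          # ____
```

Because every non-ASCII character becomes an underscore, different shortcut names (especially non-English ones of the same length) can map to the same file name, which is worth keeping in mind when looking up a file by shortcut name.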

The shortcut source files we provide are in JSON format, whereas shortcuts exported from Apple devices are in the form of iCloud links (shared as links) or encrypted shortcut files with the .shortcut extension.
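To get a feel for the JSON source files, one option is to walk the directory tree and list the actions each shortcut contains. The sketch below makes two assumptions that are not stated in this README: that each shortcut file holds a JSON object with a top-level `WFWorkflowActions` list, and that each action carries a `WFWorkflowActionIdentifier` string (the layout commonly seen in Shortcuts files). The helper names are hypothetical.

```python
import json
from pathlib import Path


def iter_shortcuts(root):
    """Yield (path, data) for every file under root that parses as JSON."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip README files and anything else that is not JSON
        yield path, data


def action_identifiers(shortcut):
    """Return the action identifiers of one shortcut, in order.

    Assumes a top-level "WFWorkflowActions" list whose items carry a
    "WFWorkflowActionIdentifier" string; anything else yields an empty list.
    """
    if not isinstance(shortcut, dict):
        return []
    actions = shortcut.get("WFWorkflowActions", [])
    return [a.get("WFWorkflowActionIdentifier", "") for a in actions]
```

Since the walk is recursive, it works for both directory layouts shown above; for example, iterating over `iter_shortcuts("users_dataset/jiejingku.net")` and printing `action_identifiers(data)` for each item lists the actions per shortcut.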

To import a shortcut source file into the Shortcuts app on macOS, follow these steps:

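  • Convert the JSON file to PLIST format 📑. Note that the script shown here, as written, goes in the opposite direction: it parses an XML PLIST file and writes the result as JSON.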
    import json
    import xml.etree.ElementTree as ET
    
    def parse_element(element):
        """
        Recursively parse XML PLIST elements and return dictionaries and lists.
        """
        if element.tag == 'dict':
            # Children alternate between <key> and value elements.
            return {element[i].text: parse_element(element[i+1]) for i in range(0, len(element), 2)}
        elif element.tag == 'array':
            return [parse_element(child) for child in element]
        elif element.tag == 'true':
            return True
        elif element.tag == 'false':
            return False
        elif element.tag == 'integer':
            return int(element.text)
        elif element.tag == 'string':
            return element.text
        elif element.tag == 'real':
            return float(element.text)
        else:
            raise ValueError("Unsupported tag: " + element.tag)
    
    file_path = "shortcut.plist"  # placeholder: path to the XML PLIST file to read
    tree = ET.parse(file_path)
    root_element = tree.getroot()
    # The root <plist> element wraps a single top-level element.
    data = parse_element(root_element[0])
    
    save_path = "shortcut.json"  # placeholder: must be a file path, not a directory
    with open(save_path, 'w') as f:
        json.dump(data, f, indent=4)
  • Sign the PLIST file using shortcuts sign --mode anyone --input $input_file --output $output_file, replacing $input_file and $output_file with the actual file paths.
  • Import the signed file into the Shortcuts app 📲.
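The first step asks for a JSON to PLIST conversion, while the script listed with it reads a PLIST and writes JSON. A minimal sketch of the JSON to PLIST direction, using Python's standard `plistlib`, might look like the following. The function names, the `binary` switch, and the file names in the usage note are hypothetical; whether the signing tool expects XML or binary input is not stated here, so both are offered. Types that JSON cannot represent (raw bytes, dates) will not be restored by this round trip.

```python
import json
import plistlib


def json_to_plist(json_path, plist_path, binary=False):
    """Read a JSON shortcut source file and write it back out as a PLIST file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    fmt = plistlib.FMT_BINARY if binary else plistlib.FMT_XML
    with open(plist_path, "wb") as f:
        # plistlib cannot serialize None, so a JSON null anywhere in the
        # file raises TypeError here.
        plistlib.dump(data, f, fmt=fmt)


def sign_command(input_file, output_file):
    """Build the argument list for the signing step quoted above (macOS only)."""
    return ["shortcuts", "sign", "--mode", "anyone",
            "--input", input_file, "--output", output_file]
```

On macOS, the two steps could then be chained as `json_to_plist("file1", "file1.plist")` followed by `subprocess.run(sign_command("file1.plist", "file1.shortcut"), check=True)`.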

ShortcutsBench Dataset Construction Guide 📚

Data Acquisition Process

We detail the construction process of ShortcutsBench in the main text of our paper; please refer to it for the full description. Below are some additional details.

How to use shortcuts? How to share shortcuts? How to view the source files of shortcuts?

  1. Import shortcuts into the Shortcuts app.

    You can import shortcuts into the Shortcuts app on Apple devices by clicking the iCloud link, and then use them as a regular user would.

  2. Share shortcuts.

    • You can share the shortcut as an iCloud link using the Share option in the Shortcuts app on macOS or iOS.
    • You can share the shortcut as a source file using the Share option in the Shortcuts app on macOS, resulting in a shortcut file with the .shortcut extension. Note: The shared source file is encrypted by Apple and cannot be directly parsed using the plist package in Python.
  3. Decrypt single or multiple shortcuts. Shortcuts that decrypt other shortcuts (such as Get Plist and Get Plist Loop) can be used for this. The decrypted files will be in plist format.

    To make it easier to read, you can choose to convert the plist files to json format. The shortcut source files we provide are all in json format.

  4. How to acquire shortcut source files on a large scale?

    Instead of using Get Plist and Get Plist Loop to parse shortcuts, we follow these three steps for quicker and more efficient mass acquisition of shortcut source files:

    1. Obtain iCloud links in the format https://www.icloud.com/shortcuts/${unique_id}.
    2. Request partial metadata of the shortcut from https://www.icloud.com/shortcuts/api/records/${unique_id}, including the shortcut name and download link for the source file.
    3. Use the download link cur_dict["fields"]["shortcut"]["value"]["downloadURL"] obtained in the previous step to request the source file of the shortcut. Note: The download link expires quickly, so you need to use it promptly.

    The directly downloaded source file is in plist format. You can choose to convert the plist format to json format.

    The following code (simplified) demonstrates the entire process, with the final response_json being the json format shortcut source file:

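    import biplist
    import requests
    
    # Example ID, taken from the iCloud link https://www.icloud.com/shortcuts/${unique_id}
    unique_id = "78ffd18288fd4da286bfd570993ea46e"
    response = requests.get(f"https://www.icloud.com/shortcuts/api/records/{unique_id}")
    response.raise_for_status()
    
    cur_dict = response.json()
    # The download link expires quickly, so request the source file right away
    downloadURL = cur_dict["fields"]["shortcut"]["value"]["downloadURL"]
    new_response = requests.get(downloadURL)
    # Parse the downloaded plist into Python objects (JSON-like) and store it in response_json
    response_json = biplist.readPlistFromString(new_response.content)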
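The parts of this process that do not need the network can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the project's actual script: it assumes the `${unique_id}` in an iCloud link is a hexadecimal string, uses `plistlib` in place of `biplist`, and encodes raw bytes as base64 when writing JSON (how the released dataset encodes such values is not stated here). All function names are hypothetical.

```python
import base64
import datetime
import json
import plistlib
import re

# Assumed shape of a share link: https://www.icloud.com/shortcuts/<hex id>
ICLOUD_LINK = re.compile(r"^https://www\.icloud\.com/shortcuts/([0-9a-fA-F]+)/?$")


def records_url(icloud_link):
    """Turn an iCloud share link into the metadata URL used in step 2."""
    match = ICLOUD_LINK.match(icloud_link.strip())
    if match is None:
        raise ValueError("Not an iCloud shortcut link: " + icloud_link)
    return "https://www.icloud.com/shortcuts/api/records/" + match.group(1)


def _json_default(value):
    """Encode PLIST-only types that json cannot represent natively."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    raise TypeError("Unsupported type: " + type(value).__name__)


def plist_bytes_to_json(raw):
    """Parse downloaded PLIST bytes (XML or binary) and return JSON text."""
    return json.dumps(plistlib.loads(raw), indent=4, default=_json_default)
```

In the flow above, `records_url(link)` would supply the URL for the first request, and `plist_bytes_to_json(new_response.content)` would turn the downloaded source file into JSON text ready to be written to disk.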

License Statement 📜

All code and datasets in this project are licensed under the Apache License 2.0. This means you are free to use, copy, modify, and distribute the content of this project, but must comply with the following conditions:

  • Copyright Notice: The original copyright notice and license statement must be included in all copies of the project.
  • State Changes: If you modify the code, you must indicate the changes in any modified files.
  • Trademark Use: This license does not grant the right to use project trademarks, service marks, or trade names.

For the full text of the license, please see LICENSE.

Additionally, you must comply with the license agreements of the shortcut sharing sites that provided the data sources for this project.

Citation

If you find this project helpful, please consider citing our work:

@misc{
    shen2024shortcutsbenchlargescalerealworldbenchmark,
    title={ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents}, 
    author={Haiyang Shen and Yue Li and Desong Meng and Dongqi Cai and Sheng Qi and Li Zhang and Mengwei Xu and Yun Ma},
    year={2024},
    eprint={2407.00132},
    archivePrefix={arXiv},
    primaryClass={cs.SE},
    url={https://arxiv.org/abs/2407.00132}, 
}
