playword

Automate browsers with AI to boost productivity and make testing more enjoyable!

Stars: 52

PlayWord is a tool designed to supercharge the web test automation experience with AI. Its core features are browser operations and validations driven by natural language input, and a monitoring interface that records and dry-runs test steps. PlayWord supports multiple AI services, including Anthropic, Google, and OpenAI, so users can select the provider that fits their requirements. It also offers assertion handling, frame handling, custom variables, test recordings, and an Observer module that tracks user interactions on web pages. With PlayWord, users interact with web pages through natural language commands, without worrying about element locators, and the AI adapts to UI changes.

README:

PlayWord

Supercharge your web test automation experience with AI.

📦 Installation

Choose the package that best suits your needs.

@playword/core

The @playword/core package provides the core features of PlayWord and can be used as a Node.js module.

It includes the following modules:

PlayWord: Enables browser operations and validations using natural language inputs to interact with web pages.
Observer: Mounts a monitoring interface on the browser to record and dry-run captured test steps.

# Install with any package manager you prefer
npm install @playword/core --save-dev

@playword/cli

The @playword/cli package enables you to use the features of PlayWord directly through the command line.

For ease of use, I recommend running this package with npx.

# Run a PlayWord test
npx @playword/cli test --headed --verbose -b webkit

# Start the Observer
npx @playword/cli observe -b chromium -v

See documentation for usage examples and options.

📘 Getting Started

PlayWord supports multiple AI services, including Anthropic, Google, and OpenAI. You can select the appropriate provider based on your requirements.

OpenAI

There are two ways to provide the required API key to PlayWord:

1. Export the API key as an environment variable:

export OPENAI_API_KEY="sk-..."

2. Pass the API key as a parameter during initialization:

import { PlayWord } from '@playword/core'
import { chromium } from 'playwright'

const browser = await chromium.launch()
const context = await browser.newContext()

const playword = new PlayWord(context, {
  aiOptions: {
    baseURL: 'https://...', // Custom API endpoint (If applicable)
    openAIApiKey: 'sk-...',
    model: 'gpt-4o' // If not specified, the default model is gpt-4o-mini.
  }
})

Google

1. Export the API key as an environment variable:

export GOOGLE_API_KEY="AI..."

2. Pass the API key as a parameter during initialization:

const playword = new PlayWord(context, {
  aiOptions: {
    googleApiKey: 'AI...',
    model: 'gemini-2.0-flash' // If not specified, the default model is gemini-2.0-flash-lite.
  }
})

Anthropic

Since Anthropic does not offer its own embeddings model, integrating Anthropic requires an additional API key for embeddings.

Currently, PlayWord supports the following providers for embeddings:

VoyageAI (officially recommended by Anthropic)
OpenAI
Google

1. Export API keys as environment variables:

export ANTHROPIC_API_KEY="sk-..."
export VOYAGEAI_API_KEY="pa-..."

2. Pass the API keys as parameters during initialization:

const playword = new PlayWord(context, {
  aiOptions: {
    anthropicApiKey: 'sk-...',
    voyageAIApiKey: 'pa-...',
    model: 'claude-3-7-sonnet-latest' // If not specified, the default model is claude-3-5-haiku-latest.
  }
})
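
If you prefer to pass keys explicitly but still keep them out of source control, the option names shown above can be filled from environment variables. The following is a minimal sketch, not part of PlayWord: `Provider` and `aiOptionsFor` are hypothetical names, and for Anthropic it only covers the VoyageAI embeddings key.

```typescript
// Hypothetical provider labels used only by this sketch.
type Provider = 'openai' | 'google' | 'anthropic'

// Hypothetical helper: reads the documented environment variables and returns
// an aiOptions object for the chosen provider.
function aiOptionsFor(
  provider: Provider,
  env: Record<string, string | undefined>
): Record<string, string> {
  // Fail early with a readable message when a key is missing.
  const need = (name: string): string => {
    const value = env[name]
    if (!value) throw new Error(`Missing environment variable ${name}`)
    return value
  }

  switch (provider) {
    case 'openai':
      return { openAIApiKey: need('OPENAI_API_KEY') }
    case 'google':
      return { googleApiKey: need('GOOGLE_API_KEY') }
    case 'anthropic':
      // Anthropic needs a second key for embeddings; this sketch assumes VoyageAI.
      return {
        anthropicApiKey: need('ANTHROPIC_API_KEY'),
        voyageAIApiKey: need('VOYAGEAI_API_KEY')
      }
  }
}
```

In a Node.js script, the result could then be passed as `aiOptions`, for example `aiOptionsFor('anthropic', process.env)`.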

📜 PlayWord Options

Name	Type	Default	Description
aiOptions	object	{}	Configuration options for the AI instance.
debug	boolean	false	Whether to enable debug mode.
delay	number	250	Delay between each step in milliseconds.
record	boolean | string	false	Whether to record actions performed and where to save the recordings.
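
For TypeScript users, the table above can be mirrored as a type with defaults. This is only a sketch: `PlayWordOptionsSketch`, `DEFAULT_OPTIONS`, and `withDefaults` are hypothetical names and are not exports of @playword/core; only the option names and default values come from the table.

```typescript
// Hypothetical local mirror of the options table; not a type exported by @playword/core.
interface PlayWordOptionsSketch {
  aiOptions: Record<string, unknown> // Configuration options for the AI instance
  debug: boolean // Whether to enable debug mode
  delay: number // Delay between each step in milliseconds
  record: boolean | string // false, true, or a path where recordings are saved
}

// Default values as listed in the table.
const DEFAULT_OPTIONS: PlayWordOptionsSketch = {
  aiOptions: {},
  debug: false,
  delay: 250,
  record: false
}

// Merge user-supplied options over the documented defaults.
function withDefaults(options: Partial<PlayWordOptionsSketch> = {}): PlayWordOptionsSketch {
  return { ...DEFAULT_OPTIONS, ...options }
}

// `delay` is not overridden here, so it keeps the documented default of 250.
const resolved = withDefaults({ debug: true, record: 'recordings/login.json' })
```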

💬 Communicate with Browser

In basic usage, call the say method to interact with the page.

No need to worry about locating elements or performing interactions: PlayWord handles all of that for you.

await playword.say('Navigate to https://www.google.com')

await playword.say('Type "Hello, World!" in the search bar')

await playword.say('Press enter')

✅ Assertion

PlayWord uses keywords to identify whether a step is an assertion. This approach ensures more stable results compared to relying solely on AI judgment.

Using PlayWord within Playwright Test

import { PlayWord } from '@playword/core'
import { expect, test } from '@playwright/test'

test('get started link', async ({ context }) => {
  const playword = new PlayWord(context, { debug: true, record: 'recordings/getStartLink.json' })

  await playword.say('go to https://playwright.dev/')
  await playword.say('click the link "Get started"')

  expect(await playword.say('Verify if the installation heading is visible')).toBe(true)
})

Input that starts with any of the following case-insensitive keywords is recognized as an assertion:

are
assert
assure
can
check
compare
confirm
could
did
do
does
ensure
expect
guarantee
has
have
is
match
satisfy
shall
should
test
then
was
were
validate
verify
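
To illustrate the idea, keyword-based detection could look roughly like the sketch below. It is not PlayWord's actual implementation: `isAssertion` is a hypothetical helper, and matching on the whole first word (rather than on a prefix) is an assumption.

```typescript
// Keywords copied from the list above.
const ASSERTION_KEYWORDS = new Set([
  'are', 'assert', 'assure', 'can', 'check', 'compare', 'confirm', 'could', 'did',
  'do', 'does', 'ensure', 'expect', 'guarantee', 'has', 'have', 'is', 'match',
  'satisfy', 'shall', 'should', 'test', 'then', 'was', 'were', 'validate', 'verify'
])

// Hypothetical helper: true when the first word of the input is an assertion keyword.
// Assumption: the whole first word must match, compared case-insensitively.
function isAssertion(input: string): boolean {
  const firstWord = input.trim().split(/\s+/)[0] ?? ''
  return ASSERTION_KEYWORDS.has(firstWord.toLowerCase())
}

const assertionStep = isAssertion('Verify if the installation heading is visible') // true
const actionStep = isAssertion('click the link "Get started"') // false
```

A fixed keyword check like this is deterministic, which is the stability benefit described above: the same sentence is always classified the same way.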

🖼️ Frame Handling

To interact with elements inside frames, simply instruct PlayWord to switch to the desired frame.

await playword.say('Go to https://iframetester.com')

await playword.say('Type "https://www.saucedemo.com" in the URL field')

await playword.say('Click the render button')

await playword.say('Switch to the frame with the url "https://www.saucedemo.com"')

// Perform actions inside the frame
await playword.say('Type standard_user into the username field')

🔧 Custom Variables

Hardcoding sensitive information in your test cases is not a good practice. Instead, use custom variables with the syntax {VARIABLE_NAME} and define them in your environment settings.

# .env
USERNAME=standard_user
PASSWORD=secret_sauce

// Load environment variables
import 'dotenv/config'

// {USERNAME} and {PASSWORD} will be replaced with the values from the environment
await playword.say('Input {USERNAME} in the username field')
await playword.say('Input {PASSWORD} in the password field')
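
Conceptually, the {VARIABLE_NAME} replacement works like a simple template substitution. Here is a minimal sketch under stated assumptions: `substituteVariables` is a hypothetical helper, not a PlayWord export, and leaving unknown placeholders untouched is a choice made for this example.

```typescript
// Hypothetical helper: replaces {VARIABLE_NAME} placeholders with values from `env`.
// Assumption: placeholders without a matching value are left untouched.
function substituteVariables(input: string, env: Record<string, string | undefined>): string {
  return input.replace(/\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (placeholder: string, name: string) => {
    const value = env[name]
    return value !== undefined ? value : placeholder
  })
}

const step = substituteVariables('Input {USERNAME} in the username field', {
  USERNAME: 'standard_user'
})
// step === 'Input standard_user in the username field'
```

In a Node.js script you would pass `process.env` as the second argument after loading dotenv.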

🔴 Recordings

PlayWord supports recording test executions and replaying them later for efficient and consistent testing.

// Option 1: save recordings to the default path (.playword/recordings.json)
const playword = new PlayWord(context, { record: true })

// Option 2: save recordings to a custom path (must be a `.json` file)
const playword = new PlayWord(context, { record: 'spec/test-shopping-cart.json' })
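
The two forms of the record option can be summarized as a small mapping from option value to file path. This is a sketch only: `resolveRecordPath` is a hypothetical helper, and throwing on a non-`.json` path is an assumption, since the library may handle that case differently.

```typescript
// Default path taken from the comment in the snippet above.
const DEFAULT_RECORD_PATH = '.playword/recordings.json'

// Hypothetical helper: maps the `record` option to a file path,
// or undefined when recording is disabled.
function resolveRecordPath(record: boolean | string): string | undefined {
  if (record === false) return undefined
  if (record === true) return DEFAULT_RECORD_PATH
  if (!record.endsWith('.json')) {
    // Assumption of this sketch: reject paths that are not `.json` files.
    throw new Error(`Recording path must end with .json: ${record}`)
  }
  return record
}
```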

If recordings are available, PlayWord prioritizes using them to execute tests, reducing the need to consume API tokens.

If a recorded action fails, PlayWord automatically retries it using AI.

✨ Using AI during Playback

To ensure PlayWord uses AI for specific steps during playback, start the input with [AI].

await playword.say('[AI] click the "Login" button')

await playword.say('[AI] verify the URL matches "https://www.saucedemo.com/inventory.html"')
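
The playback rules described in this section (prefer a recording, retry with AI on failure, always use AI for steps marked [AI]) can be pictured as follows. This is an illustrative sketch, not PlayWord's internals: `runStep`, `Executor`, and the idea of keying recordings by step text are all assumptions.

```typescript
// Hypothetical executors supplied by the caller; they stand in for PlayWord internals.
// An executor rejects (throws) when the step fails.
type Executor = (step: string) => Promise<void>

const AI_PREFIX = '[AI]'

// Sketch of the playback policy. Returns which path handled the step.
async function runStep(
  input: string,
  recorded: Set<string>, // assumption: recordings are looked up by step text
  replay: Executor,
  askAI: Executor
): Promise<'recording' | 'ai'> {
  const forceAI = input.startsWith(AI_PREFIX)
  const step = forceAI ? input.slice(AI_PREFIX.length).trim() : input

  if (!forceAI && recorded.has(step)) {
    try {
      await replay(step)
      return 'recording' // no API tokens consumed
    } catch {
      // A failed replay falls through to the AI path below.
    }
  }

  await askAI(step)
  return 'ai'
}
```

With this policy, only steps that are new, broken, or explicitly marked with [AI] reach the AI provider.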

🖥️ Observer

The Observer module tracks user interactions on web pages and swiftly generates accurate test steps using AI.

Upon activation, Playwright injects the Observer UI into every page opened in the browser. As you manually interact with the page, the AI interprets your actions, generates corresponding test steps, and records action details.

✨ Observer Features

The Observer provides several controls to manage and interact with your test recordings:

Accept: Add test steps to the recording. (Can also be invoked by pressing the a key)
Cancel: Skip test steps without adding them to the recording. (Can also be invoked by pressing the c key)
Preview: View the test steps recorded so far.
Clear: Delete recorded test steps.
Dry Run: Trial-run the recorded test steps. (Press the esc key to stop the dry-run process)

It captures the following user interactions on the webpage:

Click: Triggered when an element on the webpage is clicked.
Hover: Triggered when hovering over an element for more than three seconds.
Input: Triggered after entering content into an input field and then clicking the input field again.
Navigate: Triggered when the page navigates to a new URL or is refreshed.
Select: Triggered after selecting an option from a dropdown menu.

For complex actions and assertions that the Observer cannot directly record, you can manually edit the step descriptions, enabling the AI to accurately capture your intentions.

📘 Getting Started with Observer

To start using the Observer, create a PlayWord instance in headed mode, pass it to the Observer, and initiate observation with Playwright.

import { chromium } from 'playwright'
import { Observer, PlayWord } from '@playword/core'

const browser = await chromium.launch({ headless: false /** Enable headed mode */ })
const context = await browser.newContext()

const playword = new PlayWord(context)
const observer = new Observer(playword, {
  delay: 500,
  recordPath: 'spec/test-login.json'
})

// Start the Observer
await observer.observe()

// Open a new page to observe
await context.newPage()

📜 Observer Options

Name	Type	Default	Description
delay	number	250	Delay between each step in milliseconds during the dry-run process.
recordPath	string	.playword/recordings.json	Where to save the recordings. (Must be `.json`)

🌟 Why use PlayWord?

Aspect	Traditional Testing	PlayWord
Dev Experience	Locating elements is very frustrating.	AI takes care of locating elements. Say goodbye to locators.
Dev Speed	Time is needed for writing both test cases and code.	Test cases serve both as documentation and executable tests.
Maintenance	High maintenance cost due to UI changes.	AI-powered adaptation to UI changes.
Learning Curve	Requires knowledge of testing frameworks and tools.	Just use natural language to execute tests.

📜 Supported Actions in PlayWord and PlayWord Observer

Page Actions

Click on an element
Go to a specific URL
Hover over an element
Press a key or keys
Scroll in a specific direction (top, bottom, up, down)
Select an option from a select element
Sleep for a specific duration in milliseconds
Switch to a specific frame
Switch to other pages
Type text into an input field or textarea
Wait for text to appear on the page

Assertion

Check if an element contains specific text
Check if an element does not contain specific text
Check if an element content is equal to specific text
Check if an element content is not equal to specific text
Check if an element is visible
Check if an element is not visible
Check if the page contains specific text
Check if the page does not contain specific text
Check if the page title is equal to specific text
Check if the page URL matches specific RegExp patterns

Enjoy PlayWord and stay tuned for more features in future releases! 🚀🎉
