ComputerGYM

Foundation Model Training Using Human Demonstrations

Stars: 68

Visit our website: optexity.com

Optexity demo video: Trained Llama 3-8B Beats Gemini 2.0 Flash & GPT-4o on Software Automation



Optexity: Foundation Model Training Using Human Demonstrations

Overview

Optexity enables training foundation models using human demonstrations of computer tasks. The framework lets you record demonstrations, process them, and use them to train AI agents that complete web-based tasks. In the future, we plan to add training through self-exploration with reinforcement learning, training from software documentation, and training from YouTube videos.

Detailed Tutorial Videos

Explore our step-by-step video guides to get started with Optexity:

  1. Optexity Tutorial Part 1 | Introduction and State of Browser Agents for Software Use
  2. Optexity Tutorial Part 2 | Training AI with Human Demonstrations
  3. Optexity Tutorial Part 3 | AI Agent in Action!

Setup

  1. Repository Setup. Clone the necessary repositories:

    mkdir optexity
    cd optexity
    git clone https://github.com/Optexity/ComputerGYM.git
    git clone https://github.com/Optexity/AgentAI.git
    git clone https://github.com/Optexity/playwright.git
  2. Environment Setup. Create and activate a Conda environment with the required Python and Node.js versions:

    conda create -n optexity python=3.10 nodejs
    conda activate optexity
  3. Installing Dependencies. Install the required packages and build the Playwright framework:

    pip install -e ComputerGYM
    pip install -e AgentAI
    cd playwright
    git checkout playwright_optexity
    npm install
    npm run build
    playwright install
    cd ..
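
Before moving on, it can help to confirm that the Conda environment provides what the setup steps rely on. The following is a minimal sketch and is not part of the Optexity repositories. It checks only two things: that Python matches the 3.10 pin from the conda create command, and that the git, node, npm and playwright executables used above are on PATH. The helpers check_python and missing_tools are hypothetical names.

```python
"""Hypothetical sanity check for the Optexity setup steps (not part of the repos)."""
import shutil
import sys
from typing import Callable, Iterable, List, Optional, Sequence

# Executables invoked by the setup commands above.
REQUIRED_TOOLS = ("git", "node", "npm", "playwright")


def check_python(version: Optional[Sequence[int]] = None,
                 expected: Sequence[int] = (3, 10)) -> bool:
    """Return True if major.minor of `version` (default: this interpreter) matches `expected`."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) == tuple(expected)


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS,
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools that `which` cannot locate (shutil.which searches PATH)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not check_python():
        print(f"Expected Python 3.10, found {sys.version.split()[0]}")
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found.")
```

The which parameter is injectable so that the check can be tested without touching the real PATH.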

Testing Vanilla Gemini Directly (Optional)

To evaluate vanilla Gemini 2.0 Flash on a specific web task, run:

    export GEMINI_API_KEY=<YOUR_GEMINI_API_KEY>
    python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model gemini

The next section shows how to improve the performance of these agents on specific tasks.

Pro Tip: Visit https://aistudio.google.com/apikey to create a free Gemini API key for testing any task on any website.
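
To try several goals without retyping the command, a small wrapper can assemble the same arguments. This is a minimal sketch that reuses only the flags shown in the command above. The helpers build_agent_command and run_goals are hypothetical and are not part of AgentAI. The sketch assumes it is launched from the optexity directory and that GEMINI_API_KEY is already exported.

```python
"""Hypothetical helper for running AgentAI/agentai/main.py on several goals."""
import subprocess
from typing import List

MAIN_SCRIPT = "AgentAI/agentai/main.py"


def build_agent_command(goal: str,
                        model: str = "gemini",
                        url: str = "https://app.hubspot.com",
                        port: int = 8000,
                        storage_state: str = "cache_dir/auth.json",
                        python: str = "python") -> List[str]:
    """Build the argv list for one agent run, using the flags from the README."""
    return [
        python, MAIN_SCRIPT,
        "--url", url,
        "--port", str(port),
        "--log_to_console",
        "--goal", goal,
        "--storage_state", storage_state,
        "--model", model,
    ]


def run_goals(goals: List[str], model: str = "gemini") -> List[int]:
    """Run each goal sequentially and collect the process exit codes."""
    codes = []
    for goal in goals:
        # Passing a list (not a shell string) keeps goals containing spaces intact.
        result = subprocess.run(build_agent_command(goal, model=model))
        codes.append(result.returncode)
    return codes


if __name__ == "__main__":
    # Print the command for the README's example goal instead of launching it.
    print(build_agent_command("change currency to SGD"))
```

Passing model="vllm" produces the command used to evaluate a trained agent in step 5 of the workflow.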

Workflow

  1. Recording Demonstrations. Record human demonstrations by creating a configuration file and running the demonstration script:

    ./ComputerGYM/computergym/demonstrations/demonstrate.sh ComputerGYM/computergym/demonstrations/demonstration_config.yaml

    Note: Create your own demonstration_config.yaml before running this script. ComputerGYM/computergym/demonstrations/demonstration_config_example.yaml can serve as a reference.

  2. Processing Demonstrations. Process the recorded demonstrations to prepare them for training:

    python ComputerGYM/computergym/demonstrations/process_demonstration.py --yaml ComputerGYM/computergym/demonstrations/demonstration_config.yaml --seed 5
  3. Generating Training Data. Convert the processed demonstrations into a format suitable for model training:

    python AgentAI/agentai/sft/prepare_training_data.py --agent_config AgentAI/agentai/train_configs/hubspot_agent.yaml
  4. Training the Model. Our data preparation scripts generate JSON data in a format compatible with LLaMA-Factory. The generated training and inference configurations are stored in the train_data directory. Please refer to the LLaMA-Factory documentation for detailed instructions on model training.

  5. Evaluating the Trained Agent. After training your model, deploy it as an inference service on http://localhost:8000. By default, the framework is configured for the vLLM serving capability provided by LLaMA-Factory. If you use a different serving method, you will need to modify the corresponding scripts.

    To evaluate your trained agent on a specific web task, execute:

    python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model vllm
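
Step 4 does not say which LLaMA-Factory dataset layout the generated JSON uses. LLaMA-Factory documents Alpaca-style records, which have instruction, input and output keys. It also documents ShareGPT-style records, which have a conversations list of turns with from and value keys. The sketch below is a hypothetical helper, not part of AgentAI, that guesses which layout a file in train_data follows. The README does not name the generated files, so the path you pass is up to you. Other column mappings that LLaMA-Factory allows through dataset_info.json are reported as unknown.

```python
"""Hypothetical checker for LLaMA-Factory-style JSON datasets (not part of AgentAI)."""
import json
import sys
from typing import Any, Dict, List


def detect_format(record: Dict[str, Any]) -> str:
    """Classify one record as 'sharegpt', 'alpaca', or 'unknown'."""
    turns = record.get("conversations")
    # ShareGPT-style: a non-empty list of {"from": ..., "value": ...} turns.
    if isinstance(turns, list) and turns and all(
        isinstance(t, dict) and "from" in t and "value" in t for t in turns
    ):
        return "sharegpt"
    # Alpaca-style: at minimum an instruction and an output.
    if "instruction" in record and "output" in record:
        return "alpaca"
    return "unknown"


def summarize(records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count how many records fall into each detected format."""
    counts: Dict[str, int] = {}
    for record in records:
        fmt = detect_format(record)
        counts[fmt] = counts.get(fmt, 0) + 1
    return counts


if __name__ == "__main__":
    # Usage: python check_dataset.py <path-to-generated-json>
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as f:
            data = json.load(f)
        print(summarize(data))
```

A file that comes back as entirely "alpaca" or entirely "sharegpt" tells you which formatting to declare when registering it with LLaMA-Factory.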
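
Before running the step 5 evaluation, it can be useful to confirm that a server is actually answering on http://localhost:8000. This sketch assumes the service exposes an OpenAI-compatible API, as both vLLM's server and LLaMA-Factory's llamafactory-cli api command do. It queries the /v1/models endpoint using only the Python standard library. The helper names are hypothetical, and with a different serving method the endpoint path may differ.

```python
"""Hypothetical readiness check for the inference service (not part of AgentAI)."""
import json
import urllib.error
import urllib.request
from typing import List, Optional


def models_url(base_url: str = "http://localhost:8000") -> str:
    """Return the OpenAI-compatible model-listing endpoint for base_url."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(body: str) -> List[str]:
    """Extract model IDs from an OpenAI-style {"data": [{"id": ...}]} response."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_served_models(base_url: str = "http://localhost:8000",
                       timeout: float = 5.0) -> Optional[List[str]]:
    """Return served model IDs, or None if the server is unreachable or replies with invalid JSON."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return parse_model_ids(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    models = list_served_models()
    if models is None:
        print("No server reachable at http://localhost:8000; start the inference service first.")
    else:
        print("Served models:", models)
```

If this prints a model ID, the server is up, and the vllm command from step 5 should be able to reach it.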

Documentation

For comprehensive information on configuration options and advanced usage patterns, please refer to the detailed documentation available in each repository:

  • ComputerGYM: Environment setup, demonstration recording, and processing
  • AgentAI: Model training configurations, inference settings, and evaluation metrics
  • Playwright Integration: Custom extensions and modifications for web automation

Configuration References

  • Demonstration configuration: See ComputerGYM/computergym/demonstrations/demonstration_config_example.yaml
  • Training parameters: See AgentAI/agentai/train_configs/README.md

Acknowledgements

This project builds upon and extends the work of:

  • BrowserGym - For the browser automation environment foundation
  • Playwright - For reliable web testing and automation capabilities
  • LLaMA-Factory - For efficient foundation model fine-tuning

Community & Support

  • Report issues on GitHub
  • Follow us on Twitter for the latest updates