midscene

Let AI be your browser operator.

Stars: 5179

Midscene.js is an AI-powered automation SDK that allows users to control web pages, perform assertions, and extract data in JSON format using natural language. It offers features such as natural language interaction, understanding UI and providing responses in JSON, intuitive assertion based on AI understanding, compatibility with public multimodal LLMs like GPT-4o, visualization tool for easy debugging, and a brand new experience in automation development.

README:

Midscene.js

Midscene.js lets AI be your browser operator 🤖. Just describe what you want to do in natural language, and it will help you operate web pages, validate content, and extract data. Whether you want a quick experience or deep development, you can get started easily.

Showcases

The following example videos were recorded with the UI-TARS 7B SFT model and are shown at their original speed.

Instructions shown in the videos:

  • Post a Tweet
  • Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs

📢 New open-source model choice - UI-TARS

Since v0.10.0, Midscene.js supports a new open-source model named UI-TARS. Read more about it in Choose a model.

💡 Features

  • Natural Language Interaction 👆: Just describe your goals and steps, and Midscene will plan and operate the user interface for you.
  • Chrome Extension Experience 🖥️: Start immediately through the Chrome extension, no coding required.
  • Puppeteer/Playwright Integration 🔧: Integrates with Puppeteer and Playwright, so you can combine AI capabilities with these powerful automation tools.
  • Support Private Deployment 🤖: Supports private deployment of the UI-TARS model, which outperforms closed-source models like GPT-4o and Claude in UI automation scenarios while better protecting data security.
  • Support General Models 🌟: Supports general large models like GPT-4o and Claude, adapting to various scenario needs.
  • Visual Reports for Debugging 🎞️: Through our test reports and Playground, you can easily understand, replay, and debug the entire process.
  • Completely Open Source 🔥: Enjoy a whole new automation development experience!
  • Understand UI, JSON Format Responses 🔍: You can specify data format requirements and receive responses in JSON format.
  • Intuitive Assertions 🤔: Express your assertions in natural language, and AI will understand and process them.
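
As a rough illustration of how the interaction, JSON extraction, and assertion features might fit together in a script, here is a minimal sketch. The `UiAgent` interface, the `describeQuery` helper, the `RecordingAgent` stub, and the prompt wording are all hypothetical and written only for this example. The method names `aiAction`, `aiQuery`, and `aiAssert` are modelled on the natural-language style described above and should be checked against the Midscene.js documentation before use. The stub lets the flow run without a browser or a model.

```typescript
// Hypothetical minimal agent surface (an assumption for this sketch,
// not the SDK's actual type definitions).
interface UiAgent {
  aiAction(instruction: string): Promise<void>;
  aiQuery<T>(dataDemand: string): Promise<T>;
  aiAssert(assertion: string): Promise<void>;
}

interface Product {
  title: string;
  price: number;
}

// Compose a data request: the desired JSON shape first, then a description.
// The exact prompt format is an assumption.
function describeQuery(shape: Record<string, string>, what: string): string {
  const fields = Object.entries(shape)
    .map(([name, type]) => `${name}: ${type}`)
    .join(", ");
  return `{${fields}}[], ${what}`;
}

// One flow that uses all three capabilities: act, extract, assert.
async function searchAndExtract(agent: UiAgent): Promise<Product[]> {
  await agent.aiAction('type "headphones" in the search box and press Enter');
  const items = await agent.aiQuery<Product[]>(
    describeQuery(
      { title: "string", price: "number" },
      "find the products in the result list",
    ),
  );
  await agent.aiAssert("the result list shows at least one product");
  return items;
}

// Stub agent for a dry run: it only records the prompts it receives.
class RecordingAgent implements UiAgent {
  readonly log: string[] = [];

  async aiAction(instruction: string): Promise<void> {
    this.log.push(`action: ${instruction}`);
  }

  async aiQuery<T>(dataDemand: string): Promise<T> {
    this.log.push(`query: ${dataDemand}`);
    return [] as unknown as T; // no real page, so no data
  }

  async aiAssert(assertion: string): Promise<void> {
    this.log.push(`assert: ${assertion}`);
  }
}
```

Calling `searchAndExtract(new RecordingAgent())` exercises the flow offline. In a real script the stub would be replaced by an agent created through the Puppeteer or Playwright integration.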

✨ Model Choices

  • You can use general-purpose LLMs like gpt-4o, which works well for most cases. gemini-1.5-pro and qwen-vl-max-latest are also supported.
  • You can also use the UI-TARS model, an open-source model dedicated to UI automation. You can deploy it on your own server, which dramatically improves performance and data privacy.
  • Read more in Choose a model.
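
Switching between a general-purpose model and a self-hosted UI-TARS deployment is typically a matter of configuration. The sketch below shows one way to express that choice as environment variables. The variable names (`OPENAI_API_KEY`, `OPENAI_BASE_URL`, `MIDSCENE_MODEL_NAME`, `MIDSCENE_USE_VLM_UI_TARS`) are assumptions here and should be verified against Choose a model. The `ModelChoice` type and the `modelEnv` helper are hypothetical.

```typescript
// Hypothetical description of the two deployment styles listed above.
type ModelChoice =
  | { kind: "general"; modelName: string; apiKey: string }
  | { kind: "ui-tars"; modelName: string; baseUrl: string; apiKey: string };

// Map a model choice to environment variables.
// All variable names are assumptions; check the project's model guide.
function modelEnv(choice: ModelChoice): Record<string, string> {
  const env: Record<string, string> = {
    OPENAI_API_KEY: choice.apiKey,
    MIDSCENE_MODEL_NAME: choice.modelName,
  };
  if (choice.kind === "ui-tars") {
    // A self-hosted model needs its own endpoint plus a flag to select
    // UI-TARS handling.
    env.OPENAI_BASE_URL = choice.baseUrl;
    env.MIDSCENE_USE_VLM_UI_TARS = "1";
  }
  return env;
}

// Example values are placeholders, not real endpoints or keys.
const generalEnv = modelEnv({
  kind: "general",
  modelName: "gpt-4o",
  apiKey: "placeholder-key",
});

const privateEnv = modelEnv({
  kind: "ui-tars",
  modelName: "placeholder-ui-tars-model",
  baseUrl: "http://localhost:8000/v1",
  apiKey: "placeholder-key",
});
```

In a Node.js script, the returned record could be merged into `process.env` before the agent is created. Keeping the mapping in one function makes it easy to switch deployments per environment.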

👀 Comparing to ...

There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?

  • Debugging Experience: You will soon find that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo is, you still need to debug the process to keep it stable over time. Midscene.js offers a visualized report file, a built-in playground, and a Chrome extension for debugging the entire process. This is what most developers really need, and we continue to improve the debugging experience.

  • Open Source, Free, Deploy as you want: Midscene.js is an open-source project. It is decoupled from any cloud service or model provider, so you can choose either public or private deployment. There is always a suitable plan for your business.

  • Integrate with JavaScript: You can always bet on JavaScript 😎

📄 Resources

🤝 Community

Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Zhou, Xiao and Yu, Tao},
  title = {Midscene.js: Let AI be your browser operator.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

📝 License

Midscene.js is MIT licensed.


If this project helps you or inspires you, please give us a ⭐️
