Whimbox

Whimbox (奇想盒), a game AI agent built on large language models and image recognition, brings you a brand-new gaming experience!

Stars: 72

Whimbox is a game AI agent based on large language models and image recognition technology, providing users with a new gaming experience. It automates daily tasks such as mining, material collection, and wish checking, and offers route recording, image recognition, and AI dialogue. The tool does not modify game files or memory; it only captures screenshots and simulates mouse and keyboard input. It is designed for games running in a 1920x1080 windowed mode on mid- to high-end PCs, with plans for future cloud gaming support. Whimbox is grateful to open-source projects like GIA and BetterGI, as well as AI models and programming tools like ChatGPT and Cursor. Developers interested in contributing can join the development community and pick up the many features that still need development and adaptation.

README:

~~Can't draw, so a red-faced star (红温星) will have to do for now~~

Whimbox(奇想盒)

Whimbox, a game AI agent built on large language models and image recognition, brings you a brand-new gaming experience!

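Running

⚠️ The project is still under development and is currently recommended only for users with Python development experience.

Install dependencies (requires Python 3.12)

Developers are advised to install the dependencies manually:

pip install -r requirements.txt
# Install the paddleocr runtime (optional; rapidocr is the current default, so this can be skipped)
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

Other users can run the automatic installation script setup_env.bat.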

Create the configuration file

Rename config_example.ini in the config directory to config.ini.
Under the Agent section, set the options to your own LLM API (any OpenAI-format API works).
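
As a rough illustration of what filling in the Agent section involves, the sketch below parses an INI snippet with Python's standard `configparser`. The key names `api_key`, `base_url`, and `model` and their values are assumptions made for this example; the real keys are whatever config_example.ini defines.

```python
import configparser

# Hypothetical contents; the real keys are defined in config/config_example.ini.
EXAMPLE_INI = """
[Agent]
api_key = sk-your-key
base_url = https://api.example.com/v1
model = your-model-name
"""

def load_agent_config(text: str) -> dict:
    """Parse INI text and return the [Agent] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("Agent"):
        raise KeyError("config is missing the [Agent] section")
    return dict(parser["Agent"])

agent_config = load_agent_config(EXAMPLE_INI)
```

An OpenAI-format endpoint generally needs just these three pieces of information: a key, a base URL, and a model name.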

Create the prompt

Rename prompt_example.txt in the config directory to prompt.txt.
Add prompts to taste; leaving the file unchanged also works.
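
The two setup steps above (config.ini and prompt.txt) can also be scripted. The helper below is a minimal sketch, not part of Whimbox: `create_from_example` is a hypothetical name, and it copies the example file instead of renaming it so the original stays available as a reference.

```python
import shutil
from pathlib import Path

def create_from_example(config_dir: Path, example_name: str, target_name: str) -> Path:
    """Copy an example file to its working name unless the target already exists."""
    target = config_dir / target_name
    if not target.exists():
        # Copy rather than rename, so the example file is kept untouched.
        shutil.copyfile(config_dir / example_name, target)
    return target

# Intended usage from the project root (paths taken from the steps above):
#   create_from_example(Path("config"), "config_example.ini", "config.ini")
#   create_from_example(Path("config"), "prompt_example.txt", "prompt.txt")
```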

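Open the game and set it to windowed mode at a resolution of 1920*1080.

Developers: run your IDE with administrator privileges, then run whimbox.py.
Other users can run the one-click launch script run.bat with administrator privileges.

After the program starts, wait a moment. Once the 📦 icon appears on the left side of the game screen, press / to open the dialog box and Esc to close it.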

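Existing features

Daily tasks
- Automatic Pear-Pal digging (美鸭梨挖掘)
- Automatic Realm of Escalation material runs (素材激化幻境)
- Automatic Daily Wishes check (朝夕心愿)
Automatic route running
- Route recording and editing
- Automatic route running (currently only the open world and the Sea of Stars (星海) are supported)
- Automatic gathering
AI dialogue
- Orchestrate all of the above features through natural language
- Interrupt tasks at any time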
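
Interrupting a task at any time is commonly implemented with a cooperative stop flag that the task checks between steps. The sketch below shows that general idea only; `run_task` and `stop_event` are hypothetical names, and Whimbox's own mechanism may differ.

```python
import threading

def run_task(steps, stop_event: threading.Event) -> int:
    """Run steps in order, checking the stop flag before each one.

    Returns the number of steps that actually ran.
    """
    completed = 0
    for step in steps:
        if stop_event.is_set():
            break  # an interrupt was requested; stop before the next step
        step()
        completed += 1
    return completed

stop_event = threading.Event()
log = []

def first_step():
    log.append("first")
    stop_event.set()  # simulate the user interrupting mid-task

def second_step():
    log.append("second")

completed = run_task([first_step, second_step], stop_event)
```

Because the flag is only checked between steps, a task stops at the next step boundary, which is why long actions are best split into short steps.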

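Future plans

Framework improvements: rollback and retry mechanisms.
Support for more maps
Automatic combat, fishing, bug catching, and cleaning
Automatic instrument playing (I must perform Haruhikage (春日影) right now!)
Home (家园) support
A standalone launcher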
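
The retry mechanism in the first plan item is listed as future work, so the following is only one common shape such a mechanism can take: a bounded retry decorator. `retry`, `max_attempts`, and `flaky_step` are hypothetical names chosen for this sketch.

```python
import functools

def retry(max_attempts: int = 3):
    """Decorator: re-run the function until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as error:  # a real version would catch narrower errors
                    last_error = error
            raise last_error  # every attempt failed; surface the last error
        return wrapper
    return decorator

calls = {"count": 0}

@retry(max_attempts=3)
def flaky_step():
    """Fails twice, then succeeds, to mimic a transient recognition failure."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("template not found on screen")
    return "done"

result = flaky_step()
```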

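Notes

Whimbox does not modify game files or read or write game memory; it only takes screenshots and simulates mouse and keyboard input, so in theory it should not get an account banned. However, the game's terms of service are very thorough and cover every conceivable situation, so you bear all consequences of using Whimbox.
Because the game itself already consumes a large share of PC performance and image recognition adds further load, only mid- to high-end PCs are supported for now. A cloud gaming version will be released after the official launch.
Whimbox currently supports only games running in a 1920x1080 window.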
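
The "screenshots in, simulated input out" design described in the first note can be pictured as a three-stage step. This is a conceptual sketch under assumed names: `run_step`, `capture`, `recognize`, and `send_key` are hypothetical stand-ins, not Whimbox functions.

```python
from typing import Callable, Optional

def run_step(capture: Callable[[], bytes],
             recognize: Callable[[bytes], Optional[str]],
             send_key: Callable[[str], None]) -> bool:
    """One screenshot-driven step: look at the screen, then maybe press a key.

    All three callables are hypothetical stand-ins; nothing here reads or
    writes game memory or files.
    """
    frame = capture()             # 1. grab the current window image
    key = recognize(frame)        # 2. decide from pixels alone what to do
    if key is None:
        return False              # nothing recognized, so send no input
    send_key(key)                 # 3. simulate the key press
    return True

# Stand-in implementations so the sketch runs without a game or OCR model.
pressed = []
acted = run_step(
    capture=lambda: b"fake-frame-with-pickup-prompt",
    recognize=lambda frame: "f" if b"pickup" in frame else None,
    send_key=pressed.append,
)
```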

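Acknowledgements

Thanks to the pioneers of the various open-world game open-source projects, which Whimbox has studied and drawn on.

Thanks to ChatGPT, Cursor, Claude, and the many other AI models and AI programming tools.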

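Join development

So far the project has only validated its basic framework, and a large number of features still need to be developed and adapted. If this interests you, you are welcome to join in. Developer QQ group: 821908945.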

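Project structure

Whimbox/
├── assets/
│   ├── imgs/                     # Image resources
│   │   ├── Game/                 # Assets unpacked from the game
│   │   ├── Maps/                 # Map-related resources
│   │   ├── Windows/              # Game UI screenshots
│   ├── paths/                    # Auto-pathfinding scripts
│   └── PPOCRModels/              # OCR model files
├── source/
│   ├── action/                   # Action modules (pickup, fishing, combat, etc.)
│   ├── api/                      # Third-party models such as OCR and YOLO
│   ├── common/                   # Shared modules (logging, utilities, etc.)
│   ├── config/                   # Configuration module
│   ├── dev_tool/                 # Development tools
│   ├── ingame_ui/                # In-game chat box
│   ├── interaction/              # Core interaction module (screenshots, input)
│   ├── map/                      # Map module (minimap recognition, world map operations)
│   ├── task/                     # Task module (feature scripts, called via MCP)
│   │   ├── daily_task/           # Scripts for the various daily tasks
│   │   └── navigation_task/      # Auto-pathfinding scripts
│   ├── ui/                       # Game UI module (pages, UI)
│   ├── view_and_move/            # Camera and movement module
│   ├── mcp_agent.py              # LLM agent
│   └── mcp_server.py             # MCP server
├── config/                       # Configuration files
│   ├── config.ini                # Program configuration file
│   └── prompt.txt                # LLM prompt
├── Logs/                         # Log files
├── whimbox.py                    # Main program entry point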

Writing MCP tools

Use the tasks in source\task\daily_task as a reference and register your task in source\mcp_server.py; the LLM can then call it.
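
Registration can be pictured as adding a function to a name-to-callable table that is consulted when the model requests a tool. The sketch below uses a plain dictionary, not the MCP SDK, and `register_tool`, `TOOLS`, `call_tool`, and `check_daily_wishes` are hypothetical names; the actual pattern is whatever source\mcp_server.py uses.

```python
from typing import Callable, Dict

# Name-to-function table standing in for a tool server's registry.
TOOLS: Dict[str, Callable[..., str]] = {}

def register_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Decorator: expose a task function under its own name."""
    TOOLS[func.__name__] = func
    return func

@register_tool
def check_daily_wishes() -> str:
    """Placeholder task body; a real task would drive the game UI."""
    return "daily wishes checked"

def call_tool(name: str, **kwargs) -> str:
    """Look up a registered tool by name and run it, as an agent would."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

outcome = call_tool("check_daily_wishes")
```

The docstring matters in practice: tool servers typically pass a tool's description to the model so it can decide when to call it.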

Recording auto-run routes

For details, see How to Record and Edit Routes (如何录制和编辑跑图路线).
