
factorio-learning-environment
A non-saturating, open-ended environment for evaluating LLMs in Factorio
Stars: 154

Factorio Learning Environment is an open source framework designed for developing and evaluating LLM agents in the game of Factorio. It provides two settings: Lab-play with structured tasks and Open-play for building large factories. Results show limitations in spatial reasoning and automation strategies. Agents interact with the environment through code synthesis, observation, action, and feedback. Tools are provided for game actions and state representation. Agents operate in episodes with observation, planning, and action execution. Tasks specify agent goals and are implemented in JSON files. The project structure includes directories for agents, environment, cluster, data, docs, eval, and more. A database is used for checkpointing agent steps. Benchmarks show performance metrics for different configurations.
README:
An open source framework for developing and evaluating LLM agents in the game of Factorio.
Claude 3.5 plays Factorio
We provide two settings:
- Lab-play: 24 structured tasks with fixed resources.
- Open-play: An unbounded task of building the largest possible factory on a procedurally generated map.
Our results demonstrate that models still lack strong spatial reasoning. In lab-play, we find that while LLMs exhibit promising short-horizon skills, they are unable to operate effectively in constrained environments, reflecting limitations in error analysis. In open-play, while LLMs discover automation strategies that improve growth (e.g. electric-powered drilling), they fail to achieve complex automation (e.g. electronic-circuit manufacturing).
- Factorio (version 1.1.110)
- Docker
- Python 3.10+
- Clone the repository:
git clone https://github.com/JackHopkins/factorio-learning-environment.git
cd src
pip install -e .
- Set up Factorio client:
- Purchase Factorio from the official website or on Steam.
- Downgrade to version 1.1.110:
- Steam: Right-click Factorio → Properties → Betas → Select 1.1.110
- Launch FLE Docker server:
# Start Docker daemon
sudo systemctl start docker
# Build Docker image
cd cluster/docker
docker build -t factorio .
# Run a single server
cd ../local
docker-compose -f docker-compose-1.yml up -d
- Activate server:
- Open Factorio client
- Navigate to Multiplayer
- Connect to localhost:34197 (default) or your configured address in Docker.
- You may disconnect from each server once it has been activated.
- Run Eval:
First create the .env file. Note that API keys are only required for the model providers that will be used to run the eval.
# model providers
OPENAI_API_KEY=<KEY>
ANTHROPIC_API_KEY=<KEY>
TOGETHER_API_KEY=<KEY>
OPEN_ROUTER_API_KEY=<KEY>
# If using Postgres DB, NOT REQUIRED (See section on Database)
SKILLS_DB_PORT=""
SKILLS_DB_NAME=""
SKILLS_DB_USER=""
SKILLS_DB_PASSWORD=""
# AWS credentials if wanting to use Cloudformation, NOT REQUIRED
AWS_SECRET_ACCESS_KEY=<KEY>
AWS_ACCESS_KEY_ID=""
AWS_DEFAULT_REGION=""
CLUSTER_NAME=""
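Since only the keys for the providers you actually use are needed, it can help to check which ones are filled in before starting a run. The following is a minimal sketch, not part of FLE: `parse_env` and `configured_providers` are hypothetical helpers, and treating `<KEY>` as "not set" is an assumption based on the placeholder used above.

```python
# Hypothetical helpers (not part of FLE): report which provider keys are set.
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "TOGETHER_API_KEY",
    "OPEN_ROUTER_API_KEY",
]


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def configured_providers(env: dict) -> list:
    """Return provider keys with a non-empty, non-placeholder value."""
    return [k for k in PROVIDER_KEYS if env.get(k) and env[k] != "<KEY>"]


example = """
# model providers
OPENAI_API_KEY=sk-example
ANTHROPIC_API_KEY=<KEY>
SKILLS_DB_PORT=""
"""
print(configured_providers(parse_env(example)))
```

In this example only `OPENAI_API_KEY` counts as configured, so only OpenAI models could be used in the run config.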
Running open and lab play with example run configs:
- Open Play (one parallel run):
python eval/open/independent_runs/run.py --run_config=eval/open/independent_runs/run_config_example_open_play.json
- Tasks (one parallel run of iron-ore task):
python eval/open/independent_runs/run.py --run_config=eval/open/independent_runs/run_config_example_lab_play.json
FLE is an agent evaluation environment built on the game of Factorio, a popular resource management simulation game.
Agents interact with FLE by code synthesis through a REPL (Read-Eval-Print-Loop) pattern:
- Observation: The agent observes the world through the output streams (stderr/stdout) of their last program.
- Action: The agent generates a Python program to perform their desired action.
- Feedback: The environment executes the program, assigns variables, adds classes/functions to the namespace, and provides an output stream.
Action
# 1. Get iron patch and place mining drill
drill = place_entity(
    entity=Prototype.MiningDrill,
    position=nearest(Prototype.IronOre),
    direction=Direction.NORTH
)
# 2. Add output storage
chest = place_entity_next_to(
    entity=Prototype.IronChest,
    reference_position=drill.drop_position,
    direction=Direction.SOUTH
)
# 3. Verify automation chain and observe entities
sleep(10)  # Sleep for 10 seconds
assert drill.status == EntityStatus.WORKING
print(get_entities())
Feedback
>>> [ BurnerMiningDrill(fuel=Inventory({'coal': 4}),
>>>                     name='burner-mining-drill',
>>>                     direction=Direction.DOWN,
>>>                     position=Position(x=-28.0, y=-61.0),
>>>                     energy=2666.6666666667,
>>>                     tile_dimensions=TileDimensions(tile_width=2.0, tile_height=2.0),
>>>                     status=EntityStatus.WORKING,
>>>                     neighbours=[Entity(name='iron-chest', direction=DOWN, position=Position(x=-27.5, y=-59.5))],
>>>                     drop_position=Position(x=-27.5, y=-59.5),
>>>                     resources=[Ingredient(name='iron-ore', count=30000, type=None)]),
>>>   Chest(name='iron-chest',
>>>         direction=Direction.UP,
>>>         position=Position(x=-27.5, y=-59.5),
>>>         energy=0.0,
>>>         tile_dimensions=TileDimensions(tile_width=1.0, tile_height=1.0),
>>>         status=EntityStatus.NORMAL,
>>>         inventory=Inventory({'iron-ore': 75}))]
Agents are provided with the Python standard library, and an API comprising tools that they can use.
Tools are functions that perform a game action and return a typed object (e.g. an Inventory), which can be stored as a named variable in the Python namespace for later use.
The namespace acts as an episodic symbolic memory system, and saved objects represent an observation of the environment at the moment of query.
This enables agents to maintain complex state representations and build hierarchical abstractions as the factories scale.
Agents observe stdout and stderr - the output streams of their program. Agents may intentionally choose to print relevant objects and computations to the output stream to construct observations.
Mistakes in the code or invalid operations raise typed exceptions with detailed context that is written to stderr.
This enables agents to reactively debug their programs after execution, and proactively use runtime assertions during execution to self-verify their actions.
Agents are able to enhance their internal representation of the game state by defining:
- Utility functions for reuse throughout an episode, to encapsulate previously successful logic
- Classes in the namespace to better organize the data retrieved from the game.
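The REPL pattern described above can be illustrated with a small sketch. This is not FLE's interpreter: `run_program` is a hypothetical helper that only shows the general idea of executing programs in a persistent namespace and returning stdout and stderr as the observation.

```python
import contextlib
import io
import traceback


def run_program(source: str, namespace: dict) -> tuple[str, str]:
    """Execute one program in a persistent namespace; return (stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            exec(source, namespace)
        except Exception:
            # The traceback is written to the redirected stderr stream.
            traceback.print_exc()
    return out.getvalue(), err.getvalue()


namespace = {}
# Step 1: define a utility function and a variable; both persist.
out1, err1 = run_program("def double(x):\n    return 2 * x\ncount = 21", namespace)
# Step 2: a later program reuses them and prints an observation.
out2, err2 = run_program("print(double(count))", namespace)
# Step 3: a failing runtime assertion surfaces on stderr.
out3, err3 = run_program("assert count == 0, 'count should be zero'", namespace)
```

Here the second program sees `double` and `count` from the first, and the failed assertion in the third program appears in `err3` rather than crashing the loop, which is the behaviour that lets an agent debug reactively.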
The Factorio Learning Environment provides a straightforward agent architecture for developing and evaluating AI models that can play Factorio.
Agents operate in episodes, with each step involving observation, planning, and action execution through Python code synthesis. The agent maintains state through a conversation history that includes its actions (assistant) and the stdout/stderr from the environment (user). At each step, agents generate Python code policies that are executed in the environment.
Agents live in agents, and implement an abstract base class (AgentABC) that defines the core interface for interacting with the environment.
The abstract base class defines two methods that all agents must implement:
# Generates the next action based on conversation history and environment response (including score / achievements etc).
step(conversation: Conversation, response: Response) -> Policy:
# Handles cleanup when an episode terminates, i.e. for reporting results etc.
end(conversation: Conversation, completion: CompletionState) -> None:
Each agent takes a task as input (discussed in the next section), which specifies the goal of the agent.
Our default agent is BasicAgent, which incorporates some basic mechanisms for managing context over long (1000+ step) runs:
- Every 32 steps, all older interactions are summarised into a report in the system message.
- Conversations are clipped to remain under 200k characters (~87k tokens).
- We strip out all historical observations of game entities, as this both fills up the context, and confuses the agent.
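As an illustration of the character-based clipping mentioned above, here is a minimal sketch. It is not the BasicAgent implementation: the `Message` type and `clip_conversation` function are hypothetical, and the policy of keeping the system message plus the newest messages that fit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Message:  # hypothetical stand-in for the project's message type
    role: str
    content: str


def clip_conversation(messages: list[Message], max_chars: int = 200_000) -> list[Message]:
    """Keep the first (system) message plus the newest messages that fit."""
    if not messages:
        return []
    system, rest = messages[0], messages[1:]
    budget = max_chars - len(system.content)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        if len(message.content) > budget:
            break
        kept.append(message)
        budget -= len(message.content)
    return [system] + kept[::-1]  # restore chronological order


history = [Message("system", "s" * 10)] + [Message("user", "x" * 40) for _ in range(5)]
clipped = clip_conversation(history, max_chars=100)
print(len(clipped), sum(len(m.content) for m in clipped))
```

With a 100-character budget, the system message and the two newest 40-character messages are kept, for 90 characters in total.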
We include some basic utilities for calling different LLMs (agents/utils/llm_factory.py), for formatting the conversation history (agents/utils/formatters/conversation_formatter_abc.py), and for parsing responses into valid Python (agents/utils/parse_response.py).
# ./agents/minimal_agent.py
import tenacity
from tenacity import retry_if_exception_type, wait_exponential

class MinimalAgent(AgentABC):
    """
    This is a minimal Agent implementation, which takes the current conversation (including the most recent response)
    and generates a simple Python code policy to execute the next step.
    Note: This will blow up context length on longer runs, without some context pruning/management.
    """
    def __init__(self, model, system_prompt, goal_description, *args, **kwargs):
        system_prompt += f"\n\n### Goal\n{goal_description}\n\n"
        super().__init__(model, system_prompt, *args, **kwargs)
        self.llm_factory = LLMFactory(model)

    @tenacity.retry(
        retry=retry_if_exception_type(Exception),
        wait=wait_exponential(multiplier=1, min=4, max=10)
    )
    async def step(self, conversation: Conversation, response: Response) -> Policy:
        # Generate and return next policy
        response = await self.llm_factory.acall(
            messages=self.formatter.to_llm_messages(conversation),
            n_samples=1,  # We only need one program per iteration
            temperature=self.generation_params.temperature,
            max_tokens=self.generation_params.max_tokens,
            model=self.generation_params.model,
        )
        # Parse LLM response into a Policy object
        policy = parse_response(response)
        if not policy:
            raise Exception("Not a valid Python policy")
        return policy

    async def end(self, conversation: Conversation, completion: CompletionResult):
        pass
Each agent is given a task, which specifies the goal the agent will carry out in FLE. A task consists of a task object defining the core interface of the task category and a JSON file specifying the parameters of the task.
Tasks live in eval/tasks, and implement an abstract base class in eval/tasks/task_abc.py that defines the core interface for defining the task, setting up the environment and verifying success.
The abstract base class defines three methods that all tasks must implement:
verify(self, score: float, step: int, instance: FactorioInstance, step_statistics: Dict) -> bool:
""" Return true if the task is completed"""
setup_instance(self, instance):
"""Code to provision the initial game state for the task environment"""
enhance_response_with_task_output(self, response: str, task_response: TaskResponse) -> str:
"""Add task specific information to the environment response if needed"""
We provide two default tasks:
- OpenPlayTask - Task for the open-play setting, where the agent plays the game until a specified number of steps is finished. The verify function will always return False.
- ThroughputTask - Task requiring the agent to build a factory that achieves a specified throughput in the holdout period. The verify function will return True if the holdout period throughput is above the threshold.
The task JSON files specify the "task_type" and the "config" parameters. task_type specifies the mapping from the JSON to the task type (the creation of task objects from the JSON is done in eval/tasks/task_factory.py). config specifies all attributes required to instantiate the respective task object. Each config must at minimum define the "goal_description", "trajectory_length" and "task_key" parameters.
Examples of task JSON:
# Open play task json
{
    "task_type": "default",
    "config": {
        "goal_description": "- Build the biggest possible factory\n- Maximise automation, efficiency and scale",
        "trajectory_length": 5000,
        "task_key": "open_play"
    }
}
# One example of a throughput task json
{
    "task_type": "throughput",
    "config": {
        "goal_description": "Create an automatic iron gear wheel factory that produces 16 iron gear wheel per 60 ingame seconds",
        "throughput_entity": "iron-gear-wheel",
        "quota": 16,
        "trajectory_length": 128,
        "holdout_wait_period": 60,
        "pre_holdout_wait_period": 60,
        "task_key": "iron_gear_wheel_throughput_16"
    }
}
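The minimum structure of a task JSON described above can be checked with a few lines of Python. This is a sketch for illustration only: `validate_task` is a hypothetical helper, not the logic in the project's task factory.

```python
import json

# Keys the text above says every config must define.
REQUIRED_CONFIG_KEYS = {"goal_description", "trajectory_length", "task_key"}


def validate_task(raw: str) -> dict:
    """Parse a task JSON string and check the minimum required keys."""
    task = json.loads(raw)
    for key in ("task_type", "config"):
        if key not in task:
            raise ValueError(f"missing top-level key: {key}")
    missing = REQUIRED_CONFIG_KEYS - set(task["config"])
    if missing:
        raise ValueError(f"config is missing: {sorted(missing)}")
    return task


open_play = """
{
    "task_type": "default",
    "config": {
        "goal_description": "- Build the biggest possible factory",
        "trajectory_length": 5000,
        "task_key": "open_play"
    }
}
"""
task = validate_task(open_play)
print(task["task_type"], task["config"]["task_key"])
```

Task-specific keys such as "quota" or "throughput_entity" are not checked here, since they depend on the task type.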
An example open-play task object is shown below. The throughput task object can be found in eval/tasks/throughput_task.py.
from typing import Any, Dict

class OpenPlayTask(TaskABC):
    def __init__(self, trajectory_length, goal_description: str, task_key: str):
        super().__init__(trajectory_length, starting_inventory={}, goal_description=goal_description, task_key=task_key)
        self.starting_game_state = None

    def verify(self, score: float, instance: FactorioInstance, step_statistics: Dict) -> TaskResponse:
        # Open play never completes early, so success is always False
        return TaskResponse(success=False, meta={})

    def _to_dict(self) -> Dict[str, Any]:
        return {
            "goal_description": self.goal_description,
            "trajectory_length": self.trajectory_length,
            "starting_inventory": self.starting_inventory,
            "initial_state": self.starting_game_state.to_raw() if self.starting_game_state else None,
        }

    def setup_instance(self, instance):
        """Code to provision the task environment"""
        pass
The entrypoint to run tasks is eval/open/independent_runs/run.py, which reads in a run config JSON file, runs the specified tasks in parallel, and saves each generated program with the environment output and task verification result into the database. The location of the run config JSON is passed through the --run_config command-line argument. If no argument is given, the default run config eval/open/independent_runs/run_config.json is used.
The run config JSON is a list of dictionaries specifying the task JSON location, model and version (optional). One example that runs 3 tasks in parallel:
[
{"task": "iron_gear_wheel_throughput_16.json",
"model": "gpt-4o-mini-2024-07-18",
"version": 768},
{"task": "plastic_bar_throughput_16.json",
"model": "anthropic/claude-3.5-sonnet-open-router"},
{"task": "open_play.json",
"model": "gpt-4o-mini-2024-07-18"}
]
Each task is run until either verify returns True or the maximum number of steps (trajectory_length) is reached.
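The termination rule above can be sketched as a simple loop. This is a toy illustration under stated assumptions: `run_episode`, `step_fn` and the two-argument `verify` callable are hypothetical stand-ins, not the project's trajectory runner or task interface.

```python
from typing import Callable


def run_episode(
    step_fn: Callable[[int], float],
    verify: Callable[[float, int], bool],
    trajectory_length: int,
) -> tuple[bool, int]:
    """Run steps until verify() succeeds or trajectory_length is reached."""
    for step in range(1, trajectory_length + 1):
        score = step_fn(step)
        if verify(score, step):
            return True, step  # task completed early
    return False, trajectory_length  # step budget exhausted


# Toy stand-ins: the score grows by 4 per step; a quota of 16 mirrors the
# throughput example, while the open-play verifier never succeeds.
throughput = run_episode(lambda step: 4.0 * step, lambda score, step: score >= 16, 128)
open_play = run_episode(lambda step: 4.0 * step, lambda score, step: False, 10)
print(throughput, open_play)
```

The throughput-style run stops at step 4, while the open-play-style run always uses its full step budget because its verifier never returns True.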
Agents interact with the game using tools, which represent a narrow API into the game.
Tools live in env/src/tools, and are either admin tools (non-agent accessible) or agent tools (used by the agent).
A tool requires 3 files:
- agent.md: The agent documentation for the tool, including usage patterns, best practices and failure modes.
- client.py: The client-side implementation, which is a Python class that can be invoked by the agent.
- server.lua: The server-side implementation, which handles most of the logic and heavy lifting.
---
config:
layout: fixed
flowchart:
defaultRenderer:
elk
---
flowchart LR
A("fa:fa-comment-dots Agent")
subgraph s1["Learning Environment"]
B("fa:fa-code Interpreter")
n1("client.py")
end
subgraph s2["Factorio Server"]
E1["fa:fa-shapes server.lua"]
F("fa:fa-cog Factorio Engine")
end
A -- Synthesises Python --> B
B -- Invokes --> n1
n1 -. Exceptions .-> B
n1 -. Objects .-> B
n1 --Remote TCP Call--> E1
E1 -- Execute --> F
F-. Result .-> E1
E1 -. TCP Response .-> n1
B -. Observation .-> A
- Create a new directory in env/src/tools/agent, e.g. env/src/tools/agent/my_tool
- Add a client.py file, which should contain a class inheriting Tool and implementing a __call__ function to treat the class as a callable function. The method signature should contain type annotations. This function must call self.execute to invoke the server-side logic.
- Add a server.lua file, containing a function structured like global.actions.my_tool = function(arg1, arg2, ...). This file should invoke the Factorio API to perform the desired action, and return a table that will be serialized and sent back to the client.
- Add an agent.md file, which should contain a markdown description of the tool. This file will be used by the agent to understand how to use the tool.
Next time you run an eval, the tool will automatically be available to the agent and documented in the agent context.
- (Optional) Create a test suite in env/tests/actions for your new tool.
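The shape of a client.py class described in the steps above might look roughly like the sketch below. Everything here is an assumption for illustration: the `Tool` base class is a hypothetical stub (the real base class and the signature of `execute` live in the repository and may differ), and `MyTool` and `fake_server` are invented names.

```python
class Tool:
    """Hypothetical stand-in for the project's Tool base class."""

    def __init__(self, transport):
        # A callable standing in for the remote call to server.lua.
        self.transport = transport

    def execute(self, *args):
        return self.transport(*args)


class MyTool(Tool):
    """Hypothetical tool that counts an item; not part of FLE."""

    def __call__(self, item_name: str, minimum: int = 0) -> int:
        # Delegate the game-side work to the server implementation.
        response = self.execute(item_name)
        count = int(response["count"])
        if count < minimum:
            # Raising lets the agent see the failure on stderr.
            raise ValueError(f"expected at least {minimum} {item_name}, found {count}")
        return count


def fake_server(item_name: str) -> dict:
    """Fake server response, for illustration only."""
    return {"count": {"iron-plate": 12}.get(item_name, 0)}


my_tool = MyTool(fake_server)
print(my_tool("iron-plate"))
```

The type-annotated `__call__` is what makes the instance usable like a function from agent code, and all game-side work goes through `self.execute`.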
Tool | Description | Key Features |
---|---|---|
inspect_inventory | Checks contents of player or entity inventories | Supports various inventory types (chests, furnaces, etc.); Returns Inventory object with count methods; Can query specific items |
insert_item | Places items from player inventory into entities | Works with machines, chests, belts; Validates item compatibility; Returns updated entity |
extract_item | Removes items from entity inventories | Supports all inventory types; Auto-transfers to player inventory; Returns quantity extracted |
place_entity | Places entities in the world | Handles direction and positioning; Validates placement requirements; Returns placed Entity object |
place_entity_next_to | Places entities relative to others | Automatic spacing/alignment; Handles entity dimensions; Supports all entity types |
pickup_entity | Removes entities from the world | Returns items to inventory; Handles entity groups; Supports all placeable items |
rotate_entity | Changes entity orientation | Affects entity behavior (e.g., inserter direction); Validates rotation rules; Returns updated entity |
get_entity | Retrieves entity objects at positions | Updates stale references; Returns typed Entity objects; Handles all entity types |
get_entities | Finds multiple entities in an area | Supports filtering by type; Returns List[Entity]; Groups connected entities |
nearest | Locates closest resources/entities | Finds ores, water, trees; Returns Position object; 500 tile search radius |
get_resource_patch | Analyzes resource deposits | Returns size and boundaries; Supports all resource types; Includes total resource amount |
harvest_resource | Gathers resources from the world | Supports ores, trees, rocks; Auto-collects to inventory; Returns amount harvested |
connect_entities | Creates connections between entities | Handles belts, pipes, power; Automatic pathfinding; Returns connection group |
get_connection_amount | Calculates required connection items | Pre-planning tool; Works with all connection types; Returns item count needed |
set_entity_recipe | Configures machine crafting recipes | Works with assemblers/chemical plants; Validates recipe requirements; Returns updated entity |
get_prototype_recipe | Retrieves crafting requirements | Shows ingredients/products; Includes crafting time; Returns Recipe object |
craft_item | Creates items from components | Handles recursive crafting; Validates technology requirements; Returns crafted amount |
set_research | Initiates technology research | Validates prerequisites; Returns required ingredients; Handles research queue |
get_research_progress | Monitors research status | Shows remaining requirements; Tracks progress percentage; Returns ingredient list |
move_to | Moves player to position | Pathfinds around obstacles; Can place items while moving; Returns final position |
nearest_buildable | Finds valid building locations | Respects entity dimensions; Handles resource requirements; Returns buildable position |
sleep | Pauses execution | Waits for actions to complete; Adapts to game speed; Maximum 15 second duration |
launch_rocket | Controls rocket silo launches | Validates launch requirements; Handles launch sequence; Returns updated silo state |
print | Outputs debug information to stdout | Supports various object types; Useful for monitoring state; Returns formatted string |
Below is an overview of how the project is structured. Some directories also contain more detailed readmes.
factorio-learning-environment/
├── agents/ # Agent implementations
│ ├── utils/ # Some utilities for building an agent
│ ├── agent_abc.py # Abstract class to extend
│ └── basic_agent.py # Agent implementation we used for our experiments
├── env/ # Factorio Learning Environment
│ ├── src/ # Main implementation
│ │ ├── exceptions/ # Custom exceptions (WIP)
│ │ ├── gym/ # Gym environment wrapper (deprecated but possibly useful)
│ │ ├── lib/ # General purpose Lua utilities (e.g serialization etc)
│ │ ├── models/ # Core objects used during eval
│ │ ├── rcon/ # RCON wrapper for communicating with the game
│ │ ├── tools/ # Agent and admin tools
│ │ │ ├── admin/ # ~17 Tools for managing state, persistence, scoring etc
│ │ │ └── agent/ # ~27 Tools that the agent can use
│ │ ├── utils/ # Python utilities
│ │ ├── entities.py # Python object model of the game entities
│ │ ├── game_types.py # Technologies, Recipes, Resources
│ │ ├── instance.py # Environment state manager
│ │ └── namespace.py # Namespace the agent can read/write variables to.
│ └── tests/ # ~350 test cases
├── cluster/ # Everything needed to launch Factorio servers
│ ├── docker/ # Docker container definition of the Factorio server
│ │ ├── config/ # Factorio server configuration files
│ │ └── mods/ # Mods (deprecated)
│ ├── local/ # Tools for dynamically creating Docker Compose files for clusters
│ ├── remote/ # Tools for deploying Factorio clusters onto AWS
│ └── scenarios/ # Factorio scenarios for Lab-play and Open-play
│ ├── default_lab_scenario/
│ └── open_world/
├── data/ # Miscellaneous data
│ ├── blueprints_to_policies/ # Code to scrape Factorio blueprint sites and create Python policies
│ ├── icons/ # Icons for Factorio entities and items
│ ├── prompts/ # Prompts (deprecated)
│ ├── recipes/ # Factorio recipes in JSONL format
│ └── scripts/ # Misc Lua scripts (deprecated)
├── docs/ # Website
│ └── assets/ # Videos / Images
└── eval/
    ├── open/ # Implementations for running agents in the open game
    │   ├── beam/ # Implementation for Beam sampling
    │   ├── independent_runs/ # Implementation for independent eval runs
    │   ├── mcts/ # Implementation for MCTS sampling
    │   └── plots/ # Run results and plots
    └── tasks # Implementations for running agents against lab-play tasks
        ├── task_definitions/ # JSON definition of task
        ├── task_abc.py # Abstract task definition
        └── throughput_task.py # A basic task checking for a production throughput quota
To run long trajectories in FLE, we support checkpointing at every agent step using a SQL database. The db_client implements the interface for saving and loading agent outputs, environment feedback, game states and histories of the current trajectory. We support Postgres and SQLite databases out of the box. The easiest way to set up an FLE-compatible database is to use SQLite and set up the programs table:
# create the db file
sqlite3 mydatabase.db
# create the programs table
CREATE TABLE programs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
code TEXT NOT NULL,
value REAL DEFAULT 0.0,
visits INTEGER DEFAULT 0,
parent_id INTEGER,
state_json TEXT,
conversation_json TEXT NOT NULL,
completion_token_usage INTEGER,
prompt_token_usage INTEGER,
token_usage INTEGER,
response TEXT,
holdout_value REAL,
raw_reward REAL,
version INTEGER DEFAULT 1,
version_description TEXT DEFAULT '',
model TEXT DEFAULT 'gpt-4o',
meta TEXT,
achievements_json TEXT,
instance INTEGER DEFAULT -1,
depth REAL DEFAULT 0.0,
advantage REAL DEFAULT 0.0,
ticks INTEGER DEFAULT 0,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
The SQLite database can then be instantiated for use in tasks in the create_db_client function at eval/open/independent_runs/trajectory_runner.py.
We recommend setting the database file path through the SQLITE_DB_FILE variable in the .env file:
import os

from eval.open.db_client import SQLliteDBClient

async def create_db_client() -> SQLliteDBClient:
    """Create database client with connection pool"""
    return SQLliteDBClient(
        max_conversation_length=40,
        min_connections=2,
        max_connections=5,
        # Provide the SQLite database file path, e.g. "mydatabase.db"
        database_file=os.getenv("SQLITE_DB_FILE")
    )
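To make the checkpointing idea concrete, the sketch below stores and retrieves steps with Python's built-in sqlite3 module. It is not the project's db_client: the table is a reduced subset of the programs schema above, `save_step` and `latest_step` are hypothetical helpers, and looking up the latest step by version is an assumption about how a run might be resumed.

```python
import json
import sqlite3

# In-memory database with a reduced subset of the programs columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE programs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        code TEXT NOT NULL,
        value REAL DEFAULT 0.0,
        parent_id INTEGER,
        conversation_json TEXT NOT NULL,
        response TEXT,
        version INTEGER DEFAULT 1,
        model TEXT DEFAULT 'gpt-4o',
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP
    )
""")


def save_step(code, response, parent_id=None, version=1):
    """Insert one agent step and return its row id."""
    cursor = conn.execute(
        "INSERT INTO programs (code, response, parent_id, conversation_json, version) "
        "VALUES (?, ?, ?, ?, ?)",
        (code, response, parent_id, json.dumps([]), version),
    )
    conn.commit()
    return cursor.lastrowid


def latest_step(version):
    """Return the most recent step stored for a given run version."""
    return conn.execute(
        "SELECT id, code, response, parent_id FROM programs "
        "WHERE version = ? ORDER BY id DESC LIMIT 1",
        (version,),
    ).fetchone()


first = save_step("print(1)", "1", version=768)
second = save_step("print(2)", "2", parent_id=first, version=768)
print(latest_step(768))
```

Linking each row to its predecessor through parent_id is what allows a trajectory to be reconstructed step by step.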
We measured FLE execution performance across different configurations. All benchmarks were run on a MacBook Pro M4 128GB, with 100 iterations per operation on a subset of the existing tools.
Executing tools against the Factorio server, while a Factorio game client is connected.
Operation | Operations/Min | Operations/Sec |
---|---|---|
place_entity_next_to | 2,578.20 | 42.97 |
place_entity | 12,057.63 | 200.96 |
move_to | 8,649.89 | 144.16 |
harvest_resource | 16,599.44 | 276.66 |
craft_item | 16,875.14 | 281.25 |
connect_entities | 1,664.70 | 27.74 |
rotate_entity | 12,281.31 | 204.69 |
insert_item | 13,044.42 | 217.41 |
extract_item | 17,167.43 | 286.12 |
inspect_inventory | 17,036.32 | 283.94 |
get_resource_patch | 7,004.49 | 116.74 |
Total | 7,513.29 | 125.22 |
Executing tools against the Factorio server without a game client.
Operation | Operations/Min | Operations/Sec |
---|---|---|
place_entity_next_to | 4,856.51 | 80.94 |
place_entity | 22,332.72 | 372.21 |
move_to | 16,005.59 | 266.76 |
harvest_resource | 32,727.01 | 545.45 |
craft_item | 36,223.63 | 603.73 |
connect_entities | 2,926.01 | 48.77 |
rotate_entity | 23,467.46 | 391.12 |
insert_item | 25,154.28 | 419.24 |
extract_item | 32,997.26 | 549.95 |
inspect_inventory | 28,401.56 | 473.36 |
get_resource_patch | 8,736.30 | 145.61 |
Total | 13,094.98 | 218.25 |
Executing tools as part of a Python policy string, while a Factorio game client is connected.
Operation | Operations/Min | Operations/Sec |
---|---|---|
place_entity_next_to | 4,714.52 | 78.58 |
place_entity | 4,774.13 | 79.57 |
move_to | 4,005.77 | 66.76 |
harvest_resource | 3,594.59 | 59.91 |
craft_item | 4,985.02 | 83.08 |
connect_entities | 1,497.11 | 24.95 |
rotate_entity | 4,914.69 | 81.91 |
insert_item | 5,046.99 | 84.12 |
extract_item | 4,743.08 | 79.05 |
inspect_inventory | 4,838.31 | 80.64 |
get_resource_patch | 2,593.11 | 43.22 |
Total | 3,639.10 | 60.65 |
Executing tools as part of a Python policy string, without a game client.
Operation | Operations/Min | Operations/Sec |
---|---|---|
place_entity_next_to | 5,069.60 | 84.49 |
place_entity | 5,238.61 | 87.31 |
move_to | 4,979.59 | 82.99 |
harvest_resource | 3,247.09 | 54.12 |
craft_item | 5,854.27 | 97.57 |
connect_entities | 2,150.21 | 35.84 |
rotate_entity | 5,370.21 | 89.50 |
insert_item | 5,065.89 | 84.43 |
extract_item | 5,449.07 | 90.82 |
inspect_inventory | 5,638.67 | 93.98 |
get_resource_patch | 2,479.41 | 41.32 |
Total | 4,103.53 | 68.39 |
- Headless vs Client Performance: The headless server configuration consistently outperforms the client version, with direct API calls showing approximately 74% better throughput (218.25 vs 125.22 ops/sec).
- Interpreter Overhead: Adding the interpreter layer introduces significant overhead:
  - Headless: Drops from 218.25 to 68.39 ops/sec (~69% reduction)
  - Client: Drops from 125.22 to 60.65 ops/sec (~52% reduction)
- Operation Variability: Some operations show more significant performance variations:
  - connect_entities is consistently the slowest operation across all configurations (because it relies on pathfinding)
  - craft_item and extract_item tend to be among the fastest operations
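The percentages quoted above follow directly from the Total rows of the four benchmark tables. The short calculation below reproduces them; the dictionary layout and the `percent_change` helper are just illustrative choices.

```python
# Total ops/sec from the four benchmark tables above.
totals = {
    ("direct", "client"): 125.22,
    ("direct", "headless"): 218.25,
    ("interpreter", "client"): 60.65,
    ("interpreter", "headless"): 68.39,
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


headless_gain = percent_change(totals[("direct", "client")], totals[("direct", "headless")])
headless_drop = percent_change(totals[("direct", "headless")], totals[("interpreter", "headless")])
client_drop = percent_change(totals[("direct", "client")], totals[("interpreter", "client")])
print(round(headless_gain), round(headless_drop), round(client_drop))
```

This prints 74 -69 -52, matching the roughly 74% headless gain and the roughly 69% and 52% interpreter reductions.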
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for factorio-learning-environment
Similar Open Source Tools

factorio-learning-environment
Factorio Learning Environment is an open source framework designed for developing and evaluating LLM agents in the game of Factorio. It provides two settings: Lab-play with structured tasks and Open-play for building large factories. Results show limitations in spatial reasoning and automation strategies. Agents interact with the environment through code synthesis, observation, action, and feedback. Tools are provided for game actions and state representation. Agents operate in episodes with observation, planning, and action execution. Tasks specify agent goals and are implemented in JSON files. The project structure includes directories for agents, environment, cluster, data, docs, eval, and more. A database is used for checkpointing agent steps. Benchmarks show performance metrics for different configurations.

AutoGPTQ
AutoGPTQ is an easy-to-use LLM quantization package with user-friendly APIs, based on GPTQ algorithm (weight-only quantization). It provides a simple and efficient way to quantize large language models (LLMs) to reduce their size and computational cost while maintaining their performance. AutoGPTQ supports a wide range of LLM models, including GPT-2, GPT-J, OPT, and BLOOM. It also supports various evaluation tasks, such as language modeling, sequence classification, and text summarization. With AutoGPTQ, users can easily quantize their LLM models and deploy them on resource-constrained devices, such as mobile phones and embedded systems.

vision-parse
Vision Parse is a tool that leverages Vision Language Models to parse PDF documents into beautifully formatted markdown content. It offers smart content extraction, content formatting, multi-LLM support, PDF document support, and local model hosting using Ollama. Users can easily convert PDFs to markdown with high precision and preserve document hierarchy and styling. The tool supports multiple Vision LLM providers like OpenAI, LLama, and Gemini for accuracy and speed, making document processing efficient and effortless.

last_layer
last_layer is a security library designed to protect LLM applications from prompt injection attacks, jailbreaks, and exploits. It acts as a robust filtering layer to scrutinize prompts before they are processed by LLMs, ensuring that only safe and appropriate content is allowed through. The tool offers ultra-fast scanning with low latency, privacy-focused operation without tracking or network calls, compatibility with serverless platforms, advanced threat detection mechanisms, and regular updates to adapt to evolving security challenges. It significantly reduces the risk of prompt-based attacks and exploits but cannot guarantee complete protection against all possible threats.

BetaML.jl
The Beta Machine Learning Toolkit is a package containing various algorithms and utilities for implementing machine learning workflows in multiple languages, including Julia, Python, and R. It offers a range of supervised and unsupervised models, data transformers, and assessment tools. The models are implemented entirely in Julia and are not wrappers for third-party models. Users can easily contribute new models or request implementations. The focus is on user-friendliness rather than computational efficiency, making it suitable for educational and research purposes.

Cherry_LLM
Cherry Data Selection project introduces a self-guided methodology for LLMs to autonomously discern and select cherry samples from open-source datasets, minimizing manual curation and cost for instruction tuning. The project focuses on selecting impactful training samples ('cherry data') to enhance LLM instruction tuning by estimating instruction-following difficulty. The method involves phases like 'Learning from Brief Experience', 'Evaluating Based on Experience', and 'Retraining from Self-Guided Experience' to improve LLM performance.

StableToolBench
StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features like Virtual API System, Solvable Queries, and Stable Evaluation System. The benchmark ensures consistency through a caching system and API simulators, filters queries based on solvability using LLMs, and evaluates model performance using GPT-4 with metrics like Solvable Pass Rate and Solvable Win Rate.

AIOS
AIOS, a Large Language Model (LLM) Agent operating system, embeds large language model into Operating Systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. AIOS is designed to optimize resource allocation, facilitate context switch across agents, enable concurrent execution of agents, provide tool service for agents, maintain access control for agents, and provide a rich set of toolkits for LLM Agent developers.

Consistency_LLM
Consistency Large Language Models (CLLMs) is a family of efficient parallel decoders that reduce inference latency by efficiently decoding multiple tokens in parallel. The models are trained to perform efficient Jacobi decoding, mapping any randomly initialized token sequence to the same result as auto-regressive decoding in as few steps as possible. CLLMs have shown significant improvements in generation speed on various tasks, achieving up to 3.4 times faster generation. The tool provides a seamless integration with other techniques for efficient Large Language Model (LLM) inference, without the need for draft models or architectural modifications.

Qwen
Qwen is a series of large language models developed by Alibaba DAMO Academy. Qwen models outperform baseline models of similar size on a series of benchmark datasets, e.g., MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, and BBH, which evaluate the models’ capabilities in natural language understanding, mathematical problem solving, coding, and more. Qwen-72B achieves better performance than LLaMA2-70B on all tasks and outperforms GPT-3.5 on 7 out of 10 tasks.

StableToolBench
StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features such as a Virtual API System with caching and API simulators, a new set of solvable queries determined by LLMs, and a Stable Evaluation System using GPT-4. The Virtual API Server can be set up either by building from source or by using a prebuilt Docker image. Users can test the server using the provided scripts and evaluate models with the Solvable Pass Rate and Solvable Win Rate metrics. The project also includes experimental results comparing the performance of different models.

TableLLM
TableLLM is a large language model designed for efficient tabular data manipulation tasks in real office scenarios. It can generate code solutions or direct text answers for tasks like insert, delete, update, query, merge, and chart operations on tables embedded in spreadsheets or documents. The model has been fine-tuned from CodeLlama-7B and 13B, offering two scales: TableLLM-7B and TableLLM-13B. Evaluation results show its performance on benchmarks like WikiSQL, Spider, and a self-created table operation benchmark. Users can use TableLLM for code and text generation tasks on tabular data.

floneum
Floneum is a graph editor that makes it easy to develop your own AI workflows. It uses large language models (LLMs) to run AI models locally, without any external dependencies or even a GPU. This makes it easy to use LLMs with your own data, without worrying about privacy. Floneum also has a plugin system that allows you to improve the performance of LLMs and make them work better for your specific use case. Plugins can be used in any language that supports WebAssembly, and they can control the output of LLMs with a process similar to JSONformer or guidance.

cambrian
Cambrian-1 is a fully open project focused on exploring multimodal Large Language Models (LLMs) with a vision-centric approach. It offers competitive performance across various benchmarks with models at different parameter levels. The project includes training configurations, model weights, instruction tuning data, and evaluation details. Users can interact with Cambrian-1 through a Gradio web interface for inference. The project is inspired by LLaVA and incorporates contributions from Vicuna, LLaMA, and Yi. Cambrian-1 is licensed under Apache 2.0 and utilizes datasets and checkpoints subject to their respective original licenses.

airdcpp-windows
AirDC++ for Windows 10/11 is a file sharing client with a focus on ease of use and performance. It is designed to provide a seamless experience for users looking to share and download files over the internet. The tool is built using Visual Studio 2022 and offers a range of features to enhance the file sharing process. Users can easily clone the repository to access the latest version and contribute to the development of the tool.

AiOS
AiOS is a tool for human pose and shape estimation, performing human localization and SMPL-X estimation in a progressive manner. It consists of body localization, body refinement, and whole-body refinement stages. Users can download datasets for evaluation, SMPL-X body models, and the AiOS checkpoint. Installation involves creating a conda virtual environment and installing PyTorch, torchvision, Pytorch3D, MMCV, and other dependencies. Inference requires placing the input video and pretrained models in specific directories. Test results are provided for NMVE, NMJE, MVE, and MPJPE on datasets like BEDLAM and AGORA. Users can run scripts for AGORA validation, the AGORA test leaderboard, and the BEDLAM leaderboard. The tool acknowledges code from MMHuman3D, ED-Pose, and SMPLer-X.

For similar tasks

Odyssey
Odyssey is a framework designed to empower agents with open-world skills in Minecraft. It provides an interactive agent with a skill library, a fine-tuned LLaMA-3 model, and an open-world benchmark for evaluating agent capabilities. The framework enables agents to explore diverse gameplay opportunities in the vast Minecraft world by offering primitive and compositional skills, extensive training data, and various long-term planning tasks. Odyssey aims to advance research on autonomous agent solutions by providing datasets, model weights, and code for public use.

MinePal
MinePal is a Minecraft companion app with a React frontend, a local backend, and an AI agent. The frontend is built with React and Vite, the local backend APIs are in server.js, and the Minecraft agent logic is in src/agent/. Users can set up the frontend by installing dependencies and building it, refer to the backend repository for backend setup, and navigate to src/agent/ to access actions that the bot can take.

For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks so that operators can focus on more complicated and time-consuming tasks, and it can also identify security harms such as misuse (e.g., malware generation, jailbreaking) and privacy harms (e.g., identity theft). The goal is to give researchers a baseline of how well their model and entire inference pipeline perform against different harm categories, and to let them compare that baseline to future iterations of their model. This gives them empirical data on how well their model is doing today and lets them detect any degradation of performance in future iterations.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a cloud IDE); and it supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.