plexe
✨ Build a machine learning model from a prompt
Stars: 2522
Plexe is a tool that allows users to create machine learning models by describing them in plain language. Users can explain their requirements, provide a dataset, and the AI-powered system will build a fully functional model through an automated agentic approach. It supports multiple AI agents and model building frameworks like XGBoost, CatBoost, and Keras. Plexe also provides Docker images with pre-configured environments, YAML configuration for customization, and support for multiple LiteLLM providers. Users can visualize experiment results using the built-in Streamlit dashboard and extend Plexe's functionality through custom integrations.
README:
Build machine learning models using natural language.
Quickstart | Features | Installation | Documentation
plexe lets you create machine learning models by describing them in plain language. Simply explain what you want, provide a dataset, and the AI-powered system builds a fully functional model through an automated agentic approach. Also available as a managed cloud service.
pip install plexe
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
Provide a tabular dataset (Parquet, CSV, ORC, or Avro) and a natural language intent:
python -m plexe.main \
--train-dataset-uri data.parquet \
--intent "predict whether a passenger was transported" \
--max-iterations 5
from plexe.main import main
from pathlib import Path
best_solution, metrics, report = main(
intent="predict whether a passenger was transported",
data_refs=["train.parquet"],
max_iterations=5,
work_dir=Path("./workdir"),
)
print(f"Performance: {best_solution.performance:.4f}")
The system uses 14 specialized AI agents across a 6-phase workflow to:
- Analyze your data and identify the ML task
- Select the right evaluation metric
- Search for the best model through hypothesis-driven iteration
- Evaluate model performance and robustness
- Package the model for deployment
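The hypothesis-driven search in the list above can be illustrated with a small conceptual sketch. This is not plexe's actual agent loop: the candidates and scores below are toy stand-ins, whereas plexe's agents propose real model configurations and evaluate them by training.

```python
def hypothesis_search(evaluate, candidates, max_iterations=5):
    """Conceptual sketch of hypothesis-driven search: try a candidate,
    evaluate it, keep the best so far. Not plexe's actual agent loop."""
    best, best_score = None, float("-inf")
    for candidate in candidates[:max_iterations]:
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy stand-ins: in plexe, evaluation means actually training a model;
# here the scores are fixed so the demo is deterministic.
scores = {"xgboost": 0.83, "catboost": 0.81, "keras": 0.78}
best, score = hypothesis_search(scores.get, list(scores), max_iterations=5)
print(best, score)  # xgboost 0.83
```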
Build complete models with a single call. Plexe supports XGBoost, CatBoost, and Keras for tabular data:
best_solution, metrics, report = main(
intent="predict house prices based on property features",
data_refs=["housing.parquet"],
max_iterations=10, # Search iterations
allowed_model_types=["xgboost"], # Or let plexe choose
enable_final_evaluation=True, # Evaluate on held-out test set
)
Run python -m plexe.main --help for all CLI options.
The output is a self-contained model package at work_dir/model/ (also archived as model.tar.gz).
The package has no dependency on plexe, so you can build the model with plexe and deploy it anywhere:
model/
├── artifacts/   # Trained model + feature pipeline (pickle)
├── src/         # Inference predictor, pipeline code, training template
├── schemas/     # Input/output JSON schemas
├── config/      # Hyperparameters
├── evaluation/  # Metrics and detailed analysis reports
├── model.yaml   # Model metadata
└── README.md    # Usage instructions with example code
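Consuming such a package might look like the sketch below. The file names inside artifacts/ and schemas/ are assumptions for illustration (the package's own README.md documents the real usage), and a stub package is built inline so the example runs end to end.

```python
import json
import pickle
import tempfile
from pathlib import Path

def load_model_package(package_dir: Path):
    """Load the pickled artifact and input schema from a package directory.
    File names here are illustrative; see the package's README.md."""
    model = pickle.loads((package_dir / "artifacts" / "model.pkl").read_bytes())
    input_schema = json.loads((package_dir / "schemas" / "input.json").read_text())
    return model, input_schema

# Build a stub package so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "model"
    (pkg / "artifacts").mkdir(parents=True)
    (pkg / "schemas").mkdir(parents=True)
    (pkg / "artifacts" / "model.pkl").write_bytes(pickle.dumps({"kind": "stub"}))
    (pkg / "schemas" / "input.json").write_text(json.dumps({"type": "object"}))
    model, schema = load_model_package(pkg)
    print(model["kind"], schema["type"])  # stub object
```

Since the artifacts are plain pickles, only load packages you built yourself or trust.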
Run plexe with everything pre-configured: PySpark, Java, and all dependencies included.
A Makefile is provided for common workflows:
make build # Build the Docker image
make test-quick # Fast sanity check (~1 iteration)
make run-titanic # Run on Spaceship Titanic dataset
Or run directly:
docker run --rm \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v $(pwd)/data:/data -v $(pwd)/workdir:/workdir \
plexe:py3.12 python -m plexe.main \
--train-dataset-uri /data/dataset.parquet \
--intent "predict customer churn" \
--work-dir /workdir \
--spark-mode local
A config.yaml in the project root is automatically mounted. A Databricks Connect image
is also available: docker build --target databricks .
Customize LLM routing, search parameters, Spark settings, and more via a config file:
# config.yaml
max_search_iterations: 5
allowed_model_types: [xgboost, catboost]
spark_driver_memory: "4g"
hypothesiser_llm: "openai/gpt-5-mini"
feature_processor_llm: "anthropic/claude-sonnet-4-5-20250929"
CONFIG_FILE=config.yaml python -m plexe.main ...
See config.yaml.template for all available options.
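The CONFIG_FILE override pattern can be sketched with the standard library alone. The flat "key: value" reader below stands in for a real YAML parser so the snippet has no dependencies, and the config file it reads is created inline.

```python
import os
import tempfile
from pathlib import Path

def load_flat_config(path: Path) -> dict:
    """Minimal 'key: value' reader standing in for a YAML parser."""
    config = {}
    for line in path.read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" in line:
            key, value = line.split(":", 1)
            config[key.strip()] = value.strip().strip('"')
    return config

with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.yaml"
    cfg.write_text('max_search_iterations: 5\nspark_driver_memory: "4g"\n')
    os.environ["CONFIG_FILE"] = str(cfg)  # same override pattern as plexe's CLI
    settings = load_flat_config(Path(os.environ["CONFIG_FILE"]))
    print(settings["max_search_iterations"], settings["spark_driver_memory"])  # 5 4g
```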
Plexe uses LLMs via LiteLLM, so you can use any supported provider:
# Route different agents to different providers
hypothesiser_llm: "openai/gpt-5-mini"
feature_processor_llm: "anthropic/claude-sonnet-4-5-20250929"
model_definer_llm: "ollama/llama3"
[!NOTE] Plexe should work with most LiteLLM providers, but we actively test only with
openai/* and anthropic/* models. If you encounter issues with other providers, please let us know.
Visualize experiment results, search trees, and evaluation reports with the built-in Streamlit dashboard:
python -m plexe.viz --work-dir ./workdir
Connect plexe to custom storage, tracking, and deployment infrastructure via the WorkflowIntegration interface:
main(intent="...", data_refs=[...], integration=MyCustomIntegration())
See plexe/integrations/base.py for the full interface.
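A custom integration might look roughly like this sketch. The hook names below are hypothetical, invented for illustration only (the real WorkflowIntegration interface is defined in plexe/integrations/base.py), and a toy driver stands in for plexe's workflow.

```python
class MyCustomIntegration:
    """Illustrative integration. These hook names are made up for the sketch;
    the actual interface lives in plexe/integrations/base.py."""
    def __init__(self):
        self.events = []

    def on_iteration_end(self, iteration: int, score: float) -> None:
        self.events.append(("iteration", iteration, score))  # e.g. log to a tracker

    def on_model_packaged(self, path: str) -> None:
        self.events.append(("packaged", path))  # e.g. upload to object storage

# Toy driver standing in for plexe's search loop.
integration = MyCustomIntegration()
for i, score in enumerate([0.71, 0.78, 0.83]):
    integration.on_iteration_end(i, score)
integration.on_model_packaged("./workdir/model")
print(len(integration.events), integration.events[-1][0])  # 4 packaged
```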
pip install plexe # Core (XGBoost, CatBoost, Keras, scikit-learn)
pip install plexe[pyspark] # + Local PySpark execution
pip install plexe[aws] # + S3 storage support (boto3)
Requires Python >= 3.10, < 3.13.
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
See LiteLLM providers for all supported providers.
For full documentation, visit docs.plexe.ai.
See CONTRIBUTING.md for guidelines. Join our Discord to connect with the team.
If you use Plexe in your research, please cite it as follows:
@software{plexe2025,
author = {De Bernardi, Marcello and Dubey, Vaibhav},
title = {Plexe: Build machine learning models using natural language.},
year = {2025},
publisher = {GitHub},
howpublished = {\url{https://github.com/plexe-ai/plexe}},
}