mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

Stars: 7824

Visit

Mage is an open-source data pipeline tool for transforming and integrating data. It offers an easy developer experience, engineering best practices built-in, and data as a first-class citizen. Mage makes it easy to build, preview, and launch data pipelines, and provides observability and scaling capabilities. It supports data integrations, streaming pipelines, and dbt integration.

README:

🧙 A modern replacement for Airflow.

Documentation 🌪️ Get a 5 min overview 🌊 Play with live tool 🔥 Get instant help

Give your data team `magical` powers

Integrate and synchronize data from 3rd party sources

Build real-time and batch pipelines to transform data using Python, SQL, and R

Run, monitor, and orchestrate thousands of pipelines without losing sleep

1️⃣ 🏗️

Build

Have you met anyone who said they loved developing in Airflow?
That’s why we designed an easy developer experience that you’ll enjoy.


Easy developer experience Start developing locally with a single command or launch a dev environment in your cloud using Terraform. Language of choice Write code in Python, SQL, or R in the same data pipeline for ultimate flexibility. Engineering best practices built-in Each step in your pipeline is a standalone file containing modular code that’s reusable and testable with data validations. No more DAGs with spaghetti code.

Easy developer experience
Start developing locally with a single command or launch a dev environment in your cloud using Terraform.

Language of choice
Write code in Python, SQL, or R in the same data pipeline for ultimate flexibility.

Engineering best practices built-in
Each step in your pipeline is a standalone file containing modular code that’s reusable and testable with data validations. No more DAGs with spaghetti code.

↓

2️⃣ 🔮

Preview

Stop wasting time waiting around for your DAGs to finish testing.
Get instant feedback from your code each time you run it.


Interactive code Immediately see results from your code’s output with an interactive notebook UI. Data is a first-class citizen Each block of code in your pipeline produces data that can be versioned, partitioned, and cataloged for future use. Collaborate on cloud Develop collaboratively on cloud resources, version control with Git, and test pipelines without waiting for an available shared staging environment.

Interactive code
Immediately see results from your code’s output with an interactive notebook UI.

Data is a first-class citizen
Each block of code in your pipeline produces data that can be versioned, partitioned, and cataloged for future use.

Collaborate on cloud
Develop collaboratively on cloud resources, version control with Git, and test pipelines without waiting for an available shared staging environment.

↓

3️⃣ 🚀

Launch

Don’t have a large team dedicated to Airflow?
Mage makes it easy for a single developer or small team to scale up and manage thousands of pipelines.


Fast deploy Deploy Mage to AWS, GCP, or Azure with only 2 commands using maintained Terraform templates. Scaling made simple Transform very large datasets directly in your data warehouse or through a native integration with Spark. Observability Operationalize your pipelines with built-in monitoring, alerting, and observability through an intuitive UI.

🧙 Intro

Mage is an open-source data pipeline tool for transforming and integrating data.

🏃‍♀️ Install

The recommended way to install the latest version of Mage is through Docker with the following command:

docker pull mageai/mageai:latest

You can also install Mage using pip or conda, though this may cause dependency issues without the proper environment.

pip install mage-ai

conda install -c conda-forge mage-ai

Looking for help? The fastest way to get started is by checking out our documentation here.

Looking for quick examples? Open a demo project right in your browser or check out our guides.

🎮 Demo

Live demo

Build and run a data pipeline with our demo app.

WARNING

The live demo is public to everyone, please don’t save anything sensitive (e.g. passwords, secrets, etc).

Demo video (5 min)

_{Click the image to play video}

👩‍🏫 Tutorials

🔮 Features


🎶	Orchestration	Schedule and manage data pipelines with observability.
📓	Notebook	Interactive Python, SQL, & R editor for coding data pipelines.
🏗️	Data integrations	Synchronize data from 3rd party sources to your internal destinations.
🚰	Streaming pipelines	Ingest and transform real-time data.
❎	dbt	Build, run, and manage your dbt models with Mage.

A sample data pipeline defined across 3 files ➝

Load data ➝

@data_loader
def load_csv_from_file():
    return pd.read_csv('default_repo/titanic.csv')

Transform data ➝

@transformer
def select_columns_from_df(df, *args):
    return df[['Age', 'Fare', 'Survived']]

Export data ➝

@data_exporter
def export_titanic_data_to_disk(df) -> None:
    df.to_csv('default_repo/titanic_transformed.csv')

What the data pipeline looks like in the UI ➝

New? We recommend reading about blocks and learning from a hands-on tutorial.

🏔️ Core design principles

Every user experience and technical design decision adheres to these principles.


💻	Easy developer experience	Open-source engine that comes with a custom notebook UI for building data pipelines.
🚢	Engineering best practices built-in	Build and deploy data pipelines using modular code. No more writing throwaway code or trying to turn notebooks into scripts.
💳	Data is a first-class citizen	Designed from the ground up specifically for running data-intensive workflows.
🪐	Scaling is made simple	Analyze and process large data quickly for rapid iteration.

🛸 Core abstractions

These are the fundamental concepts that Mage uses to operate.


Project	Like a repository on GitHub; this is where you write all your code.
Pipeline	Contains references to all the blocks of code you want to run, charts for visualizing data, and organizes the dependency between each block of code.
Block	A file with code that can be executed independently or within a pipeline.
Data product	Every block produces data after it's been executed. These are called data products in Mage.
Trigger	A set of instructions that determine when or how a pipeline should run.
Run	Stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc.

🙋‍♀️ Contributing and developing

Add features and instantly improve the experience for everyone.

Check out the contributing guide to set up your development environment and start building.

👨‍👩‍👧‍👦 Community

Individually, we’re a mage.

🧙 Mage

Magic is indistinguishable from advanced technology. A mage is someone who uses magic (aka advanced technology). Together, we’re Magers!

🧙‍♂️🧙 Magers (/ˈmājər/)

A group of mages who help each other realize their full potential! Let’s hang out and chat together ➝

For real-time news, fun memes, data engineering topics, and more, join us on ➝


	Twitter
	LinkedIn
	GitHub
	Slack

🤔 Frequently Asked Questions (FAQs)

Check out our FAQ page to find answers to some of our most asked questions.

🪪 License

See the LICENSE file for licensing information.

For Tasks:

Click tags to check more tools for each tasks

load data transform data export data schedule pipelines monitor pipelines

For Jobs:

data engineer data analyst data scientist machine learning engineer software engineer

Alternative AI tools for mage-ai

Similar Open Source Tools

mage-ai

github

: 7.8k

camel

CAMEL is an open-source library designed for the study of autonomous and communicative agents. We believe that studying these agents on a large scale offers valuable insights into their behaviors, capabilities, and potential risks. To facilitate research in this field, we implement and support various types of agents, tasks, prompts, models, and simulated environments.

github

: 11.5k

opik

Comet Opik is a repository containing two main services: a frontend and a backend. It provides a Python SDK for easy installation. Users can run the full application locally with minikube, following specific installation prerequisites. The repository structure includes directories for applications like Opik backend, with detailed instructions available in the README files. Users can manage the installation using simple k8s commands and interact with the application via URLs for checking the running application and API documentation. The repository aims to facilitate local development and testing of Opik using Kubernetes technology.

github

: 6.2k

eko

Eko is a lightweight and flexible command-line tool for managing environment variables in your projects. It allows you to easily set, get, and delete environment variables for different environments, making it simple to manage configurations across development, staging, and production environments. With Eko, you can streamline your workflow and ensure consistency in your application settings without the need for complex setup or configuration files.

github

: 1.1k

findto

Findto is a decentralized search tool for the Web and AI that puts people in control of algorithms. It aims to provide a better search experience by offering diverse sources, privacy and carbon level information, trends exploration, autosuggest, voice search, and more. Findto encourages a free search experience and promotes a healthier internet by empowering users with democratic choices.

github

: 83

airunner

AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.

github

: 307

biochatter

Generative AI models have shown tremendous usefulness in increasing accessibility and automation of a wide range of tasks. This repository contains the `biochatter` Python package, a generic backend library for the connection of biomedical applications to conversational AI. It aims to provide a common framework for deploying, testing, and evaluating diverse models and auxiliary technologies in the biomedical domain. BioChatter is part of the BioCypher ecosystem, connecting natively to BioCypher knowledge graphs.

github

: 135

PlanExe

PlanExe is a planning AI tool that helps users generate detailed plans based on vague descriptions. It offers a Gradio-based web interface for easy input and output. Users can choose between running models in the cloud or locally on a high-end computer. The tool aims to provide a straightforward path to planning various tasks efficiently.

github

: 109

Starmoon

Starmoon is an affordable, compact AI-enabled device that can understand and respond to your emotions with empathy. It offers supportive conversations and personalized learning assistance. The device is cost-effective, voice-enabled, open-source, compact, and aims to reduce screen time. Users can assemble the device themselves using off-the-shelf components and deploy it locally for data privacy. Starmoon integrates various APIs for AI language models, speech-to-text, text-to-speech, and emotion intelligence. The hardware setup involves components like ESP32S3, microphone, amplifier, speaker, LED light, and button, along with software setup instructions for developers. The project also includes a web app, backend API, and background task dashboard for monitoring and management.

github

: 457

superduperdb

SuperDuperDB is a Python framework for integrating AI models, APIs, and vector search engines directly with your existing databases, including hosting of your own models, streaming inference and scalable model training/fine-tuning. Build, deploy and manage any AI application without the need for complex pipelines, infrastructure as well as specialized vector databases, and moving our data there, by integrating AI at your data's source: - Generative AI, LLMs, RAG, vector search - Standard machine learning use-cases (classification, segmentation, regression, forecasting recommendation etc.) - Custom AI use-cases involving specialized models - Even the most complex applications/workflows in which different models work together SuperDuperDB is **not** a database. Think `db = superduper(db)`: SuperDuperDB transforms your databases into an intelligent platform that allows you to leverage the full AI and Python ecosystem. A single development and deployment environment for all your AI applications in one place, fully scalable and easy to manage.

github

: 4.5k

Open-Interface

Open Interface is a self-driving software that automates computer tasks by sending user requests to a language model backend (e.g., GPT-4V) and simulating keyboard and mouse inputs to execute the steps. It course-corrects by sending current screenshots to the language models. The tool supports MacOS, Linux, and Windows, and requires setting up the OpenAI API key for access to GPT-4V. It can automate tasks like creating meal plans, setting up custom language model backends, and more. Open Interface is currently not efficient in accurate spatial reasoning, tracking itself in tabular contexts, and navigating complex GUI-rich applications. Future improvements aim to enhance the tool's capabilities with better models trained on video walkthroughs. The tool is cost-effective, with user requests priced between $0.05 - $0.20, and offers features like interrupting the app and primary display visibility in multi-monitor setups.

github

: 934

BitBLAS

BitBLAS is a library for mixed-precision BLAS operations on GPUs, for example, the $W_{wdtype}A_{adtype}$ mixed-precision matrix multiplication where $C_{cdtype}[M, N] = A_{adtype}[M, K] \times W_{wdtype}[N, K]$. BitBLAS aims to support efficient mixed-precision DNN model deployment, especially the $W_{wdtype}A_{adtype}$ quantization in large language models (LLMs), for example, the $W_{UINT4}A_{FP16}$ in GPTQ, the $W_{INT2}A_{FP16}$ in BitDistiller, the $W_{INT2}A_{INT8}$ in BitNet-b1.58. BitBLAS is based on techniques from our accepted submission at OSDI'24.

github

: 502

openlit

OpenLIT is an OpenTelemetry-native GenAI and LLM Application Observability tool. It's designed to make the integration process of observability into GenAI projects as easy as pie – literally, with just **a single line of code**. Whether you're working with popular LLM Libraries such as OpenAI and HuggingFace or leveraging vector databases like ChromaDB, OpenLIT ensures your applications are monitored seamlessly, providing critical insights to improve performance and reliability.

github

: 1.3k

inference

Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your or state-of-the-art built-in models using just a single command. Whether you are a researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full potential of cutting-edge AI models.

github

: 7.4k

AI-Writer

AI-Writer is an AI content generation toolkit called Alwrity that automates and enhances the process of blog creation, optimization, and management. It integrates advanced AI models for text generation, image creation, and data analysis, offering features such as online research integration, long-form content generation, AI content planning, multilingual support, prevention of AI hallucinations, multimodal content generation, SEO optimization, and integration with platforms like Wordpress and Jekyll. The toolkit is designed for automated blog management and requires appropriate API keys and access credentials for full functionality.

github

: 331

Streamline-Analyst

Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates tasks such as data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless. It aims to expedite the data analysis process, making it accessible to all, regardless of their expertise in data analysis. The tool is built to empower users to process data and achieve high-quality visualizations with unparalleled efficiency, and to execute high-performance modeling with the best strategies. Future enhancements include Natural Language Processing (NLP), neural networks, and object detection utilizing YOLO, broadening its capabilities to meet diverse data analysis needs.

github

: 301

For similar tasks

mage-ai

github

: 7.8k

nucliadb

NucliaDB is a robust database that allows storing and searching on unstructured data. It is an out of the box hybrid search database, utilizing vector, full text and graph indexes. NucliaDB is written in Rust and Python. We designed it to index large datasets and provide multi-teanant support. When utilizing NucliaDB with Nuclia cloud, you are able to the power of an NLP database without the hassle of data extraction, enrichment and inference. We do all the hard work for you.

github

: 688

LLMstudio

LLMstudio by TensorOps is a platform that offers prompt engineering tools for accessing models from providers like OpenAI, VertexAI, and Bedrock. It provides features such as Python Client Gateway, Prompt Editing UI, History Management, and Context Limit Adaptability. Users can track past runs, log costs and latency, and export history to CSV. The tool also supports automatic switching to larger-context models when needed. Coming soon features include side-by-side comparison of LLMs, automated testing, API key administration, project organization, and resilience against rate limits. LLMstudio aims to streamline prompt engineering, provide execution history tracking, and enable effortless data export, offering an evolving environment for teams to experiment with advanced language models.

github

: 311

CyberScraper-2077

CyberScraper 2077 is an advanced web scraping tool powered by AI, designed to extract data from websites with precision and style. It offers a user-friendly interface, supports multiple data export formats, operates in stealth mode to avoid detection, and promises lightning-fast scraping. The tool respects ethical scraping practices, including robots.txt and site policies. With upcoming features like proxy support and page navigation, CyberScraper 2077 is a futuristic solution for data extraction in the digital realm.

github

: 951

kangaroo

Kangaroo is an AI-powered SQL client and admin tool for popular databases like SQLite, MySQL, PostgreSQL, etc. It supports various functionalities such as table design, query, model, sync, export/import, and more. The tool is designed to be comfortable, fun, and developer-friendly, with features like code intellisense and autocomplete. Kangaroo aims to provide a seamless experience for database management across different operating systems.

github

: 370

airbyte

Airbyte is an open-source data integration platform that makes it easy to move data from any source to any destination. With Airbyte, you can build and manage data pipelines without writing any code. Airbyte provides a library of pre-built connectors that make it easy to connect to popular data sources and destinations. You can also create your own connectors using Airbyte's no-code Connector Builder or low-code CDK. Airbyte is used by data engineers and analysts at companies of all sizes to build and manage their data pipelines.

github

: 17.8k

airbyte-connectors

This repository contains Airbyte connectors used in Faros and Faros Community Edition platforms as well as Airbyte Connector Development Kit (CDK) for JavaScript/TypeScript.

github

: 115

embedJs

EmbedJs is a NodeJS framework that simplifies RAG application development by efficiently processing unstructured data. It segments data, creates relevant embeddings, and stores them in a vector database for quick retrieval.

github

: 352

For similar jobs

lollms-webui

LoLLMs WebUI (Lord of Large Language Multimodal Systems: One tool to rule them all) is a user-friendly interface to access and utilize various LLM (Large Language Models) and other AI models for a wide range of tasks. With over 500 AI expert conditionings across diverse domains and more than 2500 fine tuned models over multiple domains, LoLLMs WebUI provides an immediate resource for any problem, from car repair to coding assistance, legal matters, medical diagnosis, entertainment, and more. The easy-to-use UI with light and dark mode options, integration with GitHub repository, support for different personalities, and features like thumb up/down rating, copy, edit, and remove messages, local database storage, search, export, and delete multiple discussions, make LoLLMs WebUI a powerful and versatile tool.

github

: 4.6k

Azure-Analytics-and-AI-Engagement

The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

github

: 136

minio

MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads.

github

: 46.0k

mage-ai

github

: 7.8k

AiTreasureBox

AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.

github

: 368

tidb

TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

github

: 37.1k

airbyte

github

: 17.8k

labelbox-python

Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.

github

: 135

mage-ai

README:

Give your data team magical powers

Build

Preview

Launch

🧙 Intro

🏃‍♀️ Install

🎮 Demo

Live demo

Demo video (5 min)

👩‍🏫 Tutorials

🔮 Features

🏔️ Core design principles

🛸 Core abstractions

🙋‍♀️ Contributing and developing

👨‍👩‍👧‍👦 Community

🤔 Frequently Asked Questions (FAQs)

🪪 License

For Tasks:

For Jobs:

Alternative AI tools for mage-ai

Similar Open Source Tools

mage-ai

camel

opik

eko

findto

airunner

biochatter

PlanExe

Starmoon

superduperdb

Open-Interface

BitBLAS

openlit

inference

AI-Writer

Streamline-Analyst

For similar tasks

mage-ai

nucliadb

LLMstudio

CyberScraper-2077

kangaroo

airbyte

airbyte-connectors

embedJs

For similar jobs

lollms-webui

Azure-Analytics-and-AI-Engagement

minio

mage-ai

AiTreasureBox

tidb

airbyte

labelbox-python

Give your data team `magical` powers