ivy

The Unified AI Framework

Ivy is an open-source machine learning framework that enables users to convert code between different ML frameworks and write framework-agnostic code. It allows users to transpile code from one framework to another, making it easy to use building blocks from different frameworks in a single project. Ivy also serves as a flexible framework in its own right, allowing users to publish code that is interoperable with existing and future frameworks. Users can define trainable modules and layers using Ivy's stateful API, making it easy to build and train models across different backends.

Sign up on our console for pilot access!

Unified AI

Ivy is an open-source machine learning framework that enables you to:

🔄 Convert code into any framework: Use and build on top of any model, library, or device by converting any code from one framework to another using ivy.transpile.
⚒️ Write framework-agnostic code: Write your code once in ivy and then choose the most appropriate ML framework as the backend to leverage all the benefits and tools.

Join our growing community 🌍 to connect with people using Ivy. Let's unify.ai together 🦾

Getting started

Ivy's transpiler helps you convert code between different ML frameworks. To get pilot access to the transpiler, sign up and generate an API key. The Get Started notebook should help you set up your API key and the Quickstart notebook should give you a brief idea of the features!

The most important notebooks are:

Beyond that, based on the frameworks you want to convert code between, there are a few more examples further down this page 👇 which contain a number of models and libraries transpiled between PyTorch, JAX, TensorFlow and NumPy.

Installing ivy

The easiest way to set up Ivy is to install it using pip:

pip install ivy

Docker Images

Given the challenges of maintaining installations of various frameworks in a single environment, users who want to test Ivy with multiple frameworks at once can use our Docker images for a seamless experience. You can pull the images with:

docker pull unifyai/ivy:latest      # CPU
docker pull unifyai/ivy:latest-gpu  # GPU

From Source

You can also install Ivy from source if you want to take advantage of the latest changes, but we can't ensure everything will work as expected 😅

git clone https://github.com/unifyai/ivy.git
cd ivy
pip install --user -e .

If you want to set up testing and various frameworks, it's best to check out the Setting Up page, which provides OS-specific and IDE-specific instructions as well as video tutorials!

Using Ivy

After installing Ivy, you can start using it straight away, for example:

Transpiling any code from one framework to another

import ivy
import torch
import jax

def jax_fn(x):
    a = jax.numpy.dot(x, x)
    b = jax.numpy.mean(x)
    return x * a + b

jax_x = jax.numpy.array([1., 2., 3.])
torch_x = torch.tensor([1., 2., 3.])
torch_fn = ivy.transpile(jax_fn, source="jax", to="torch", args=(jax_x,))
ret = torch_fn(torch_x)
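As a quick sanity check, `jax_fn` computes `x * dot(x, x) + mean(x)`. For `[1., 2., 3.]` that is `x * 14 + 2`. The plain-Python reference below (the helper `reference_fn` is hypothetical, not part of Ivy) computes the values `ret` should hold, assuming the transpiled function is numerically faithful to the original:

```python
def reference_fn(x):
    """Plain-Python version of jax_fn (hypothetical helper): x * dot(x, x) + mean(x)."""
    a = sum(xi * xi for xi in x)  # dot(x, x) -> 14.0 for [1, 2, 3]
    b = sum(x) / len(x)           # mean(x)   -> 2.0
    return [xi * a + b for xi in x]

expected = reference_fn([1.0, 2.0, 3.0])
print(expected)  # [16.0, 30.0, 44.0]
```

With PyTorch installed, you could compare the two with `torch.allclose(ret, torch.tensor(expected))`.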

Running your code with any backend

import ivy
import torch
import jax

ivy.set_backend("jax")

x = jax.numpy.array([1, 2, 3])
y = jax.numpy.array([3, 2, 1])
z = ivy.add(x, y)

ivy.set_backend("torch")

x = torch.tensor([1, 2, 3])
y = torch.tensor([3, 2, 1])
z = ivy.add(x, y)

The Examples page features a wide range of demos and tutorials showcasing the functionalities of Ivy along with multiple use cases, but feel free to check out some shorter framework-specific examples here ⬇️

I'm using PyTorch

You can use Ivy to get PyTorch code from:

Any model

From TensorFlow

import ivy
import torch
import tensorflow as tf

# Get a pretrained keras model
eff_encoder = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)

# Transpile it into a torch.nn.Module with the corresponding parameters
noise = tf.random.normal(shape=(1, 224, 224, 3))
torch_eff_encoder = ivy.transpile(eff_encoder, source="tensorflow", to="torch", args=(noise,))

# Build a classifier using the transpiled encoder
class Classifier(torch.nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.encoder = torch_eff_encoder
        self.fc = torch.nn.Linear(1280, num_classes)

    def forward(self, x):
        x = self.encoder(x)
        return self.fc(x)

# Initialize a trainable, customizable, torch.nn.Module
classifier = Classifier()
ret = classifier(torch.rand((1, 224, 224, 3)))

From JAX

import ivy
import jax
import torch

# Get a pretrained haiku model
# https://github.com/unifyai/demos/blob/15c235f/scripts/deepmind_perceiver_io.py
from deepmind_perceiver_io import key, perceiver_backbone

# Transpile it into a torch.nn.Module with the corresponding parameters
dummy_input = jax.random.uniform(key, shape=(1, 3, 224, 224))
params = perceiver_backbone.init(rng=key, images=dummy_input)
ivy.set_backend("jax")
backbone = ivy.transpile(
    perceiver_backbone, source="jax", to="torch", params_v=params, kwargs={"images": dummy_input}
)

# Build a classifier using the transpiled backbone
class PerceiverIOClassifier(torch.nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.backbone = backbone
        self.max_pool = torch.nn.MaxPool2d((512, 1))
        self.flatten = torch.nn.Flatten()
        self.fc = torch.nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.backbone(images=x)
        x = self.flatten(self.max_pool(x))
        return self.fc(x)

# Initialize a trainable, customizable, torch.nn.Module
classifier = PerceiverIOClassifier()
ret = classifier(torch.rand((1, 3, 224, 224)))

Any library

From TensorFlow

import ivy
import torch
import os
os.environ["SM_FRAMEWORK"] = "tf.keras"
import segmentation_models as sm

# transpile sm from tensorflow to torch
torch_sm = ivy.transpile(sm, source="tensorflow", to="torch")

# get some image-like arrays
output = torch.rand((1, 3, 512, 512))
target = torch.rand((1, 3, 512, 512))

# and use the transpiled version of any function from the library!
out = torch_sm.metrics.iou_score(output, target)

From JAX

import ivy
import rax
import torch

# transpile rax from jax to torch
torch_rax = ivy.transpile(rax, source="jax", to="torch")

# get some arrays
scores = torch.tensor([2.2, 1.3, 5.4])
labels = torch.tensor([1.0, 0.0, 0.0])

# and use the transpiled version of any function from the library!
out = torch_rax.poly1_softmax_loss(scores, labels)

From NumPy

import ivy
import torch
import madmom

# transpile madmom from numpy to torch
torch_madmom = ivy.transpile(madmom, source="numpy", to="torch")

# get some arrays
freqs = torch.arange(20) * 10

# and use the transpiled version of any function from the library!
out = torch_madmom.audio.filters.hz2midi(freqs)

Any function

From TensorFlow

import ivy
import tensorflow as tf
import torch

def loss(predictions, targets):
    return tf.sqrt(tf.reduce_mean(tf.square(predictions - targets)))

# transpile any function from tf to torch
torch_loss = ivy.transpile(loss, source="tensorflow", to="torch")

# get some arrays
p = torch.tensor([3.0, 2.0, 1.0])
t = torch.tensor([0.0, 0.0, 0.0])

# and use the transpiled version!
out = torch_loss(p, t)
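Every "Any function" example on this page uses the same root-mean-square error. For `p = [3., 2., 1.]` and `t = [0., 0., 0.]` it equals `sqrt((9 + 4 + 1) / 3) = sqrt(14/3) ≈ 2.1602`, so each transpiled `out` should be close to that value. A framework-free reference is sketched below; the helper name `rmse` is ours, not Ivy's:

```python
import math

def rmse(predictions, targets):
    """Root-mean-square error over two equal-length sequences (hypothetical helper)."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

value = rmse([3.0, 2.0, 1.0], [0.0, 0.0, 0.0])
print(f"{value:.4f}")  # 2.1602
```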

From JAX

import ivy
import jax.numpy as jnp
import torch

def loss(predictions, targets):
    return jnp.sqrt(jnp.mean((predictions - targets) ** 2))

# transpile any function from jax to torch
torch_loss = ivy.transpile(loss, source="jax", to="torch")

# get some arrays
p = torch.tensor([3.0, 2.0, 1.0])
t = torch.tensor([0.0, 0.0, 0.0])

# and use the transpiled version!
out = torch_loss(p, t)

From NumPy

import ivy
import numpy as np
import torch

def loss(predictions, targets):
    return np.sqrt(np.mean((predictions - targets) ** 2))

# transpile any function from numpy to torch
torch_loss = ivy.transpile(loss, source="numpy", to="torch")

# get some arrays
p = torch.tensor([3.0, 2.0, 1.0])
t = torch.tensor([0.0, 0.0, 0.0])

# and use the transpiled version!
out = torch_loss(p, t)

I'm using TensorFlow

You can use Ivy to get TensorFlow code from:

Any model

From PyTorch

import ivy
import torch
import timm
import tensorflow as tf

# Get a pretrained pytorch model
mlp_encoder = timm.create_model("mixer_b16_224", pretrained=True, num_classes=0)

# Transpile it into a keras.Model with the corresponding parameters
noise = torch.randn(1, 3, 224, 224)
mlp_encoder = ivy.transpile(mlp_encoder, source="torch", to="tensorflow", args=(noise,))

# Build a classifier using the transpiled encoder
class Classifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.encoder = mlp_encoder
        self.output_dense = tf.keras.layers.Dense(units=1000, activation="softmax")

    def call(self, x):
        x = self.encoder(x)
        return self.output_dense(x)

# Initialize the classifier and use it as a standard keras.Model
x = tf.random.normal(shape=(1, 3, 224, 224))
model = Classifier()
ret = model(x)

From JAX

import ivy
import jax
import tensorflow as tf

# Get a pretrained haiku model
# https://unify.ai/demos/scripts/deepmind_perceiver_io.py
from deepmind_perceiver_io import key, perceiver_backbone

# Transpile it into a tf.keras.Model with the corresponding parameters
dummy_input = jax.random.uniform(key, shape=(1, 3, 224, 224))
params = perceiver_backbone.init(rng=key, images=dummy_input)
backbone = ivy.transpile(
    perceiver_backbone, source="jax", to="tensorflow", params_v=params, args=(dummy_input,)
)

# Build a classifier using the transpiled backbone
class PerceiverIOClassifier(tf.keras.Model):
    def __init__(self, num_classes=20):
        super().__init__()
        self.backbone = backbone
        self.max_pool = tf.keras.layers.MaxPooling1D(pool_size=512)
        self.flatten = tf.keras.layers.Flatten()
        self.fc = tf.keras.layers.Dense(num_classes)

    def call(self, x):
        x = self.backbone(x)
        x = self.flatten(self.max_pool(x))
        return self.fc(x)

# Initialize a trainable, customizable, tf.keras.Model
x = tf.random.normal(shape=(1, 3, 224, 224))
classifier = PerceiverIOClassifier()
ret = classifier(x)

Any library

From PyTorch

import ivy
import kornia
import requests
import numpy as np
import tensorflow as tf
from PIL import Image

# transpile kornia from torch to tensorflow
tf_kornia = ivy.transpile(kornia, source="torch", to="tensorflow")

# get an image
url = "http://images.cocodataset.org/train2017/000000000034.jpg"
raw_img = Image.open(requests.get(url, stream=True).raw)

# convert it to the format expected by kornia
img = np.array(raw_img)
img = tf.transpose(tf.constant(img), (2, 0, 1))
img = tf.expand_dims(img, 0) / 255

# and use the transpiled version of any function from the library!
out = tf_kornia.enhance.sharpness(img, 5)

From JAX

import ivy
import rax
import tensorflow as tf

# transpile rax from jax to tensorflow
tf_rax = ivy.transpile(rax, source="jax", to="tensorflow")

# get some arrays
scores = tf.constant([2.2, 1.3, 5.4])
labels = tf.constant([1.0, 0.0, 0.0])

# and use the transpiled version of any function from the library!
out = tf_rax.poly1_softmax_loss(scores, labels)

From NumPy

import ivy
import madmom
import tensorflow as tf

# transpile madmom from numpy to tensorflow
tf_madmom = ivy.transpile(madmom, source="numpy", to="tensorflow")

# get some arrays
freqs = tf.range(20) * 10

# and use the transpiled version of any function from the library!
out = tf_madmom.audio.filters.hz2midi(freqs)

Any function

From PyTorch

import ivy
import torch
import tensorflow as tf

def loss(predictions, targets):
    return torch.sqrt(torch.mean((predictions - targets) ** 2))

# transpile any function from torch to tensorflow
tf_loss = ivy.transpile(loss, source="torch", to="tensorflow")

# get some arrays
p = tf.constant([3.0, 2.0, 1.0])
t = tf.constant([0.0, 0.0, 0.0])

# and use the transpiled version!
out = tf_loss(p, t)

From JAX

import ivy
import jax.numpy as jnp
import tensorflow as tf

def loss(predictions, targets):
    return jnp.sqrt(jnp.mean((predictions - targets) ** 2))

# transpile any function from jax to tensorflow
tf_loss = ivy.transpile(loss, source="jax", to="tensorflow")

# get some arrays
p = tf.constant([3.0, 2.0, 1.0])
t = tf.constant([0.0, 0.0, 0.0])

# and use the transpiled version!
out = tf_loss(p, t)

From NumPy

import ivy
import numpy as np
import tensorflow as tf

def loss(predictions, targets):
    return np.sqrt(np.mean((predictions - targets) ** 2))

# transpile any function from numpy to tensorflow
tf_loss = ivy.transpile(loss, source="numpy", to="tensorflow")

# get some arrays
p = tf.constant([3.0, 2.0, 1.0])
t = tf.constant([0.0, 0.0, 0.0])

# and use the transpiled version!
out = tf_loss(p, t)

I'm using JAX

You can use Ivy to get JAX code from:

Any model

From PyTorch

import ivy
import timm
import torch
import jax
import haiku as hk

# Get a pretrained pytorch model
mlp_encoder = timm.create_model("mixer_b16_224", pretrained=True, num_classes=0)

# Transpile it into a hk.Module with the corresponding parameters
noise = torch.randn(1, 3, 224, 224)
mlp_encoder = ivy.transpile(mlp_encoder, source="torch", to="haiku", args=(noise,))

# Build a classifier using the transpiled encoder
class Classifier(hk.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.encoder = mlp_encoder()
        self.fc = hk.Linear(output_size=num_classes, with_bias=True)

    def __call__(self, x):
        x = self.encoder(x)
        x = self.fc(x)
        return x

def _forward_classifier(x):
    module = Classifier()
    return module(x)

# Transform the classifier and use it as a standard hk.Module
rng_key = jax.random.PRNGKey(42)
x = jax.random.uniform(key=rng_key, shape=(1, 3, 224, 224), dtype=jax.numpy.float32)
forward_classifier = hk.transform(_forward_classifier)
params = forward_classifier.init(rng=rng_key, x=x)

ret = forward_classifier.apply(params, None, x)

From TensorFlow

import ivy
import jax
import haiku as hk
import tensorflow as tf
jax.config.update("jax_enable_x64", True)

# Get a pretrained keras model
eff_encoder = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)

# Transpile it into a hk.Module with the corresponding parameters
noise = tf.random.normal(shape=(1, 224, 224, 3))
hk_eff_encoder = ivy.transpile(eff_encoder, source="tensorflow", to="haiku", args=(noise,))

# Build a classifier using the transpiled encoder
class Classifier(hk.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.encoder = hk_eff_encoder()
        self.fc = hk.Linear(output_size=num_classes, with_bias=True)

    def __call__(self, x):
        x = self.encoder(x)
        x = self.fc(x)
        return x

def _forward_classifier(x):
    module = Classifier()
    return module(x)

# Transform the classifier and use it as a standard hk.Module
rng_key = jax.random.PRNGKey(42)
dummy_x = jax.random.uniform(key=rng_key, shape=(1, 224, 224, 3))
forward_classifier = hk.transform(_forward_classifier)
params = forward_classifier.init(rng=rng_key, x=dummy_x)

ret = forward_classifier.apply(params, None, dummy_x)

Any library

From PyTorch

import ivy
import kornia
import requests
import jax
import jax.numpy as jnp
from PIL import Image
jax.config.update("jax_enable_x64", True)

# transpile kornia from torch to jax
jax_kornia = ivy.transpile(kornia, source="torch", to="jax")

# get an image
url = "http://images.cocodataset.org/train2017/000000000034.jpg"
raw_img = Image.open(requests.get(url, stream=True).raw)

# convert it to the format expected by kornia
img = jnp.transpose(jnp.array(raw_img), (2, 0, 1))
img = jnp.expand_dims(img, 0) / 255

# and use the transpiled version of any function from the library!
out = jax_kornia.enhance.sharpness(img, 5)

From TensorFlow

import ivy
import jax
import os
os.environ["SM_FRAMEWORK"] = "tf.keras"
import segmentation_models as sm

# transpile sm from tensorflow to jax
jax_sm = ivy.transpile(sm, source="tensorflow", to="jax")

# get some image-like arrays
key = jax.random.PRNGKey(23)
key1, key2 = jax.random.split(key)
output = jax.random.uniform(key1, (1, 3, 512, 512))
target = jax.random.uniform(key2, (1, 3, 512, 512))

# and use the transpiled version of any function from the library!
out = jax_sm.metrics.iou_score(output, target)

From NumPy

import ivy
import madmom
import jax.numpy as jnp

# transpile madmom from numpy to jax
jax_madmom = ivy.transpile(madmom, source="numpy", to="jax")

# get some arrays
freqs = jnp.arange(20) * 10

# and use the transpiled version of any function from the library!
out = jax_madmom.audio.filters.hz2midi(freqs)

Any function

From PyTorch

import ivy
import torch
import jax.numpy as jnp

def loss(predictions, targets):
    return torch.sqrt(torch.mean((predictions - targets) ** 2))

# transpile any function from torch to jax
jax_loss = ivy.transpile(loss, source="torch", to="jax")

# get some arrays
p = jnp.array([3.0, 2.0, 1.0])
t = jnp.array([0.0, 0.0, 0.0])

# and use the transpiled version!
out = jax_loss(p, t)

From TensorFlow

import ivy
import tensorflow as tf
import jax.numpy as jnp

def loss(predictions, targets):
    return tf.sqrt(tf.reduce_mean(tf.square(predictions - targets)))

# transpile any function from tf to jax
jax_loss = ivy.transpile(loss, source="tensorflow", to="jax")

# get some arrays
p = jnp.array([3.0, 2.0, 1.0])
t = jnp.array([0.0, 0.0, 0.0])

# and use the transpiled version!
out = jax_loss(p, t)

From NumPy

import ivy
import numpy as np
import jax
import jax.numpy as jnp
jax.config.update('jax_enable_x64', True)

def loss(predictions, targets):
    return np.sqrt(np.mean((predictions - targets) ** 2))

# transpile any function from numpy to jax
jax_loss = ivy.transpile(loss, source="numpy", to="jax")

# get some arrays
p = jnp.array([3.0, 2.0, 1.0])
t = jnp.array([0.0, 0.0, 0.0])

# and use the transpiled version!
out = jax_loss(p, t)

I'm using NumPy

You can use Ivy to get NumPy code from:

Any library

From PyTorch

import ivy
import kornia
import requests
import numpy as np
from PIL import Image

# transpile kornia from torch to np
np_kornia = ivy.transpile(kornia, source="torch", to="numpy")

# get an image
url = "http://images.cocodataset.org/train2017/000000000034.jpg"
raw_img = Image.open(requests.get(url, stream=True).raw)

# convert it to the format expected by kornia
img = np.transpose(np.array(raw_img), (2, 0, 1))
img = np.expand_dims(img, 0) / 255

# and use the transpiled version of any function from the library!
out = np_kornia.enhance.sharpness(img, 5)

From TensorFlow

import ivy
import numpy as np
import os
os.environ["SM_FRAMEWORK"] = "tf.keras"
import segmentation_models as sm

# transpile sm from tensorflow to numpy
np_sm = ivy.transpile(sm, source="tensorflow", to="numpy")

# get some image-like arrays
output = np.random.rand(1, 3, 512, 512).astype(dtype=np.float32)
target = np.random.rand(1, 3, 512, 512).astype(dtype=np.float32)

# and use the transpiled version of any function from the library!
out = np_sm.metrics.iou_score(output, target)

From JAX

import ivy
import rax
import numpy as np

# transpile rax from jax to numpy
np_rax = ivy.transpile(rax, source="jax", to="numpy")

# get some arrays
scores = np.array([2.2, 1.3, 5.4])
labels = np.array([1.0, 0.0, 0.0])

# and use the transpiled version of any function from the library!
out = np_rax.poly1_softmax_loss(scores, labels)

Any function

From PyTorch

import ivy
import torch
import numpy as np

def loss(predictions, targets):
    return torch.sqrt(torch.mean((predictions - targets) ** 2))

# transpile any function from torch to numpy
np_loss = ivy.transpile(loss, source="torch", to="numpy")

# get some arrays
p = np.array([3.0, 2.0, 1.0])
t = np.array([0.0, 0.0, 0.0])

# and use the transpiled version!
out = np_loss(p, t)

From TensorFlow

import ivy
import tensorflow as tf
import numpy as np

def loss(predictions, targets):
    return tf.sqrt(tf.reduce_mean(tf.square(predictions - targets)))

# transpile any function from tf to numpy
np_loss = ivy.transpile(loss, source="tensorflow", to="numpy")

# get some arrays
p = np.array([3.0, 2.0, 1.0])
t = np.array([0.0, 0.0, 0.0])

# and use the transpiled version!
out = np_loss(p, t)

From JAX

import ivy
import jax.numpy as jnp
import numpy as np

def loss(predictions, targets):
    return jnp.sqrt(jnp.mean((predictions - targets) ** 2))

# transpile any function from jax to numpy
np_loss = ivy.transpile(loss, source="jax", to="numpy")

# get some arrays
p = np.array([3.0, 2.0, 1.0])
t = np.array([0.0, 0.0, 0.0])

# and use the transpiled version!
out = np_loss(p, t)

I'm using Ivy

Or you can use Ivy as a framework, breaking yourself (and your code) free from deciding which community to support, allowing anyone to run your code in their framework of choice!

import ivy

# A simple image classification model
class IvyNet(ivy.Module):
    def __init__(
        self,
        h_w=(32, 32),
        input_channels=3,
        output_channels=512,
        num_classes=2,
        data_format="NCHW",
        device="cpu",
    ):
        self.h_w = h_w
        self.input_channels = input_channels
        self.output_channels = output_channels
        self.num_classes = num_classes
        self.data_format = data_format
        super().__init__(device=device)

    def _build(self, *args, **kwargs):
        self.extractor = ivy.Sequential(
            ivy.Conv2D(self.input_channels, 6, [5, 5], 1, "SAME", data_format=self.data_format),
            ivy.GELU(),
            ivy.Conv2D(6, 16, [5, 5], 1, "SAME", data_format=self.data_format),
            ivy.GELU(),
            ivy.Conv2D(16, self.output_channels, [5, 5], 1, "SAME", data_format=self.data_format),
            ivy.GELU(),
        )

        self.classifier = ivy.Sequential(
            # Since the padding is "SAME", this would be image_height x image_width x output_channels
            ivy.Linear(self.h_w[0] * self.h_w[1] * self.output_channels, 512),
            ivy.GELU(),
            ivy.Linear(512, self.num_classes),
        )

    def _forward(self, x):
        x = self.extractor(x)
        # flatten all dims except batch dim
        x = ivy.flatten(x, start_dim=1, end_dim=-1)
        logits = self.classifier(x)
        probs = ivy.softmax(logits)
        return logits, probs
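The first `ivy.Linear` in `classifier` expects `h_w[0] * h_w[1] * output_channels` inputs because every `Conv2D` above uses stride 1 with "SAME" padding, which keeps the spatial size unchanged. The helpers below are hypothetical and only illustrate that arithmetic using the usual SAME-padding rule `ceil(size / stride)`. They show how large the flattened vector gets for the default configuration and for the training configuration used later on this page:

```python
import math

def same_padding_out(size, stride=1):
    """Spatial output size of a convolution with SAME padding (hypothetical helper)."""
    return math.ceil(size / stride)

def flattened_features(h_w, output_channels, num_convs=3, stride=1):
    """Features fed to the first Linear layer after the conv stack is flattened."""
    h, w = h_w
    for _ in range(num_convs):
        h, w = same_padding_out(h, stride), same_padding_out(w, stride)
    return h * w * output_channels

print(flattened_features((32, 32), 512))  # 524288 (IvyNet defaults)
print(flattened_features((28, 28), 120))  # 94080 (training configuration)
```

With 524,288 inputs and 512 outputs, that first layer alone holds about 268 million weights. This is one reason a smaller `output_channels` is a sensible choice when training.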

After building your model in Ivy, you can set your favourite framework as the backend to use its operations under the hood!

import jax
import numpy as np
import tensorflow as tf
import torch

ivy.set_backend("torch")
model = IvyNet()
x = torch.randn(1, 3, 32, 32)
logits, probs = model(x)

ivy.set_backend("tensorflow")
model = IvyNet()
x = tf.random.uniform(shape=(1, 3, 32, 32))
logits, probs = model(x)

ivy.set_backend("jax")
model = IvyNet()
key = jax.random.PRNGKey(0)
x = jax.random.uniform(key, shape=(1, 3, 32, 32))
logits, probs = model(x)

ivy.set_backend("numpy")
model = IvyNet()
x = np.random.uniform(size=(1, 3, 32, 32))
logits, probs = model(x)

Last but not least, we can also build the training pipeline in pure ivy ⬇️

Let's define some helper functions first

# helper function for loading the dataset in batches
def generate_batches(images, classes, dataset_size, batch_size=32):
    if batch_size > dataset_size:
        raise ivy.utils.exceptions.IvyError("Use a smaller batch size")
    for idx in range(0, dataset_size, batch_size):
        yield images[idx : min(idx + batch_size, dataset_size)], classes[
            idx : min(idx + batch_size, dataset_size)
        ]


# helper function to count the number of correct predictions
def num_correct(preds, labels):
    # take the predicted class per sample (argmax over the class axis)
    return (preds.argmax(axis=1) == labels).sum().to_numpy().item()


# define a loss function
def loss_fn(params):
    v, model, x, y = params
    _, probs = model(x, v=v)
    return ivy.cross_entropy(y, probs), probs
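The helpers can be checked without any framework. The following plain-Python sketch uses lists instead of arrays, and `batches` and `count_correct` are illustrative names, not Ivy functions. It shows that the last batch may be shorter than `batch_size`, and that the predicted class has to be taken per row (argmax over the class axis) rather than over the flattened batch:

```python
def batches(items, labels, batch_size):
    """Yield consecutive (items, labels) slices; the last one may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size], labels[start:start + batch_size]

def count_correct(probs, labels):
    """Count rows whose highest-probability class matches the label."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return sum(int(p == y) for p, y in zip(predicted, labels))

probs = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
labels = [1, 0, 0]
print([len(x) for x, _ in batches(list(range(10)), list(range(10)), 4)])  # [4, 4, 2]
print(count_correct(probs, labels))  # 2
```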

And train this model!

from tqdm import tqdm

# train the model on gpu if it's available
device = "gpu:0" if ivy.gpu_is_available() else "cpu"

# training hyperparams
optimizer = ivy.Adam(1e-4)
batch_size = 4
num_epochs = 20
num_classes = 10

model = IvyNet(
    h_w=(28, 28),
    input_channels=1,
    output_channels=120,
    num_classes=num_classes,
    device=device,
)

images = ivy.random_uniform(shape=(16, 1, 28, 28))
classes = ivy.randint(0, num_classes, shape=(16,))  # upper bound is exclusive


# training loop
def train(images, classes, epochs, model, device, num_classes=10, batch_size=32):
    # training metrics
    metrics = []
    dataset_size = len(images)

    for epoch in range(epochs):
        # reset the running statistics at the start of every epoch
        epoch_loss = 0.0
        train_correct = 0
        train_loop = tqdm(
            generate_batches(images, classes, len(images), batch_size=batch_size),
            total=dataset_size // batch_size,
            position=0,
            leave=True,
        )

        for xbatch, ybatch in train_loop:
            xbatch, ybatch = xbatch.to_device(device), ybatch.to_device(device)

            # Since the cross entropy function expects the target classes to be in one-hot encoded format
            ybatch_encoded = ivy.one_hot(ybatch, num_classes)

            # update model params
            loss_probs, grads = ivy.execute_with_gradients(
                loss_fn,
                (model.v, model, xbatch, ybatch_encoded),
            )

            model.v = optimizer.step(model.v, grads["0"])

            batch_loss = ivy.to_numpy(loss_probs[0]).mean().item()  # batch mean loss
            epoch_loss += batch_loss * xbatch.shape[0]
            train_correct += num_correct(loss_probs[1], ybatch)

            train_loop.set_description(f"Epoch [{epoch + 1:2d}/{epochs}]")
            train_loop.set_postfix(
                running_loss=batch_loss,
                accuracy_percentage=(train_correct / dataset_size) * 100,
            )

        epoch_loss = epoch_loss / dataset_size
        training_accuracy = train_correct / dataset_size

        metrics.append([epoch, epoch_loss, training_accuracy])

        train_loop.write(
            f"\nAverage training loss: {epoch_loss:.6f}, Train Correct: {train_correct}",
            end="\n",
        )


# images and classes above are random placeholders; replace them with your own dataset
train(
    images,
    classes,
    num_epochs,
    model,
    device,
    num_classes=num_classes,
    batch_size=batch_size,
)

For a more comprehensive overview, head over to the Demos section, which covers the basics, a few guides, and a wide-ranging set of examples that demonstrate the transpilation of various popular models. We keep expanding that list, so let us know which demos you'd like us to add next 🎯

Let's take a look at how Ivy works both as a transpiler and a framework in a bit more detail to get an idea of why and where to use it.

Ivy as a transpiler

When should I use Ivy as a transpiler?

If you want to use building blocks published in other frameworks (neural networks, layers, array computing libraries, training pipelines...), integrate code developed in various frameworks, or simply move code from one framework to another, the transpiler is definitely the tool 🔧 for the job! As the output of transpilation is native code in the target framework, you can use the converted code just as if it were originally developed in that framework. You can apply framework-specific optimizations or tools and instantly expose your project to all of the unique perks of a different framework.

Ivy's transpiler allows you to use code from any other framework (or from any other version of the same framework!) in your own code, by just adding one line of code. Under the hood, Ivy traces a computational graph and leverages the frontends and backends to link one framework to another.
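A toy model of that idea follows. It is purely illustrative and does not reflect Ivy's actual internals. A traced graph is represented as a list of recorded operations, a "frontend" table maps source-framework op names onto framework-agnostic op names, and a "backend" table maps those onto the target framework. All op names and the graph itself are assumptions chosen to mirror the `jax_fn` example earlier on this page:

```python
# Hypothetical traced graph of jax_fn: (source op, input names, output name)
TRACED_GRAPH = [
    ("jax.numpy.dot", ("x", "x"), "a"),
    ("jax.numpy.mean", ("x",), "b"),
    ("jax.numpy.multiply", ("x", "a"), "c"),
    ("jax.numpy.add", ("c", "b"), "out"),
]

# Toy frontend: source framework op -> framework-agnostic op (illustrative names)
JAX_FRONTEND = {
    "jax.numpy.dot": "dot",
    "jax.numpy.mean": "mean",
    "jax.numpy.multiply": "multiply",
    "jax.numpy.add": "add",
}

# Toy backend: framework-agnostic op -> target framework op
TORCH_BACKEND = {
    "dot": "torch.dot",
    "mean": "torch.mean",
    "multiply": "torch.multiply",
    "add": "torch.add",
}

def transpile_graph(graph, frontend, backend):
    """Rewrite each node's op name from source to target via the agnostic layer."""
    return [(backend[frontend[op]], inputs, out) for op, inputs, out in graph]

for node in transpile_graph(TRACED_GRAPH, JAX_FRONTEND, TORCH_BACKEND):
    print(node)
```

Real transpilation also has to reconcile argument conventions, dtypes and shapes that differ between frameworks. A one-to-one name lookup like this is only a cartoon of the process.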

This way, Ivy makes all ML-related projects available for you, independently of the framework you want to use to research, develop, or deploy systems. Feel free to head over to the docs for the full API reference, but the functions you'd most likely want to use are:

# Traces an efficient fully-functional graph from a function, removing all wrapping and redundant code
ivy.trace_graph()

# Converts framework-specific code to a different framework
ivy.transpile()

# Converts framework-specific code to Ivy
ivy.unify()

These functions can be used eagerly or lazily. If you pass the necessary arguments for function tracing, the graph tracing/transpilation step will happen instantly (eagerly). Otherwise, the graph tracing/transpilation will happen only when the returned function is first invoked.

import ivy
import jax
ivy.set_backend("jax")

# Simple JAX function to transpile
def test_fn(x):
    return jax.numpy.sum(x)

x1 = ivy.array([1., 2.])

# Arguments are available -> transpilation happens eagerly
eager_graph = ivy.transpile(test_fn, source="jax", to="torch", args=(x1,))

# eager_graph is now torch code and runs efficiently
ret = eager_graph(x1)

# Arguments are not available -> transpilation happens lazily
lazy_graph = ivy.transpile(test_fn, source="jax", to="torch")

# First call: transpilation happens here
ret = lazy_graph(x1)

# lazy_graph is now torch code and runs efficiently
ret = lazy_graph(x1)
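The eager/lazy distinction can be pictured with a small wrapper. Here `toy_transpile` and its inner `convert` step are hypothetical stand-ins, not Ivy APIs. When example arguments are supplied, the one-off conversion runs immediately. Otherwise it is deferred to the first call, and the result is reused on every later call:

```python
def toy_transpile(fn, args=None):
    """Hypothetical wrapper whose one-off 'conversion' runs eagerly or lazily."""
    state = {"converted": None, "conversions": 0}

    def convert(*example_args):
        # Stand-in for tracing + transpiling: only records that it ran.
        state["conversions"] += 1
        state["converted"] = fn

    def wrapper(*call_args):
        if state["converted"] is None:  # lazy path: the first call converts
            convert(*call_args)
        return state["converted"](*call_args)

    if args is not None:                # eager path: convert right away
        convert(*args)
    wrapper.state = state
    return wrapper

eager = toy_transpile(sum, args=([1.0, 2.0],))
lazy = toy_transpile(sum)
print(eager.state["conversions"], lazy.state["conversions"])  # 1 0

lazy([1.0, 2.0])
lazy([3.0])
print(lazy.state["conversions"])  # 1
```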

If you want to learn more, you can find more information in the Ivy as a transpiler section of the docs!

Ivy as a framework

When should I use Ivy as a framework?

As Ivy supports multiple backends, writing code in Ivy breaks you free from framework limitations. If you want to publish highly flexible code for everyone to use, independently of the framework they are using, or you plan to develop ML-related tools and want them to be interoperable with not only the already existing frameworks, but also with future frameworks, then Ivy is for you!

The Ivy framework is built on top of various essential components, mainly the Backend Handler, which manages which framework is being used behind the scenes, and the Backend Functional APIs, which provide framework-specific implementations of the Ivy functions. Likewise, classes such as ivy.Container or ivy.Array are also available, facilitating the use of structured data and array-like objects (see the docs to learn more about them).
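Here is a stripped-down picture of how a backend handler and backend functional APIs could fit together. It is a hypothetical sketch, and Ivy's real handler does much more, such as converting arrays between frameworks. Each backend is a table of implementations, `set_backend` selects the active table, and the framework-agnostic function dispatches to it. Two pure-Python "backends" stand in for real frameworks so the sketch runs anywhere:

```python
# Two stand-in "backend functional APIs" (real ones would wrap torch, jax, ...)
BACKENDS = {
    "python_list": {"add": lambda x, y: [a + b for a, b in zip(x, y)]},
    "python_tuple": {"add": lambda x, y: tuple(a + b for a, b in zip(x, y))},
}

_active = {"name": None}

def set_backend(name):
    """Select the implementation table used by framework-agnostic functions."""
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    _active["name"] = name

def add(x, y):
    """Framework-agnostic add: dispatch to the active backend's implementation."""
    return BACKENDS[_active["name"]]["add"](x, y)

set_backend("python_list")
print(add([1, 2, 3], [3, 2, 1]))  # [4, 4, 4]
set_backend("python_tuple")
print(add((1, 2, 3), (3, 2, 1)))  # (4, 4, 4)
```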

All of Ivy's functionality is exposed through the Ivy Functional API and the Ivy Stateful API. All functions in the Functional API are Framework Agnostic Functions, which means that we can use them like this:

import ivy
import jax.numpy as jnp
import tensorflow as tf
import numpy as np
import torch

def mse_loss(y, target):
    return ivy.mean((y - target)**2)

jax_mse   = mse_loss(jnp.ones((5,)), jnp.ones((5,)))
tf_mse    = mse_loss(tf.ones((5,)), tf.ones((5,)))
np_mse    = mse_loss(np.ones((5,)), np.ones((5,)))
torch_mse = mse_loss(torch.ones((5,)), torch.ones((5,)))

In the example above we show how Ivy's functions are compatible with tensors from different frameworks. This is the same for ALL Ivy functions. They can accept tensors from any framework and return the correct result.

The Ivy Stateful API, on the other hand, allows you to define trainable modules and layers, which you can use alone or as a part of any other framework code!

import ivy


class Regressor(ivy.Module):
    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim
        super().__init__()

    def _build(self, *args, **kwargs):
        self.linear0 = ivy.Linear(self.input_dim, 128)
        self.linear1 = ivy.Linear(128, self.output_dim)

    def _forward(self, x):
        x = self.linear0(x)
        x = ivy.functional.relu(x)
        x = self.linear1(x)
        return x
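The examples on this page call modules as `model(x, v=v)` inside loss functions. The idea behind that pattern is sketched below in plain Python, using a hypothetical `TinyLinear` rather than `ivy.Module`. A module keeps its trainable variables in a container, but its forward pass can also run on an explicitly supplied set of variables. That lets a gradient routine evaluate the loss at perturbed parameters. A central finite difference stands in for automatic differentiation here:

```python
class TinyLinear:
    """Hypothetical module: y = w * x + b, variables in a dict, overridable per call."""

    def __init__(self, w=0.0, b=0.0):
        self.v = {"w": w, "b": b}

    def __call__(self, x, v=None):
        v = self.v if v is None else v  # use the supplied variables if given
        return [v["w"] * xi + v["b"] for xi in x]

def mse(model, v, x, y):
    pred = model(x, v=v)
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def numerical_grads(model, x, y, eps=1e-6):
    """Central-difference gradient of the loss w.r.t. each variable in model.v."""
    grads = {}
    for name in model.v:
        up, down = dict(model.v), dict(model.v)
        up[name] += eps
        down[name] -= eps
        grads[name] = (mse(model, up, x, y) - mse(model, down, x, y)) / (2 * eps)
    return grads

model = TinyLinear()
x, y = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]  # targets follow y = 2x + 1
for _ in range(500):
    g = numerical_grads(model, x, y)
    model.v = {k: model.v[k] - 0.1 * g[k] for k in model.v}  # plain gradient descent
print({k: round(val, 3) for k, val in model.v.items()})  # {'w': 2.0, 'b': 1.0}
```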

If we put it all together, we get something like the example below. It uses PyTorch as the backend, but this can easily be changed to your favorite framework, such as TensorFlow or JAX.

import ivy


class Regressor(ivy.Module):
    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim
        super().__init__()

    def _build(self, *args, **kwargs):
        self.linear0 = ivy.Linear(self.input_dim, 128)
        self.linear1 = ivy.Linear(128, self.output_dim)

    def _forward(self, x):
        x = self.linear0(x)
        x = ivy.functional.relu(x)
        x = self.linear1(x)
        return x

ivy.set_backend('torch')  # set backend to PyTorch (or any other backend!)

model = Regressor(input_dim=1, output_dim=1)
optimizer = ivy.Adam(0.3)

n_training_examples = 2000
noise = ivy.random.random_normal(shape=(n_training_examples, 1), mean=0, std=0.1)
x = ivy.linspace(-6, 3, n_training_examples).reshape((n_training_examples, 1))
y = 0.2 * x ** 2 + 0.5 * x + 0.1 + noise


def loss_fn(v, x, target):
    pred = model(x, v=v)
    return ivy.mean((pred - target) ** 2)

for epoch in range(40):
    # compute loss and gradients (loss_fn runs the forward pass itself)
    loss, grads = ivy.execute_with_gradients(lambda params: loss_fn(*params), (model.v, x, y))

    # update parameters
    model.v = optimizer.step(model.v, grads)

    # print current loss
    print(f'Epoch: {epoch + 1:2d} --- Loss: {ivy.to_numpy(loss).item():.5f}')

print('Finished training!')
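The training loop above is functional in style. The loss is a function of the variables v, the gradients come back in the same structure as v, and the optimizer returns a new v rather than mutating the old one. A dependency-free sketch of that pattern follows. It makes three substitutions: central finite differences stand in for automatic differentiation, plain gradient descent stands in for Adam, and the target is y = 2x + 1 rather than the quadratic above. Its helpers (toy_loss, loss_and_grads, sgd_step) are hypothetical, not Ivy functions.

```python
# Toy data on [-1, 1] from y = 2x + 1 (noise-free, so the optimum is known).
xs = [-1.0 + 2.0 * i / 99 for i in range(100)]
ys = [2.0 * x + 1.0 for x in xs]


def toy_loss(v):
    """Mean squared error of the linear model w * x + b, as a function of v."""
    return sum((v["w"] * x + v["b"] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loss_and_grads(fn, v, eps=1e-6):
    """Hypothetical helper: return (loss, grads) with grads shaped like v.

    Central finite differences stand in for automatic differentiation.
    """
    grads = {}
    for name in v:
        up = {**v, name: v[name] + eps}
        down = {**v, name: v[name] - eps}
        grads[name] = (fn(up) - fn(down)) / (2 * eps)
    return fn(v), grads


def sgd_step(v, grads, lr):
    """Functional update: build and return a new parameter dict."""
    return {name: v[name] - lr * grads[name] for name in v}


v = {"w": 0.0, "b": 0.0}
for epoch in range(300):
    loss, grads = loss_and_grads(toy_loss, v)
    v = sgd_step(v, grads, lr=0.1)

print(f"w={v['w']:.3f}, b={v['b']:.3f}")  # w=2.000, b=1.000
```

Because sgd_step returns a fresh dictionary, the reassignment v = sgd_step(...) plays the same role as model.v = optimizer.step(model.v, grads) in the Ivy example.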

[Figure: visualization of the trained model's output]

As always, you can find more information about Ivy as a framework in the docs!

Documentation

You can find Ivy's documentation on the Docs page, which includes:

Motivation: This contextualizes the problem Ivy is trying to solve by going over:
- The current ML explosion.
- Why it is important to solve this problem.
- How we adhere to existing standards to make this happen.
Related Work: Paints a picture of the role Ivy plays in the ML stack, comparing it to other existing solutions in terms of functionality and abstraction level.
Design: A user-focused guide to the design decisions behind the architecture and the main building blocks of Ivy.
Deep Dive: Delves deeper into the implementation details of Ivy and is oriented towards potential contributors to the codebase.

Contributing

We believe that everyone can contribute and make a difference. Whether it's writing code 💻, fixing bugs 🐛, or simply sharing feedback 💬, your contributions are definitely welcome and appreciated 🙌

Check out all of our Open Tasks, and find out more info in our Contributing guide in the docs!

Join our amazing community as a contributor, and help accelerate our journey to unify all ML frameworks!

Community

In order to achieve the ambitious goal of unifying AI, we definitely need as many hands as possible on it! Whether you are a seasoned developer or just starting out, you'll find a place here! Join the Ivy community on our Discord 👾 server, which is the perfect place to ask questions, share ideas, and get help from both fellow developers and the Ivy Team directly!

Also, feel free to follow us on Twitter 🐦. We use it to share updates, sneak peeks, and all sorts of relevant news, so it's a great way to stay in the loop 😄

Can't wait to see you there!

Citation

If you use Ivy in your work, please don't forget to give proper credit by including the accompanying paper 📄 in your references. It's a small way to show appreciation, and it helps us continue to support this and other open source projects 🙌

@article{lenton2021ivy,
  title={Ivy: Templated deep learning for inter-framework portability},
  author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
  journal={arXiv preprint arXiv:2102.02886},
  year={2021}
}

For Tasks:

convert code between frameworks, write framework-agnostic code, build and train models, define trainable modules, integrate code from different frameworks

For Jobs:

machine learning engineer, data scientist, AI researcher, software developer, data analyst

Alternative AI tools for ivy

Similar Open Source Tools

ivy

Ivy is an open-source machine learning framework that enables you to:
🔄 Convert code into any framework: Use and build on top of any model, library, or device by converting any code from one framework to another using ivy.transpile.
⚒️ Write framework-agnostic code: Write your code once in ivy and then choose the most appropriate ML framework as the backend to leverage all the benefits and tools.
Join our growing community 🌍 to connect with people using Ivy. Let's unify.ai together 🦾

GitHub stars: 14.0k

zeta

Zeta is a tool designed to build state-of-the-art AI models faster by providing modular, high-performance, and scalable building blocks. It addresses the common issues faced while working with neural nets, such as chaotic codebases, lack of modularity, and low-performance modules. Zeta emphasizes usability, modularity, and performance, and is currently used in hundreds of models across various GitHub repositories. It enables users to prototype, train, optimize, and deploy the latest SOTA neural nets into production. The tool offers various modules like FlashAttention, SwiGLUStacked, RelativePositionBias, FeedForward, BitLinear, PalmE, Unet, VisionEmbeddings, niva, FusedDenseGELUDense, FusedDropoutLayerNorm, MambaBlock, Film, hyper_optimize, DPO, and ZetaCloud for different tasks in AI model development.

GitHub stars: 365

rl

TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch. It provides PyTorch-native and Python-first, low- and high-level abstractions for RL that are intended to be efficient, modular, documented and properly tested. The code is aimed at supporting research in RL. Most of it is written in Python in a highly modular way, such that researchers can easily swap components, transform them or write new ones with little effort.

GitHub stars: 2.6k

wandb

Weights & Biases (W&B) is a platform that helps users build better machine learning models faster by tracking and visualizing all components of the machine learning pipeline, from datasets to production models. It offers tools for tracking, debugging, evaluating, and monitoring machine learning applications. W&B provides integrations with popular frameworks like PyTorch, TensorFlow/Keras, Hugging Face Transformers, PyTorch Lightning, XGBoost, and scikit-learn. Users can easily log metrics, visualize performance, and compare experiments using W&B. The platform also supports hosting options in the cloud or on private infrastructure, making it versatile for various deployment needs.

GitHub stars: 9.7k

ChatRex

ChatRex is a Multimodal Large Language Model (MLLM) designed to seamlessly integrate fine-grained object perception and robust language understanding. By adopting a decoupled architecture with a retrieval-based approach for object detection and leveraging high-resolution visual inputs, ChatRex addresses key challenges in perception tasks. It is powered by the Rexverse-2M dataset with diverse image-region-text annotations. ChatRex can be applied to various scenarios requiring fine-grained perception, such as object detection, grounded conversation, grounded image captioning, and region understanding.

GitHub stars: 124

simple-openai

Simple-OpenAI is a Java library that provides a simple way to interact with the OpenAI API. It offers consistent interfaces for various OpenAI services like Audio, Chat Completion, Image Generation, and more. The library uses CleverClient for HTTP communication, Jackson for JSON parsing, and Lombok to reduce boilerplate code. It supports asynchronous requests and provides methods for synchronous calls as well. Users can easily create objects to communicate with the OpenAI API and perform tasks like text-to-speech, transcription, image generation, and chat completions.

GitHub stars: 289

continuous-eval

Open-Source Evaluation for LLM Applications. `continuous-eval` is an open-source package created for granular and holistic evaluation of GenAI application pipelines. It offers modularized evaluation, a comprehensive metric library covering various LLM use cases, the ability to leverage user feedback in evaluation, and synthetic dataset generation for testing pipelines. Users can define their own metrics by extending the Metric class. The tool allows running evaluation on a pipeline defined with modules and corresponding metrics. Additionally, it provides synthetic data generation capabilities to create user interaction data for evaluation or training purposes.

GitHub stars: 461

pywhyllm

GitHub stars: 121

Jlama

Jlama is a modern Java inference engine designed for large language models. It supports various model types such as Gemma, Llama, Mistral, GPT-2, BERT, and more. The tool implements features like Flash Attention, Mixture of Experts, and supports different model quantization formats. Built with Java 21 and utilizing the new Vector API for faster inference, Jlama allows users to add LLM inference directly to their Java applications. The tool includes a CLI for running models, a simple UI for chatting with LLMs, and examples for different model types.

GitHub stars: 987

microchain

Microchain is a function-calling-based LLM agent tool with no bloat. It allows users to define LLMs and templates, use various functions like Sum and Product, and create LLM agents for specific tasks. The tool provides a simple and efficient way to interact with OpenAI models and create conversational agents for various applications.

GitHub stars: 268

beyondllm

Beyond LLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems. It simplifies the process with automated integration, customizable evaluation metrics, and support for various Large Language Models (LLMs) tailored to specific needs. The aim is to reduce LLM hallucination risks and enhance reliability.

GitHub stars: 254

mcpdotnet

mcpdotnet is a .NET implementation of the Model Context Protocol (MCP), facilitating connections and interactions between .NET applications and MCP clients and servers. It aims to provide a clean, specification-compliant implementation with support for various MCP capabilities and transport types. The library includes features such as async/await pattern, logging support, and compatibility with .NET 8.0 and later. Users can create clients to use tools from configured servers and also create servers to register tools and interact with clients. The project roadmap includes expanding documentation, increasing test coverage, adding samples, performance optimization, SSE server support, and authentication.

GitHub stars: 156

CodeTF

CodeTF is a Python transformer-based library for code large language models (Code LLMs) and code intelligence. It provides an interface for training and inferencing on tasks like code summarization, translation, and generation. The library offers utilities for code manipulation across various languages, including easy extraction of code attributes. Using tree-sitter as its core AST parser, CodeTF enables parsing of function names, comments, and variable names. It supports fast model serving, fine-tuning of LLMs, various code intelligence tasks, preprocessed datasets, model evaluation, pretrained and fine-tuned models, and utilities to manipulate source code. CodeTF aims to facilitate the integration of state-of-the-art Code LLMs into real-world applications, ensuring a user-friendly environment for code intelligence tasks.

GitHub stars: 1.5k

UniChat

UniChat is a pipeline tool for creating online and offline chat-bots in Unity. It leverages Unity.Sentis and text vector embedding technology to enable offline mode text content search based on vector databases. The tool includes a chain toolkit for embedding LLM and Agent in games, along with middleware components for Text to Speech, Speech to Text, and Sub-classifier functionalities. UniChat also offers a tool for invoking tools based on ReActAgent workflow, allowing users to create personalized chat scenarios and character cards. The tool provides a comprehensive solution for designing flexible conversations in games while staying true to the developer's ideas.

GitHub stars: 62

RWKV-LM

RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). It is also 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode. So it combines the best of RNNs and transformers: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

GitHub stars: 13.0k

For similar tasks

ivy

GitHub stars: 14.0k

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

GitHub stars: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models, covering everything from images to sound.

GitHub stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

GitHub stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

GitHub stars: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Supports consumer-grade GPUs.

GitHub stars: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

GitHub stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

GitHub stars: 675