PDEBench
PDEBench: An Extensive Benchmark for Scientific Machine Learning
Stars: 793
README:
The code repository for the NeurIPS 2022 paper PDEBench: An Extensive Benchmark for Scientific Machine Learning
🏆 SimTech Best Paper Award 2023 🏆
PDEBench provides a diverse and comprehensive set of benchmarks for scientific machine learning, including challenging and realistic physical problems. This repository consists of the code used to generate the datasets, to upload and download the datasets from the data repository, as well as to train and evaluate different machine learning models as baselines. PDEBench features a much wider range of PDEs than existing benchmarks and includes realistic and difficult problems (both forward and inverse), larger ready-to-use datasets comprising various initial and boundary conditions, and PDE parameters. Moreover, PDEBench was created to make the source code extensible and we invite active participation from the SciML community to improve and extend the benchmark.
Created and maintained by Makoto Takamoto <[email protected], [email protected]>, Timothy Praditia <[email protected]>, Raphael Leiteritz, Dan MacKinlay, Francesco Alesiani, Dirk Pflüger, and Mathias Niepert.
We also provide datasets and pretrained machine learning models.
PDEBench Datasets: https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/darus-2986
PDEBench Pre-Trained Models: https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/darus-2987
Installation
Locally:
pip install --upgrade pip wheel
pip install .
From PyPI:
pip install pdebench
To include dependencies for data generation:
pip install "pdebench[datagen310]"
pip install ".[datagen310]" # locally
or
pip install "pdebench[datagen39]"
pip install ".[datagen39]" # locally
For GPU support there are additional platform-specific instructions:
For PyTorch, the latest version we support is v1.13.1; see previous-versions/#linux for the CUDA 11.7 builds.
For JAX, which in our tests is approximately 6 times faster than PyTorch for simulations, see jax#pip-installation-gpu-cuda-installed-via-pip.
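As a concrete example, the supported PyTorch version with CUDA 11.7 can typically be installed from the official wheel index (a sketch based on the PyTorch previous-versions instructions; adjust the versions for your platform):
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117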
Alternatively, you can install the dependencies using Anaconda; we suggest the mambaforge distribution. Otherwise you may have to enable the conda-forge channel for the following commands.
Starting from a fresh environment:
conda create -n myenv python=3.9
conda activate myenv
Install dependencies for model training:
conda install deepxde hydra-core h5py -c conda-forge
Depending on your hardware, either install PyTorch with CUDA support:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
or install the CPU-only build:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 cpuonly -c pytorch
Optional dependencies for data generation:
conda install clawpack jax jaxlib python-dotenv
In our tests we used PyTorch as the backend for DeepXDE. Please follow the DeepXDE documentation to enable it.
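DeepXDE typically selects its backend via the DDE_BACKEND environment variable or via its backend utility; a minimal sketch (consult the DeepXDE documentation for the authoritative procedure):
DDE_BACKEND=pytorch python3 train_models_forward.py
# or, to set the backend persistently:
python3 -m deepxde.backend.set_default_backend pytorch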
Data Generation for 1D Diffusion-Sorption/2D Diffusion-Reaction/2D Shallow-Water/2D Incompressible Navier-Stokes Equations
The data generation codes are contained in data_gen:
- gen_diff_react.py: generates the 2D diffusion-reaction data.
- gen_diff_sorp.py: generates the 1D diffusion-sorption data.
- gen_radial_dam_break.py: generates the 2D shallow-water data.
- gen_ns_incomp.py: generates the 2D incompressible inhomogeneous Navier-Stokes data.
- plot.py: plots the generated data.
- uploader.py: uploads the generated data to the data repository.
- .env: stores the Dataverse URL and API token used to upload the generated data. Note that the filename must be exactly .env (i.e. rename example.env by removing the example prefix).
- configs: directory containing the yaml files that store the simulation configuration. Arguments for the simulation are problem-specific; detailed explanations can be found in the simulation scripts.
- src: directory containing the simulation scripts for the different problems: sim_diff_react.py for 2D diffusion-reaction, sim_diff_sorp.py for 1D diffusion-sorption, and swe for the shallow-water equation.
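For example, a generation run might look like the following (a sketch; the scripts are assumed to read their settings from the yaml files in configs, and any problem-specific arguments are documented in the simulation scripts):
python3 gen_diff_sorp.py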
Data Generation for 1D Advection/Burgers/Reaction-Diffusion/2D DarcyFlow/Compressible Navier-Stokes Equations
The data generation codes are contained in data_gen_NLE:
- utils.py: utility file for data generation, mainly boundary conditions and initial conditions.
- AdvectionEq: directory with the source codes to generate 1D Advection equation training samples.
- BurgersEq: directory with the source codes to generate 1D Burgers equation training samples.
- CompressibleFluid: directory with the source codes to generate compressible Navier-Stokes equations training samples.
- ReactionDiffusionEq: directory with the source codes to generate 1D Reaction-Diffusion equation training samples. (Note: DarcyFlow data can be generated by run_DarcyFlow2D.sh in this folder.)
- save: directory in which the generated training samples are saved.
A typical example to generate training samples (1D Advection Equation), run from within data_gen/data_gen_NLE/AdvectionEq/:
python3 advection_multi_solution_Hydra.py +multi=beta1e0.yaml
Each such command is assumed to be executed from within the corresponding PDE's directory. Examples for generating the other PDEs are provided in run_trainset.sh in each PDE's directory. The config files for Hydra are stored in the config directory in each PDE's directory.
The 1D Advection/Burgers/Reaction-Diffusion/2D DarcyFlow/Compressible Navier-Stokes solvers save their data as numpy arrays, so to read those data via our dataloaders, a data transformation/merge step must be performed first. This can be done using data_gen_NLE/Data_Merge.py, whose config file is located at data_gen/data_gen_NLE/config/config.yaml. After properly setting the parameters in the config file (type: name of the PDE, dim: number of spatial dimensions, bd: boundary condition), the corresponding HDF5 file can be obtained with:
python3 Data_Merge.py
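As a quick sanity check, the merged HDF5 file can be inspected with h5py (a minimal Python sketch; the file name below is illustrative and the dataset keys vary by PDE):
import h5py

# Open the file produced by Data_Merge.py (illustrative name).
with h5py.File("1D_Advection_beta1.0.hdf5", "r") as f:
    # Print each stored dataset and its shape (e.g. solution tensor, coordinate axes).
    for key in f.keys():
        obj = f[key]
        print(key, getattr(obj, "shape", ""))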
You can set the default values for the data locations for this project by putting config vars like this in the .env file:
WORKING_DIR=~/Data/Working
ARCHIVE_DATA_DIR=~/Data/Archive
There is an example in example.env.
The download scripts are provided in data_download. There are two options to download data.
- Using download_direct.py (recommended): retrieves data shards directly via URLs. A sample command for each PDE is given in the README file in the data_download directory.
- Using download_easydataverse.py (might be slow, and you could encounter errors/issues; hence, not recommended!): uses the config files from the config directory that store the download configuration. Any files in the dataset matching args.filename will be downloaded into args.data_folder.
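For example, a direct download might be launched as follows (a sketch; the flag names here are assumptions, and the exact per-PDE commands are listed in the README in data_download):
python download_direct.py --root_folder ./data --pde_name advection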
In this work, we provide three different ML models to be trained and evaluated against the benchmark datasets, namely FNO, U-Net, and PINN. The codes for the baseline model implementations are contained in models:
- train_models_forward.py: the main script to train and evaluate the models. It calls a model-specific script based on the input argument.
- train_models_inverse.py: the main script to train and evaluate the models for inverse problems. It calls a model-specific script based on the input argument.
- metrics.py: the script to evaluate the trained models based on the various evaluation metrics described in our paper. Additionally, it also plots the prediction and target data.
- analyse_result_forward.py: the script to convert the saved pickle file from the metrics calculation script into pandas dataframe format and save it as a CSV file. Additionally, it also plots a bar chart to compare the results between different models.
- analyse_result_inverse.py: the script to convert the saved pickle file from the metrics calculation script into pandas dataframe format and save it as a CSV file; this script is used for the inverse problems. Additionally, it also plots a bar chart to compare the results between different models.
- fno: contains the scripts of the FNO implementation. These are partly adapted from the FNO repository.
- unet: contains the scripts of the U-Net implementation. These are partly adapted from the U-Net repository.
- pinn: contains the scripts of the PINN implementation. These utilize the DeepXDE library.
- inverse: contains the gradient-based inverse model.
- config: contains the yaml files for the model training input. The default templates for different equations are provided in the args directory. Users just need to copy and paste them into the args keyword in the config.yaml file.
An example to run the forward model training can be found in run_forward_1D.sh, and an example to run the inverse model training can be found in run_inverse.sh.
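A typical forward-training launch is a Hydra-style invocation of the main script (a sketch modeled on run_forward_1D.sh; the config template and dataset filename are placeholders):
CUDA_VISIBLE_DEVICES=0 python3 train_models_forward.py +args=config_Adv.yaml ++args.filename='1D_Advection_Sols_beta1.0.hdf5'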
The config files take the following arguments:
- model_name: string, the baseline model name, either 'FNO', 'Unet', or 'PINN'.
- if_training: bool, set True for training, or False for evaluation.
- continue_training: bool, set True to continue training from a checkpoint.
- num_workers: int, number of workers for the PyTorch dataloader.
- batch_size: int, training batch size.
- initial_step: int, number of time steps used as input for FNO and U-Net.
- t_train: int, number of the last time step used for training (for extrapolation testing, set this to be < Nt).
- model_update: int, number of epochs between model checkpoint saves.
- filename: str, has to match the dataset filename.
- single_file: bool, set False for the 2D diffusion-reaction, 1D diffusion-sorption, and radial dam break scenarios, and True otherwise.
- reduced_resolution: int, factor by which to downsample the spatial resolution.
- reduced_resolution_t: int, factor by which to downsample the temporal resolution.
- reduced_batch: int, factor by which to downsample the number of training samples.
- epochs: int, total number of training epochs.
- learning_rate: float, learning rate of the optimizer.
- scheduler_step: int, number of epochs between learning rate scheduler updates.
- scheduler_gamma: float, decay rate of the learning rate.
- in_channels: int, number of input channels.
- out_channels: int, number of output channels.
- ar_mode: bool, set True for fully autoregressive or pushforward training.
- pushforward: bool, set True for pushforward training, False otherwise (ar_mode also has to be set True).
- unroll_step: int, number of time steps to backpropagate through in pushforward training.
- num_channels: int, number of channels (variables).
- modes: int, number of Fourier modes to multiply.
- width: int, number of channels of the Fourier layers.
- base_path: string, location of the data directory.
- training_type: string, type of training, either 'autoregressive' or 'single'.
- mcmc_num_samples: int, number of MCMC samples to generate.
- mcmc_warmup_steps: int, number of MCMC warmup steps (default: 10).
- mcmc_num_chains: int, number of MCMC chains (default: 1).
- num_samples_max: int, maximum number of samples used (default: 1000).
- in_channels_hid: int, number of hidden channels of the inverse model (default: 64).
- inverse_model_type: string, type of the inverse inference model, either 'ProbRasterLatent' or 'InitialConditionInterp'.
- inverse_epochs: int, number of epochs for the gradient-based method.
- inverse_learning_rate: float, learning rate for the gradient-based method.
- inverse_verbose_flag: bool, enables additional printing during inverse inference.
- plot: bool, set True to activate plotting.
- channel_plot: int, determines which channel/variable to plot.
- x_min: float, left bound of the spatial domain.
- x_max: float, right bound of the spatial domain.
- y_min: float, lower bound of the spatial domain.
- y_max: float, upper bound of the spatial domain.
- t_min: float, start of the temporal domain.
- t_max: float, end of the temporal domain.
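Put together, an illustrative forward-training configuration fragment could look like this (a sketch, not a shipped template; all values are placeholders to be adapted from the templates in the args directory):
args:
  model_name: 'FNO'
  if_training: True
  continue_training: False
  filename: '1D_Advection_Sols_beta1.0.hdf5'
  single_file: True
  initial_step: 10
  t_train: 200
  reduced_resolution: 4
  reduced_resolution_t: 5
  reduced_batch: 1
  batch_size: 50
  epochs: 500
  learning_rate: 1.e-3
  scheduler_step: 100
  scheduler_gamma: 0.5
  modes: 12
  width: 20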
We provide the benchmark datasets we used in the paper through our DaRUS data repository. The data generation configuration can be found in the paper. Additionally, the pretrained models are available for download from the PDEBench Pretrained Models DaRUS repository. To use a pretrained model, specify the argument continue_training: True in the config file.
Below is an illustration of the directory structure of PDEBench.
📂 pdebench
|_📂 models
     |_📂 pinn        # Model: Physics-Informed Neural Network
          |_📂 train.py
          |_📂 utils.py
          |_📂 pde_definitions.py
     |_📂 fno         # Model: Fourier Neural Operator
          |_📂 train.py
          |_📂 utils.py
          |_📂 fno.py
     |_📂 unet        # Model: U-Net
          |_📂 train.py
          |_📂 utils.py
          |_📂 unet.py
     |_📂 inverse     # Model: Gradient-Based Inverse Method
          |_📂 train.py
          |_📂 utils.py
          |_📂 inverse.py
     |_📂 config      # Config: All config files reside here
     |_📂 train_models_inverse.py
     |_📂 run_forward_1D.sh
     |_📂 analyse_result_inverse.py
     |_📂 train_models_forward.py
     |_📂 run_inverse.sh
     |_📂 metrics.py
     |_📂 analyse_result_forward.py
|_📂 data_download    # Data: Scripts to download data from DaRUS
     |_📂 config
     |_📂 download_direct.py
     |_📂 download_easydataverse.py
     |_📂 visualize_pdes.py
     |_📂 README.md
     |_📂 download_metadata.csv
|_📂 data_gen         # Data: Scripts to generate data
     |_📂 configs
     |_📂 data_gen_NLE
     |_📂 src
     |_📂 notebooks
     |_📂 gen_diff_sorp.py
     |_📂 plot.py
     |_📂 example.env
     |_📂 gen_ns_incomp.py
     |_📂 gen_diff_react.py
     |_📂 uploader.py
     |_📂 gen_radial_dam_break.py
|_📂 __init__.py
Please cite the following papers if you use PDEBench datasets and/or source code in your research.
PDEBench: An Extensive Benchmark for Scientific Machine Learning - NeurIPS'2022
@inproceedings{PDEBench2022,
author = {Takamoto, Makoto and Praditia, Timothy and Leiteritz, Raphael and MacKinlay, Dan and Alesiani, Francesco and Pflüger, Dirk and Niepert, Mathias},
title = {{PDEBench: An Extensive Benchmark for Scientific Machine Learning}},
year = {2022},
booktitle = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
url = {https://arxiv.org/abs/2210.07182}
}
PDEBench Datasets - NeurIPS'2022
@data{darus-2986_2022,
author = {Takamoto, Makoto and Praditia, Timothy and Leiteritz, Raphael and MacKinlay, Dan and Alesiani, Francesco and Pflüger, Dirk and Niepert, Mathias},
publisher = {DaRUS},
title = {{PDEBench Datasets}},
year = {2022},
doi = {10.18419/darus-2986},
url = {https://doi.org/10.18419/darus-2986}
}
Learning Neural PDE Solvers with Parameter-Guided Channel Attention - ICML'2023
@article{cape-takamoto:2023,
author = {Makoto Takamoto and
Francesco Alesiani and
Mathias Niepert},
title = {Learning Neural {PDE} Solvers with Parameter-Guided Channel Attention},
journal = {CoRR},
volume = {abs/2304.14118},
year = {2023},
url = {https://doi.org/10.48550/arXiv.2304.14118},
doi = {10.48550/arXiv.2304.14118},
eprinttype = {arXiv},
eprint = {2304.14118},
}
Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations - ICLR-W'2024 & ICML'2024
@inproceedings{vcnef-vectorized-conditional-neural-fields-hagnberger:2024,
author = {Hagnberger, Jan and Kalimuthu, Marimuthu and Musekamp, Daniel and Niepert, Mathias},
title = {{Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations}},
year = {2024},
booktitle = {Proceedings of the 41st International Conference on Machine Learning (ICML 2024)}
}
Active Learning for Neural PDE Solvers - NeurIPS-W'2024
@article{active-learn-neuralpde-benchmark-musekamp:2024,
author = {Daniel Musekamp and
Marimuthu Kalimuthu and
David Holzm{\"{u}}ller and
Makoto Takamoto and
Mathias Niepert},
title = {Active Learning for Neural {PDE} Solvers},
journal = {CoRR},
volume = {abs/2408.01536},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2408.01536},
doi = {10.48550/ARXIV.2408.01536},
eprinttype = {arXiv},
eprint = {2408.01536},
}
Code contributors
- Makoto Takamoto (NEC Laboratories Europe)
- Timothy Praditia (Stuttgart Center for Simulation Science | University of Stuttgart)
- Raphael Leiteritz (Stuttgart Center for Simulation Science | University of Stuttgart)
- Francesco Alesiani (NEC Laboratories Europe)
- Dan MacKinlay (CSIRO's Data61)
- Marimuthu Kalimuthu (Stuttgart Center for Simulation Science | University of Stuttgart)
- John Kim (ANU TechLauncher/CSIRO's Data61)
- Gefei Shan (ANU TechLauncher/CSIRO's Data61)
- Yizhou Yang (ANU TechLauncher/CSIRO's Data61)
- Ran Zhang (ANU TechLauncher/CSIRO's Data61)
- Simon Brown (ANU TechLauncher/CSIRO's Data61)
MIT licensed, except where otherwise stated. See the LICENSE.txt file.