PDEBench
PDEBench: An Extensive Benchmark for Scientific Machine Learning
Stars: 793
README:
The code repository for the NeurIPS 2022 paper PDEBench: An Extensive Benchmark for Scientific Machine Learning
🏆 SimTech Best Paper Award 2023 🏆
PDEBench provides a diverse and comprehensive set of benchmarks for scientific machine learning, including challenging and realistic physical problems. This repository consists of the code used to generate the datasets, to upload and download the datasets from the data repository, as well as to train and evaluate different machine learning models as baselines. PDEBench features a much wider range of PDEs than existing benchmarks and includes realistic and difficult problems (both forward and inverse), larger ready-to-use datasets comprising various initial and boundary conditions, and PDE parameters. Moreover, PDEBench was created to make the source code extensible and we invite active participation from the SciML community to improve and extend the benchmark.
Created and maintained by Makoto Takamoto <[email protected], [email protected]>, Timothy Praditia <[email protected]>, Raphael Leiteritz, Dan MacKinlay, Francesco Alesiani, Dirk Pflüger, and Mathias Niepert.
We also provide datasets and pretrained machine learning models.
PDEBench Datasets: https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/darus-2986
PDEBench Pre-Trained Models: https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/darus-2987
Installation
Locally:
pip install --upgrade pip wheel
pip install .
From PyPI:
pip install pdebench
To include dependencies for data generation:
pip install "pdebench[datagen310]"
pip install ".[datagen310]" # locally
or
pip install "pdebench[datagen39]"
pip install ".[datagen39]" # locally
For GPU support there are additional platform-specific instructions:
For PyTorch, the latest version we support is v1.13.1; see previous-versions/#linux for the CUDA 11.7 builds.
For JAX, which in our tests is approximately 6 times faster than PyTorch for simulations, see jax#pip-installation-gpu-cuda-installed-via-pip.
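As a concrete example, the supported PyTorch version with CUDA 11.7 can typically be installed from the official wheel index (a sketch based on the PyTorch previous-versions instructions; adjust the versions for your platform):
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117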
Alternatively, you can install the dependencies using Anaconda; we suggest the mambaforge distribution. Otherwise you may have to enable the conda-forge channel for the following commands.
Starting from a fresh environment:
conda create -n myenv python=3.9
conda activate myenv
Install dependencies for model training:
conda install deepxde hydra-core h5py -c conda-forge
Depending on your hardware, either install PyTorch with CUDA support:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
or install the CPU-only build:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 cpuonly -c pytorch
Optional dependencies for data generation:
conda install clawpack jax jaxlib python-dotenv
In our tests we used PyTorch as the backend for DeepXDE. Please follow the DeepXDE documentation to enable it.
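DeepXDE typically selects its backend via the DDE_BACKEND environment variable or via its backend utility; a minimal sketch (consult the DeepXDE documentation for the authoritative procedure):
DDE_BACKEND=pytorch python3 train_models_forward.py
# or, to set the backend persistently:
python3 -m deepxde.backend.set_default_backend pytorch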
Data Generation for 1D Diffusion-Sorption/2D Diffusion-Reaction/2D Shallow-Water/2D Incompressible Navier-Stokes Equations
The data generation codes are contained in data_gen:
- gen_diff_react.py: generates the 2D diffusion-reaction data.
- gen_diff_sorp.py: generates the 1D diffusion-sorption data.
- gen_radial_dam_break.py: generates the 2D shallow-water data.
- gen_ns_incomp.py: generates the 2D incompressible inhomogeneous Navier-Stokes data.
- plot.py: plots the generated data.
- uploader.py: uploads the generated data to the data repository.
- .env: stores the Dataverse URL and API token used to upload the generated data. Note that the filename must be exactly .env (i.e. rename example.env by removing the example prefix).
- configs: directory containing the yaml files that store the simulation configuration. Arguments for the simulation are problem-specific; detailed explanations can be found in the simulation scripts.
- src: directory containing the simulation scripts for the different problems: sim_diff_react.py for 2D diffusion-reaction, sim_diff_sorp.py for 1D diffusion-sorption, and swe for the shallow-water equation.
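For example, a generation run might look like the following (a sketch; the scripts are assumed to read their settings from the yaml files in configs, and any problem-specific arguments are documented in the simulation scripts):
python3 gen_diff_sorp.py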
Data Generation for 1D Advection/Burgers/Reaction-Diffusion/2D DarcyFlow/Compressible Navier-Stokes Equations
The data generation codes are contained in data_gen_NLE:
- utils.py: utility file for data generation, mainly boundary conditions and initial conditions.
- AdvectionEq: directory with the source codes to generate 1D Advection equation training samples.
- BurgersEq: directory with the source codes to generate 1D Burgers equation training samples.
- CompressibleFluid: directory with the source codes to generate compressible Navier-Stokes equations training samples.
- ReactionDiffusionEq: directory with the source codes to generate 1D Reaction-Diffusion equation training samples. (Note: DarcyFlow data can be generated by run_DarcyFlow2D.sh in this folder.)
- save: directory in which the generated training samples are saved.
A typical example to generate training samples (1D Advection Equation), run from within data_gen/data_gen_NLE/AdvectionEq/:
python3 advection_multi_solution_Hydra.py +multi=beta1e0.yaml
Each such command is assumed to be executed from within the corresponding PDE's directory. Examples for generating the other PDEs are provided in run_trainset.sh in each PDE's directory. The config files for Hydra are stored in the config directory in each PDE's directory.
The 1D Advection/Burgers/Reaction-Diffusion/2D DarcyFlow/Compressible Navier-Stokes solvers save their data as numpy arrays, so to read those data via our dataloaders, a data transformation/merge step must be performed first. This can be done using data_gen_NLE/Data_Merge.py, whose config file is located at data_gen/data_gen_NLE/config/config.yaml. After properly setting the parameters in the config file (type: name of the PDE, dim: number of spatial dimensions, bd: boundary condition), the corresponding HDF5 file can be obtained with:
python3 Data_Merge.py
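As a quick sanity check, the merged HDF5 file can be inspected with h5py (a minimal Python sketch; the file name below is illustrative and the dataset keys vary by PDE):
import h5py

# Open the file produced by Data_Merge.py (illustrative name).
with h5py.File("1D_Advection_beta1.0.hdf5", "r") as f:
    # Print each stored dataset and its shape (e.g. solution tensor, coordinate axes).
    for key in f.keys():
        obj = f[key]
        print(key, getattr(obj, "shape", ""))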
You can set the default values for the data locations for this project by putting config vars like this in the .env file:
WORKING_DIR=~/Data/Working
ARCHIVE_DATA_DIR=~/Data/Archive
There is an example in example.env.
The download scripts are provided in data_download. There are two options to download data.
- Using download_direct.py (recommended): retrieves data shards directly via URLs. A sample command for each PDE is given in the README file in the data_download directory.
- Using download_easydataverse.py (might be slow, and you could encounter errors/issues; hence, not recommended!): uses the config files from the config directory that store the download configuration. Any files in the dataset matching args.filename will be downloaded into args.data_folder.
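For example, a direct download might be launched as follows (a sketch; the flag names here are assumptions, and the exact per-PDE commands are listed in the README in data_download):
python download_direct.py --root_folder ./data --pde_name advection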
In this work, we provide three different ML models to be trained and evaluated against the benchmark datasets, namely FNO, U-Net, and PINN. The codes for the baseline model implementations are contained in models:
- train_models_forward.py: the main script to train and evaluate the models. It calls a model-specific script based on the input argument.
- train_models_inverse.py: the main script to train and evaluate the models for inverse problems. It calls a model-specific script based on the input argument.
- metrics.py: the script to evaluate the trained models based on the various evaluation metrics described in our paper. Additionally, it also plots the prediction and target data.
- analyse_result_forward.py: the script to convert the saved pickle file from the metrics calculation script into pandas dataframe format and save it as a CSV file. Additionally, it also plots a bar chart to compare the results between different models.
- analyse_result_inverse.py: the script to convert the saved pickle file from the metrics calculation script into pandas dataframe format and save it as a CSV file; this script is used for the inverse problems. Additionally, it also plots a bar chart to compare the results between different models.
- fno: contains the scripts of the FNO implementation. These are partly adapted from the FNO repository.
- unet: contains the scripts of the U-Net implementation. These are partly adapted from the U-Net repository.
- pinn: contains the scripts of the PINN implementation. These utilize the DeepXDE library.
- inverse: contains the gradient-based inverse model.
- config: contains the yaml files for the model training input. The default templates for different equations are provided in the args directory. Users just need to copy and paste them into the args keyword in the config.yaml file.
An example to run the forward model training can be found in run_forward_1D.sh, and an example to run the inverse model training can be found in run_inverse.sh.
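A typical forward-training launch is a Hydra-style invocation of the main script (a sketch modeled on run_forward_1D.sh; the config template and dataset filename are placeholders):
CUDA_VISIBLE_DEVICES=0 python3 train_models_forward.py +args=config_Adv.yaml ++args.filename='1D_Advection_Sols_beta1.0.hdf5'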
The config files take the following arguments:
- model_name: string, the baseline model name, either 'FNO', 'Unet', or 'PINN'.
- if_training: bool, set True for training, or False for evaluation.
- continue_training: bool, set True to continue training from a checkpoint.
- num_workers: int, number of workers for the PyTorch dataloader.
- batch_size: int, training batch size.
- initial_step: int, number of time steps used as input for FNO and U-Net.
- t_train: int, number of the last time step used for training (for extrapolation testing, set this to be < Nt).
- model_update: int, number of epochs between model checkpoint saves.
- filename: str, has to match the dataset filename.
- single_file: bool, set False for the 2D diffusion-reaction, 1D diffusion-sorption, and radial dam break scenarios, and True otherwise.
- reduced_resolution: int, factor by which to downsample the spatial resolution.
- reduced_resolution_t: int, factor by which to downsample the temporal resolution.
- reduced_batch: int, factor by which to downsample the number of training samples.
- epochs: int, total number of training epochs.
- learning_rate: float, learning rate of the optimizer.
- scheduler_step: int, number of epochs between learning rate scheduler updates.
- scheduler_gamma: float, decay rate of the learning rate.
- in_channels: int, number of input channels.
- out_channels: int, number of output channels.
- ar_mode: bool, set True for fully autoregressive or pushforward training.
- pushforward: bool, set True for pushforward training, False otherwise (ar_mode also has to be set True).
- unroll_step: int, number of time steps to backpropagate through in pushforward training.
- num_channels: int, number of channels (variables).
- modes: int, number of Fourier modes to multiply.
- width: int, number of channels of the Fourier layers.
- base_path: string, location of the data directory.
- training_type: string, type of training, either 'autoregressive' or 'single'.
- mcmc_num_samples: int, number of MCMC samples to generate.
- mcmc_warmup_steps: int, number of MCMC warmup steps (default: 10).
- mcmc_num_chains: int, number of MCMC chains (default: 1).
- num_samples_max: int, maximum number of samples used (default: 1000).
- in_channels_hid: int, number of hidden channels of the inverse model (default: 64).
- inverse_model_type: string, type of the inverse inference model, either 'ProbRasterLatent' or 'InitialConditionInterp'.
- inverse_epochs: int, number of epochs for the gradient-based method.
- inverse_learning_rate: float, learning rate for the gradient-based method.
- inverse_verbose_flag: bool, enables additional printing during inverse inference.
- plot: bool, set True to activate plotting.
- channel_plot: int, determines which channel/variable to plot.
- x_min: float, left bound of the spatial domain.
- x_max: float, right bound of the spatial domain.
- y_min: float, lower bound of the spatial domain.
- y_max: float, upper bound of the spatial domain.
- t_min: float, start of the temporal domain.
- t_max: float, end of the temporal domain.
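Put together, an illustrative forward-training configuration fragment could look like this (a sketch, not a shipped template; all values are placeholders to be adapted from the templates in the args directory):
args:
  model_name: 'FNO'
  if_training: True
  continue_training: False
  filename: '1D_Advection_Sols_beta1.0.hdf5'
  single_file: True
  initial_step: 10
  t_train: 200
  reduced_resolution: 4
  reduced_resolution_t: 5
  reduced_batch: 1
  batch_size: 50
  epochs: 500
  learning_rate: 1.e-3
  scheduler_step: 100
  scheduler_gamma: 0.5
  modes: 12
  width: 20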
We provide the benchmark datasets we used in the paper through our DaRUS data repository. The data generation configuration can be found in the paper. Additionally, the pretrained models are available for download from the PDEBench Pretrained Models DaRUS repository. To use a pretrained model, specify the argument continue_training: True in the config file.
Below is an illustration of the directory structure of PDEBench.
📂 pdebench
|_📂 models
     |_📂 pinn        # Model: Physics-Informed Neural Network
          |_📂 train.py
          |_📂 utils.py
          |_📂 pde_definitions.py
     |_📂 fno         # Model: Fourier Neural Operator
          |_📂 train.py
          |_📂 utils.py
          |_📂 fno.py
     |_📂 unet        # Model: U-Net
          |_📂 train.py
          |_📂 utils.py
          |_📂 unet.py
     |_📂 inverse     # Model: Gradient-Based Inverse Method
          |_📂 train.py
          |_📂 utils.py
          |_📂 inverse.py
     |_📂 config      # Config: All config files reside here
     |_📂 train_models_inverse.py
     |_📂 run_forward_1D.sh
     |_📂 analyse_result_inverse.py
     |_📂 train_models_forward.py
     |_📂 run_inverse.sh
     |_📂 metrics.py
     |_📂 analyse_result_forward.py
|_📂 data_download    # Data: Scripts to download data from DaRUS
     |_📂 config
     |_📂 download_direct.py
     |_📂 download_easydataverse.py
     |_📂 visualize_pdes.py
     |_📂 README.md
     |_📂 download_metadata.csv
|_📂 data_gen         # Data: Scripts to generate data
     |_📂 configs
     |_📂 data_gen_NLE
     |_📂 src
     |_📂 notebooks
     |_📂 gen_diff_sorp.py
     |_📂 plot.py
     |_📂 example.env
     |_📂 gen_ns_incomp.py
     |_📂 gen_diff_react.py
     |_📂 uploader.py
     |_📂 gen_radial_dam_break.py
|_📂 __init__.py
Please cite the following papers if you use PDEBench datasets and/or source code in your research.
PDEBench: An Extensive Benchmark for Scientific Machine Learning - NeurIPS'2022
@inproceedings{PDEBench2022,
author = {Takamoto, Makoto and Praditia, Timothy and Leiteritz, Raphael and MacKinlay, Dan and Alesiani, Francesco and Pflüger, Dirk and Niepert, Mathias},
title = {{PDEBench: An Extensive Benchmark for Scientific Machine Learning}},
year = {2022},
booktitle = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
url = {https://arxiv.org/abs/2210.07182}
}
PDEBench Datasets - NeurIPS'2022
@data{darus-2986_2022,
author = {Takamoto, Makoto and Praditia, Timothy and Leiteritz, Raphael and MacKinlay, Dan and Alesiani, Francesco and Pflüger, Dirk and Niepert, Mathias},
publisher = {DaRUS},
title = {{PDEBench Datasets}},
year = {2022},
doi = {10.18419/darus-2986},
url = {https://doi.org/10.18419/darus-2986}
}
Learning Neural PDE Solvers with Parameter-Guided Channel Attention - ICML'2023
@article{cape-takamoto:2023,
author = {Makoto Takamoto and
Francesco Alesiani and
Mathias Niepert},
title = {Learning Neural {PDE} Solvers with Parameter-Guided Channel Attention},
journal = {CoRR},
volume = {abs/2304.14118},
year = {2023},
url = {https://doi.org/10.48550/arXiv.2304.14118},
doi = {10.48550/arXiv.2304.14118},
eprinttype = {arXiv},
eprint = {2304.14118},
}
Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations - ICLR-W'2024 & ICML'2024
@inproceedings{vcnef-vectorized-conditional-neural-fields-hagnberger:2024,
author = {Hagnberger, Jan and Kalimuthu, Marimuthu and Musekamp, Daniel and Niepert, Mathias},
title = {{Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations}},
year = {2024},
booktitle = {Proceedings of the 41st International Conference on Machine Learning (ICML 2024)}
}
Active Learning for Neural PDE Solvers - NeurIPS-W'2024
@article{active-learn-neuralpde-benchmark-musekamp:2024,
author = {Daniel Musekamp and
Marimuthu Kalimuthu and
David Holzm{\"{u}}ller and
Makoto Takamoto and
Mathias Niepert},
title = {Active Learning for Neural {PDE} Solvers},
journal = {CoRR},
volume = {abs/2408.01536},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2408.01536},
doi = {10.48550/ARXIV.2408.01536},
eprinttype = {arXiv},
eprint = {2408.01536},
}
Code contributors
- Makoto Takamoto (NEC Laboratories Europe)
- Timothy Praditia (Stuttgart Center for Simulation Science | University of Stuttgart)
- Raphael Leiteritz (Stuttgart Center for Simulation Science | University of Stuttgart)
- Francesco Alesiani (NEC Laboratories Europe)
- Dan MacKinlay (CSIRO's Data61)
- Marimuthu Kalimuthu (Stuttgart Center for Simulation Science | University of Stuttgart)
- John Kim (ANU TechLauncher/CSIRO's Data61)
- Gefei Shan (ANU TechLauncher/CSIRO's Data61)
- Yizhou Yang (ANU TechLauncher/CSIRO's Data61)
- Ran Zhang (ANU TechLauncher/CSIRO's Data61)
- Simon Brown (ANU TechLauncher/CSIRO's Data61)
MIT licensed, except where otherwise stated. See the LICENSE.txt file.