AI2BMD

AI-powered ab initio biomolecular dynamics simulation

Stars: 155

AI2BMD is a program for efficiently simulating protein molecular dynamics with ab initio accuracy. The repository contains datasets, simulation programs, and public materials related to AI2BMD. It provides a Docker image for easy deployment and a Python launcher program. Users can run simulations by downloading the launcher script and specifying simulation parameters. The repository also includes ready-to-use protein structures for testing. AI2BMD runs on x86-64 GNU/Linux systems, and recommended hardware specifications are given. Related research includes the ViSNet and Geoformer model architectures, fine-grained force metrics for machine learning force fields (MLFFs), and stochastic lag time parameterization for Markov state models. Citation information and contact details for the AI2BMD Team are provided.

README:

AI2BMD: AI-powered ab initio biomolecular dynamics simulation

Overview

AI2BMD is a program for efficiently simulating protein molecular dynamics with ab initio accuracy. This repository contains datasets, simulation programs, and public materials related to AI2BMD.

AI2BMD Setup Guide

The source code of AI2BMD is hosted in this repository. To streamline the user experience, we package the source code and runtime libraries into a Docker image, and provide a Python launcher program to simplify the deployment process. To run the simulation program, you don't need to clone this repository. Simply download scripts/ai2bmd and launch it (Python >=3.7 is required).

wget 'https://raw.githubusercontent.com/microsoft/AI2BMD/main/scripts/ai2bmd'
chmod +x ai2bmd
# you may need to "sudo" the following line if the docker group is not configured for the user
./ai2bmd --prot-file path/to/target-protein.pdb --sim-steps nnn  ...
#        '-------- required argument ---------' '-- optional arguments --'
#
# Notable optional arguments:
#
# [Simulation directory mapping options]
#   --base-dir path/to/base-dir    A directory for running simulation (defaults to current directory)
#   --log-dir  path/to/log-dir     A directory for saving results (defaults to base-dir/Logs-protein-name)
#
# [Simulation parameter options]
#   --sim-steps nnn                Simulation steps
#   --temp-k nnn                   Simulation temperature in Kelvin
#   --timestep nnn                 Simulation timestep in femtoseconds (fs)
#   --preeq-steps nnn              Pre-equilibration simulation steps for each constraint
#   --max-cyc nnn                  Maximum energy minimization cycles in preprocessing
#
# [Performance tweaks]
#   --device-strategy [strategy]   The compute device allocation strategy
#       small-molecule             Bonded, non-bonded, and solvent computations share all GPUs, with GPU oversubscription enabled
#       large-molecule             At most one model is placed on each GPU
#   --chunk-size nnn               When a batch contains more than nnn elements (e.g. dipeptides), split it into chunks
#                                  and feed them to the GPUs sequentially. Reduces memory consumption
#
# [Additional launcher options]
#   --software-update              When specified, updates the program in the Docker image before running
#   --download-training-data       When specified, downloads the AI2BMD training data, and unpacks it in the working directory. 
#                                  Ignores all other options.
#   --gpus                         Specifies the GPU devices to pass through to the program. Can be one of the following:
#                                  all:        Pass through all available GPUs to the program.
#                                  none:       Disable GPU passthrough.
#                                  i[,j,k...]  Pass through the listed GPUs. Example: --gpus 0,1
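The --chunk-size option trades throughput for lower memory use. The Python sketch below illustrates that general batching pattern. It rests on some assumptions: `split_into_chunks`, `run_in_chunks`, and `model_fn` are hypothetical names, not AI2BMD's internal API, and the real program's chunking logic may differ.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(items: Sequence[T], chunk_size: int) -> List[Sequence[T]]:
    """Split a batch into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def run_in_chunks(items: Sequence[T], chunk_size: int,
                  model_fn: Callable[[Sequence[T]], List[R]]) -> List[R]:
    """Feed items to model_fn one chunk at a time and concatenate the outputs.

    With a real GPU model, only one chunk's inputs and activations need to be
    in device memory at once, so peak memory scales with chunk_size instead of
    the full batch size, at the cost of more (smaller) model calls.
    """
    results: List[R] = []
    for chunk in split_into_chunks(items, chunk_size):
        results.extend(model_fn(chunk))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a model that evaluates one fragment per item.
    square = lambda batch: [x * x for x in batch]
    print(split_into_chunks(list(range(5)), 2))
    print(run_in_chunks(list(range(5)), 2, square))
```

Because chunks are processed sequentially and the results are concatenated in order, the output matches a single full-batch call. Only the memory profile and the number of calls change.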

Running Simulation

The code repository contains several sample protein structures in the testcases directory. Here we use the Chignolin structure as an example:

# skip the following two lines if you've already set up the launcher
wget 'https://raw.githubusercontent.com/microsoft/AI2BMD/main/scripts/ai2bmd'
chmod +x ai2bmd
# download the Chignolin protein structure data file
wget 'https://raw.githubusercontent.com/microsoft/AI2BMD/main/testcases/chig.pdb'
# launch the program, with all simulation parameters set to default values
# you may need to "sudo" the following line if the docker group is not configured for the user
./ai2bmd --prot-file chig.pdb
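You can also drive the launcher from Python, for example to run several temperatures in a loop. One option is to assemble the argument list and call it with `subprocess`. This is a minimal sketch. `build_ai2bmd_args` and `run_ai2bmd` are hypothetical helpers, only the flags documented above are used, and the example values are placeholders rather than recommended settings.

```python
import subprocess
from typing import List, Optional


def build_ai2bmd_args(prot_file: str,
                      launcher: str = "./ai2bmd",
                      sim_steps: Optional[int] = None,
                      temp_k: Optional[float] = None,
                      timestep: Optional[float] = None,
                      gpus: Optional[str] = None,
                      device_strategy: Optional[str] = None,
                      log_dir: Optional[str] = None) -> List[str]:
    """Return an argv list for the ai2bmd launcher.

    Options left as None are omitted, so the launcher's own defaults apply.
    """
    args = [launcher, "--prot-file", prot_file]
    optional = [
        ("--sim-steps", sim_steps),
        ("--temp-k", temp_k),
        ("--timestep", timestep),
        ("--gpus", gpus),
        ("--device-strategy", device_strategy),
        ("--log-dir", log_dir),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


def run_ai2bmd(args: List[str]) -> int:
    """Run the launcher and return its exit code (needs Docker access)."""
    return subprocess.run(args, check=False).returncode


if __name__ == "__main__":
    argv = build_ai2bmd_args("chig.pdb", sim_steps=1000, temp_k=300, gpus="0")
    print(" ".join(argv))
    # run_ai2bmd(argv)  # uncomment on a machine with Docker and the launcher
```

As with the shell commands, running the launcher this way may require `sudo` if the docker group is not configured for the user.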

The results will be placed in a new directory Logs-chig. The directory contains the simulation trajectory file:

  • chig-traj.traj: The full trajectory file in ASE binary format.
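You can post-process the trajectory with ASE. The sketch below assumes ASE is installed on the host, since it is not part of the launcher. It reads every frame with `ase.io.read(..., index=":")` and computes a mass-weighted radius of gyration per frame as an example observable. `rg_series`, `radius_of_gyration`, and the optional `atom_indices` selection are hypothetical. Check whether the trajectory also stores solvent atoms, or molecules split across periodic boundaries, before you interpret the numbers.

```python
from typing import Optional, Sequence

import numpy as np


def radius_of_gyration(positions: np.ndarray,
                       masses: Optional[np.ndarray] = None) -> float:
    """Mass-weighted radius of gyration of one frame (same length unit as positions)."""
    positions = np.asarray(positions, dtype=float)
    if masses is None:
        masses = np.ones(len(positions))
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * positions).sum(axis=0) / masses.sum()  # center of mass
    sq_dist = ((positions - com) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))


def rg_series(traj_path: str,
              atom_indices: Optional[Sequence[int]] = None) -> np.ndarray:
    """Radius of gyration for every frame of an ASE trajectory.

    atom_indices is a hypothetical selection (e.g. protein atoms only) in case
    the file also contains solvent; by default all atoms are used.
    """
    from ase.io import read  # imported lazily; requires the ASE package

    values = []
    for atoms in read(traj_path, index=":"):
        pos, mass = atoms.get_positions(), atoms.get_masses()
        if atom_indices is not None:
            idx = np.asarray(atom_indices)
            pos, mass = pos[idx], mass[idx]
        values.append(radius_of_gyration(pos, mass))
    return np.array(values)
```

For example, `rg_series("Logs-chig/chig-traj.traj")` returns a NumPy array with one value (in Å, ASE's length unit) per frame.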

Datasets

Protein Unit Dataset

The protein unit dataset covers a wide range of conformations for dipeptides. It can be downloaded with the following commands:

# skip the following two lines if you've already set up the launcher
wget 'https://raw.githubusercontent.com/microsoft/AI2BMD/main/scripts/ai2bmd'
chmod +x ai2bmd
# you may need to "sudo" the following line if the docker group is not configured for the user
./ai2bmd --download-training-data

When it finishes, the current working directory will be populated with NumPy data files (*.npz).
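The README does not document the array names inside these files, so a safe first step is to inspect them. This sketch uses only `numpy.load`, which returns an `NpzFile` whose `.files` attribute lists the stored arrays. The helper names are hypothetical. If an archive stores pickled object arrays, accessing them raises a `ValueError` unless `allow_pickle=True` is passed, and you should only do that for trusted files.

```python
import glob
import os
from typing import Dict, Tuple

import numpy as np


def summarize_npz(path: str) -> Dict[str, Tuple[Tuple[int, ...], str]]:
    """Map each array stored in an .npz archive to its (shape, dtype)."""
    summary = {}
    with np.load(path) as data:  # NpzFile closes the underlying zip on exit
        for key in data.files:
            arr = data[key]
            summary[key] = (arr.shape, str(arr.dtype))
    return summary


def summarize_directory(directory: str = ".") -> None:
    """Print the arrays contained in every *.npz file in directory."""
    for path in sorted(glob.glob(os.path.join(directory, "*.npz"))):
        print(os.path.basename(path))
        for key, (shape, dtype) in summarize_npz(path).items():
            print(f"  {key}: shape={shape}, dtype={dtype}")


if __name__ == "__main__":
    summarize_directory(".")
```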

AIMD-Chig Dataset

AIMD-Chig is an MD dataset covering the whole conformational space of a protein, calculated at the density functional theory (DFT) level. It consists of 2M conformations of the 166-atom protein Chignolin, together with the corresponding potential energies and atomic forces calculated at the M06-2X/6-31G* level.

System Requirements

Hardware Requirements

The AI2BMD program runs on x86-64 GNU/Linux systems. We recommend a machine with the following specs:

  • CPU: 8+ cores
  • Memory: 32+ GB
  • GPU: CUDA-enabled GPU with 8+ GB memory

The program has been tested on the following GPUs:

  • A100
  • V100
  • RTX A6000
  • Titan RTX
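A small preflight script can compare a machine against these recommendations before you pull the Docker image. This sketch is a convenience and is not part of AI2BMD. It reads the platform and core count from the standard library, physical memory via `os.sysconf` (GNU/Linux), and GPU memory by querying `nvidia-smi`. It then reports any value below the recommended minimums.

```python
import os
import platform
import shutil
import subprocess
from typing import List

# Recommended minimums from the hardware requirements above.
MIN_CPU_CORES = 8
MIN_MEMORY_GB = 32
MIN_GPU_MEMORY_GB = 8


def check_requirements(system: str, machine: str, cpu_cores: int,
                       memory_gb: float, gpu_memory_gb: List[float]) -> List[str]:
    """Return human-readable warnings; an empty list means all checks pass."""
    warnings = []
    if system != "Linux" or machine != "x86_64":
        warnings.append(f"platform is {system}/{machine}, expected Linux/x86_64")
    if cpu_cores < MIN_CPU_CORES:
        warnings.append(f"{cpu_cores} CPU cores, recommended {MIN_CPU_CORES}+")
    if memory_gb < MIN_MEMORY_GB:
        warnings.append(f"{memory_gb:.1f} GB RAM, recommended {MIN_MEMORY_GB}+")
    if not gpu_memory_gb:
        warnings.append("no NVIDIA GPU found via nvidia-smi")
    elif max(gpu_memory_gb) < MIN_GPU_MEMORY_GB:
        warnings.append(f"largest GPU has {max(gpu_memory_gb):.1f} GB, "
                        f"recommended {MIN_GPU_MEMORY_GB}+")
    return warnings


def probe_gpu_memory_gb() -> List[float]:
    """Total memory per GPU in decimal GB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    # nvidia-smi reports MiB; convert to decimal GB.
    return [float(tok) * 1024 ** 2 / 1e9 for tok in result.stdout.split()]


if __name__ == "__main__":
    mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    problems = check_requirements(platform.system(), platform.machine(),
                                  os.cpu_count() or 0, mem_gb, probe_gpu_memory_gb())
    print("\n".join(problems) if problems else "All recommended minimums met.")
```

The checks are separated from the probing so the thresholds can be tested without real hardware. Warnings are advisory, since these are recommendations rather than hard limits.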

Software Requirements

The program has been tested on the following systems:

  • OS: Ubuntu 20.04, Docker: 27.1
  • OS: ArchLinux, Docker: 26.1

AI2BMD Related Research

Model Architectures

ViSNet

ViSNet (Vector-Scalar interactive graph neural Network) is an equivariant, geometry-enhanced graph neural network for molecules that significantly alleviates the trade-off between computational cost and full utilization of geometric information.
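ViSNet's actual update rules are given in the Nature Communications paper listed under Citation. The toy NumPy layer below illustrates only the underlying vector-scalar idea. Its scalar channel sees only rotation-invariant quantities (distances and vector norms), and its vector channel is built from relative directions, so scalars stay invariant and vectors rotate with the molecule. The Gaussian distance filter and the update formulas are simplifying assumptions, not ViSNet.

```python
import numpy as np


def rbf(d: np.ndarray, centers: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gaussian radial basis expansion: distances (...) -> filters (..., F)."""
    return np.exp(-gamma * (d[..., None] - centers) ** 2)


def vector_scalar_layer(x: np.ndarray, h: np.ndarray, v: np.ndarray,
                        centers: np.ndarray):
    """One toy message-passing step with scalar and vector channels.

    x: (n, 3) positions, h: (n, F) scalar features, v: (n, 3, F) vector features.
    """
    n = len(x)
    rel = x[None, :, :] - x[:, None, :]          # rel[i, j] = x_j - x_i, (n, n, 3)
    dist = np.linalg.norm(rel, axis=-1)          # (n, n), rotation invariant
    mask = ~np.eye(n, dtype=bool)                # no self-messages
    unit = rel / np.where(mask, dist, 1.0)[..., None]  # unit directions, 0 on diagonal
    filt = rbf(dist, centers) * mask[..., None]  # (n, n, F), zero on diagonal

    # Scalars: distance-filtered neighbor scalars plus squared vector norms.
    h_new = (h + np.einsum("ijf,jf->if", filt, h)
             + np.einsum("idf,idf->if", v, v))
    # Vectors: directions weighted by neighbor scalars plus filtered neighbor vectors.
    v_new = (v + np.einsum("ijd,ijf,jf->idf", unit, filt, h)
             + np.einsum("ijf,jdf->idf", filt, v))
    return h_new, v_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, F = 5, 4
    x, h = rng.normal(size=(n, 3)), rng.normal(size=(n, F))
    v = rng.normal(size=(n, 3, F))
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthogonal matrix
    h1, v1 = vector_scalar_layer(x, h, v, np.linspace(0, 3, F))
    h2, v2 = vector_scalar_layer(x @ Q.T, h, np.einsum("ab,nbf->naf", Q, v),
                                 np.linspace(0, 3, F))
    print(np.allclose(h1, h2), np.allclose(np.einsum("ab,nbf->naf", Q, v1), v2))
```

A numerical equivariance check is useful for layers like this. If you rotate the positions and input vectors, the scalar outputs should be unchanged and the vector outputs should equal the rotated originals.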

Geoformer

Geoformer (Geometric Transformer) is a geometric Transformer designed to effectively model molecular structures for a variety of molecular property prediction tasks. Geoformer introduces a novel positional encoding method, Interatomic Positional Encoding (IPE), to parameterize atomic environments in the Transformer. By incorporating IPE, Geoformer captures valuable geometric information beyond pairwise distances within a Transformer-based architecture. Geoformer can be regarded as a Transformer variant of ViSNet.

Fine-grained force metrics for MLFF

Machine learning force fields (MLFFs) have gained popularity in recent years as a cost-effective alternative to ab initio molecular dynamics (MD) simulations. Despite their small errors on test sets, MLFFs inherently suffer from generalization and robustness issues during MD simulations.

To alleviate these issues, we propose global force metrics together with fine-grained metrics, resolved by element and by conformation, to systematically evaluate MLFFs on every atom and every conformation of a molecule. Furthermore, employing the proposed force metrics during model training can improve both the performance of MLFFs and the stability of MD simulations. This includes training MLFF models with these force metrics as loss functions, fine-tuning by reweighting samples in the original dataset, and continued training with additional unexplored data.
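The sketch below gives one concrete reading of the fine-grained metrics described above. It computes a global force mean absolute error (MAE) alongside MAEs broken down by element and by conformation, plus one simple reweighting rule for fine-tuning. The exact metric definitions and weighting scheme in the cited paper may differ. The function names and the error-proportional weights are assumptions.

```python
from typing import Dict

import numpy as np


def force_metrics(pred: np.ndarray, ref: np.ndarray,
                  elements: np.ndarray) -> Dict[str, object]:
    """Global, per-element, and per-conformation force MAE.

    pred, ref: (n_conf, n_atoms, 3) force arrays in the same units.
    elements:  (n_atoms,) element symbols shared by all conformations.
    """
    err = np.abs(pred - ref)                        # per-component absolute error
    per_element = {str(el): float(err[:, elements == el, :].mean())
                   for el in np.unique(elements)}   # averaged over atoms of one element
    per_conformation = err.mean(axis=(1, 2))        # one value per conformation
    return {"global": float(err.mean()),
            "per_element": per_element,
            "per_conformation": per_conformation}


def reweight_by_error(per_conformation: np.ndarray) -> np.ndarray:
    """Hypothetical fine-tuning weights: proportional to each conformation's
    error and normalized to mean 1, so poorly fitted conformations count more."""
    mean = per_conformation.mean()
    if mean == 0:
        return np.ones_like(per_conformation)
    return per_conformation / mean
```

A global MAE can look small even when one element or a few conformations are fitted poorly. The per-element and per-conformation breakdowns make such outliers visible, and the weights can then emphasize them during fine-tuning.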

Stochastic lag time parameterization for Markov State Model

Markov state models (MSMs) play a key role in studying protein conformational dynamics. A sliding count window with a fixed lag time is commonly used to sample sub-trajectories for transition counting and MSM construction. However, sub-trajectories sampled with a fixed lag time may not perform well under different selections of lag time, requiring strong prior experience and resulting in less robust estimations.

To alleviate this, we propose a novel stochastic method based on a Poisson process to generate perturbative lag times for sub-trajectory sampling and use it to construct a Markov chain. Comprehensive evaluations on the double-well system, WW domain, BPTI, and RBD–ACE2 complex of SARS-CoV-2 reveal that our algorithm significantly increases the robustness and accuracy of the constructed MSM without disrupting its Markovian properties. Furthermore, the advantages of our algorithm are especially pronounced for slow dynamic modes in complex biological processes.
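The sketch below contrasts fixed and stochastic lag times on a discretized trajectory. It is a minimal illustration that rests on one assumption: each window start draws its own lag from a Poisson distribution with mean τ, clipped to at least one step. This is one simple way to perturb the lag and may not match the published algorithm. Counts are converted to a transition matrix by plain row normalization, without enforcing reversibility.

```python
import numpy as np


def fixed_lag_counts(dtraj: np.ndarray, n_states: int, lag: int) -> np.ndarray:
    """Sliding-window counts C[i, j] of pairs (s_t, s_{t+lag}) with one fixed lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    return counts


def stochastic_lag_counts(dtraj: np.ndarray, n_states: int, mean_lag: float,
                          rng: np.random.Generator) -> np.ndarray:
    """Counts where every window start t uses its own lag drawn from
    Poisson(mean_lag), clipped to >= 1 (an assumed perturbation scheme)."""
    counts = np.zeros((n_states, n_states))
    lags = np.maximum(rng.poisson(mean_lag, size=len(dtraj)), 1)
    for t, lag in enumerate(lags):
        if t + lag < len(dtraj):                    # skip windows past the end
            counts[dtraj[t], dtraj[t + lag]] += 1
    return counts


def transition_matrix(counts: np.ndarray) -> np.ndarray:
    """Row-normalize counts; states with no outgoing counts get a zero row."""
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dtraj = rng.integers(0, 3, size=1000)           # hypothetical 3-state trajectory
    print(transition_matrix(fixed_lag_counts(dtraj, 3, 5)))
    print(transition_matrix(stochastic_lag_counts(dtraj, 3, 5, rng)))
```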

Citation

(#: co-first author; *: corresponding author)

Yusong Wang#, Tong Wang#*, Shaoning Li#, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao*, Tie-Yan Liu. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nature Communications 15(1), 313 (2024).

Yusong Wang#, Shaoning Li#, Tong Wang*, Bin Shao, Nanning Zheng, Tie-Yan Liu. Geometric Transformer with Interatomic Positional Encoding. NeurIPS 2023.

Zun Wang#, Hongfei Wu#, Lixin Sun, Xinheng He, Zhirong Liu, Bin Shao, Tong Wang*, Tie-Yan Liu. Improving machine learning force fields for molecular dynamics simulations with fine-grained force metrics. The Journal of Chemical Physics 159(3). Cover Story.

Tong Wang#*, Xinheng He#, Mingyu Li#, Bin Shao*, Tie-Yan Liu. AIMD-Chig: Exploring the conformational space of a 166-atom protein Chignolin with ab initio molecular dynamics. Scientific Data 10, 549 (2023).

Shiqi Gong#, Xinheng He#, Qi Meng, Zhiming Ma, Bin Shao*, Tong Wang*, Tie-Yan Liu. Stochastic Lag Time Parameterization for Markov State Models of Protein Dynamics. The Journal of Physical Chemistry B 126(46) (2022). Cover Story.

License

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT license.

Disclaimer

AI2BMD is a research project. It is not an officially supported Microsoft product.

Contacts

Please contact the AI2BMD Team with any questions or suggestions. The main team members include:
