Generative_AI_For_Science


Stars: 58


Generative AI for Science is a comprehensive, hands-on guide for researchers, students, and practitioners who want to apply cutting-edge AI techniques to scientific discovery. The book bridges the gap between AI/ML expertise and domain science, providing practical implementations across chemistry, biology, physics, geoscience, and beyond. It covers key AI architectures like Transformers, Diffusion Models, VAEs, and GNNs, and teaches how to apply generative models to problems in climate science, drug discovery, genomics, materials science, and more. The book also emphasizes best practices around ethics, reproducibility, and deployment, helping readers develop the intuition to know when and how to apply AI to scientific research.

README:

🧬 Generative AI for Science

- A Hands-On Guide for Students and Researchers

Design molecules. Predict protein structures. Accelerate climate models. All with AI.


500+ pages • 13 chapters • 50+ runnable notebooks • Zero setup required


By Dr. J. Paul Liu
📢 Updated regularly based on student & reader feedback
Check the sample chapters


🔥 Why This Book?

| The Revolution | The Impact |
| --- | --- |
| 🧪 AI-designed drugs | 80-90% Phase I success vs. traditional 40-65% |
| 🧬 AlphaFold protein prediction | 2024 Nobel Prize in Chemistry |
| 🌍 GenCast weather AI | Outperforms best models in 97% of scenarios |
| ⚡ Neural surrogates | Simulations 1000x faster than traditional methods |

This book teaches you HOW to get started and build similar systems yourself at an educational scale.


🎯 What Can You Build?

| In 30 Minutes | You'll Create | Using |
| --- | --- | --- |
| 🧪 Drug Discovery | Design molecules with target properties | GNNs + Diffusion |
| 🧬 Protein Engineering | Predict 3D structure from sequence | ESMFold |
| 🌍 Climate Science | Fast weather/climate emulators | Neural Surrogates |
| ⚛️ Physics Simulation | Solve PDEs with neural networks | PINNs |
| 📚 Literature Mining | Extract insights from papers | RAG + LLMs |

📖 About This Book

Generative AI for Science is a comprehensive, hands-on guide for researchers, students, and practitioners who want to apply cutting-edge AI techniques to scientific discovery. This book bridges the gap between AI/ML expertise and domain science, providing practical implementations across chemistry, biology, physics, geoscience, and beyond.

"Generative AI does not replace the scientific method; it enhances it. It expands the space of hypotheses we can explore, sharpens experimental design, and reveals patterns hidden in complexity."

✨ What Makes This Book Different

| Feature | Description |
| --- | --- |
| 🔬 Theory Meets Practice | Every concept is paired with ready-to-run code |
| 💻 Interactive Learning | All examples provided as Google Colab notebooks; no installation required |
| 🧪 Real Scientific Problems | Examples from authentic research across multiple domains |
| 📊 Accessible Yet Rigorous | Suitable for domain scientists exploring AI and ML experts entering scientific applications |

🎓 Who Is This For?

| You Are... | You'll Get... |
| --- | --- |
| 🔬 Domain Scientist | AI skills to accelerate your research |
| 💻 ML Engineer | Scientific applications for your expertise |
| 🎓 Graduate Student | Complete curriculum with hands-on projects |
| 👔 Industry Practitioner | Production-ready code and best practices |

✅ What You Will Learn

By the end of this book, you will:

  • ✅ Understand key AI architectures: Transformers, Diffusion Models, VAEs, and GNNs
  • ✅ Represent scientific data types effectively for AI models
  • ✅ Apply generative models to problems in climate science, drug discovery, genomics, materials science, and more
  • ✅ Follow best practices around ethics, reproducibility, and deployment
  • ✅ Stay current with emerging methods and future directions
  • ✅ Develop the intuition to know when and how to apply AI to scientific research

📚 Table of Contents

Part I: Foundations

| Chapter | Title | Topics |
| --- | --- | --- |
| 1 | Generative AI: A New Frontier for Scientific Discovery | AI revolution in science, core technologies, cross-cutting capabilities |
| 2 | Generative AI Fundamentals | Transformers, LLMs, Diffusion Models, VAEs, GANs, attention mechanisms |
| 3 | Scientific Data & Workflows | Data challenges, FAIR principles, data preparation, workflow automation |
| 4 | Text, Code & Knowledge Generation | Literature synthesis, RAG, hypothesis generation, code generation, scientific writing |

Part II: Core Techniques

| Chapter | Title | Topics |
| --- | --- | --- |
| 5 | Data-to-Data Models | Missing data imputation, synthetic data with GANs, VAEs, Gaussian processes, time series |
| 6 | Physics-Informed AI and Simulation | PINNs, neural surrogates, code optimization, automated testing |

Part III: Domain Applications

| Chapter | Title | Topics |
| --- | --- | --- |
| 7 | Domain Applications | Chemistry & Materials, Biology & Biomedicine, Physics & Engineering, Geoscience & Climate |

📂 Chapter 7 Detailed Breakdown

Part I: Chemistry & Materials Science

  • Molecular Graph Learning (GNNs)
  • Molecular Generation with Diffusion Models
  • Crystal Structure Prediction
  • Reaction Outcome Prediction with Transformers

Part II: Biology & Biomedicine

  • Protein Structure Prediction (ESMFold, AlphaFold2)
  • Protein Sequence Generation (ProteinMPNN, RFDiffusion)
  • Genomic Variant Analysis
  • Clinical Trial Optimization

Part III: Physics & Engineering

  • Particle Physics Applications
  • Quantum Systems
  • Materials Characterization

Part IV: Geoscience & Climate

  • Ocean Forecasting
  • Hurricane Prediction
  • Climate Modeling
  • Weather AI (GenCast, Aurora)

Part V: Cross-Cutting Applications

  • Transfer Learning
  • Multi-task Learning
  • Foundation Models

Part IV: Production & Best Practices

| Chapter | Title | Topics |
| --- | --- | --- |
| 8 | Fine-Tuning & Domain Adaptation | LoRA, PEFT, domain-specific training, evaluation strategies |
| 9 | Multimodal Generative AI | Vision-language models, graph-text models, multimodal fusion |
| 10 | Evaluation, Validation & Benchmarking | Metrics, validation strategies, uncertainty quantification, robustness testing |
| 11 | Ethics & Responsible AI | Reproducibility, bias & fairness, environmental impact, dual-use, data privacy |
| 12 | Deployment & MLOps | Experiment tracking, data versioning, model lifecycle, continuous training |
| 13 | Future Directions & Conclusion | Emerging architectures, foundation models, AI reasoning, open challenges |

🚀 Quick Start

Prerequisites

✅ Basic Python (functions, loops, data structures)
✅ Undergraduate statistics (helpful but not required)
✅ A web browser + curiosity
❌ No prior deep learning experience needed

Get Started in 3 Steps

  1. 📖 Get the 500-page book
    👉 https://leanpub.com/generativeaiforscience
    👉 https://www.amazon.com/dp/B0GGHD79VR
  2. 📥 Pick a chapter
    • Read the chapter and open that chapter's Colab notebook
  3. ▶️ Open any notebook in Google Colab
    • Click the "Open in Colab" badge in each notebook
    • Or upload directly to colab.research.google.com
    • GPU runtime recommended for deep learning examples
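The "Open in Colab" step can also be done without the badge, because Colab accepts links that point at a notebook in a public GitHub repository. The sketch below builds such a link. It is an illustration, not part of the book's code: `colab_url` is a hypothetical helper, `OWNER` is a placeholder because this README does not state the GitHub account name, and the `main` branch is an assumption.

```python
from urllib.parse import quote

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(owner: str, repo: str, notebook_path: str, branch: str = "main") -> str:
    """Build a Colab link for a notebook stored in a public GitHub repository."""
    # Percent-encode the path but keep the slashes between folders.
    safe_path = quote(notebook_path.strip("/"), safe="/")
    return f"{COLAB_GITHUB_PREFIX}/{owner}/{repo}/blob/{branch}/{safe_path}"


# "OWNER" is a placeholder: replace it with the account that hosts the repository.
url = colab_url(
    owner="OWNER",
    repo="Generative_AI_For_Science",
    notebook_path="Chapter06_Physics_Informed/Ch06_PINNs.ipynb",
)
print(url)
```

Pasting the printed link into a browser should open the notebook in Colab, provided the owner, branch, and path match the real repository.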

📂 Repository Structure

Generative_AI_For_Science/
├── 📁 Chapter01_Introduction/
│   └── 📓 Ch01_AI_Scientific_Discovery.ipynb
├── 📁 Chapter02_Fundamentals/
│   ├── 📓 Ch02_Transformers.ipynb
│   ├── 📓 Ch02_Diffusion_Models.ipynb
│   └── 📓 Ch02_VAEs_GANs.ipynb
├── 📁 Chapter03_Data_Workflows/
│   └── 📓 Ch03_Scientific_Data.ipynb
├── 📁 Chapter04_Text_Code_Knowledge/
│   ├── 📓 Ch04_RAG_Literature.ipynb
│   └── 📓 Ch04_Code_Generation.ipynb
├── 📁 Chapter05_Data_to_Data/
│   ├── 📓 Ch05_Autoencoders.ipynb
│   ├── 📓 Ch05_GANs.ipynb
│   ├── 📓 Ch05_VAEs.ipynb
│   └── 📓 Ch05_Time_Series.ipynb
├── 📁 Chapter06_Physics_Informed/
│   ├── 📓 Ch06_PINNs.ipynb
│   └── 📓 Ch06_Neural_Surrogates.ipynb
├── 📁 Chapter07_Domain_Applications/
│   ├── 📓 Ch07_Chemistry_GNNs.ipynb
│   ├── 📓 Ch07_Molecular_Diffusion.ipynb
│   ├── 📓 Ch07_Protein_Structure.ipynb
│   ├── 📓 Ch07_Genomics.ipynb
│   └── 📓 Ch07_Climate_AI.ipynb
├── 📁 Chapter08_FineTuning/
│   └── 📓 Ch08_LoRA_PEFT.ipynb
├── 📁 Chapter09_Multimodal/
│   └── 📓 Ch09_Vision_Language.ipynb
├── 📁 Chapter10_Evaluation/
│   └── 📓 Ch10_Metrics_Validation.ipynb
├── 📁 Chapter11_Ethics/
│   └── 📓 Ch11_Responsible_AI.ipynb
├── 📁 Chapter12_Deployment/
│   └── 📓 Ch12_MLOps.ipynb
├── 📁 slides/
│   └── 📊 PowerPoint slides for each chapter
├── 📁 assets/
│   └── 🖼️ Figures and images
└── 📄 README.md

💡 How to Use This Book

| Use Case | Recommendation |
| --- | --- |
| 📖 As a course text | Follow chapters sequentially for a structured introduction |
| 🔍 As a reference | Jump directly to sections relevant to your research domain |
| 💻 As a hands-on guide | Open Colab notebooks alongside each chapter, then run and modify the code |
| 🚀 As a research launchpad | Use the provided implementations as starting points for your projects |

🔬 Featured Applications

🧪 Chemistry & Materials

  • Molecular Property Prediction with Graph Neural Networks
  • Drug Design with Diffusion Models
  • Crystal Structure Prediction with AI
  • Reaction Prediction with Transformers

🧬 Biology & Biomedicine

  • Protein Structure Prediction (ESMFold, AlphaFold2)
  • Protein Design (ProteinMPNN, RFDiffusion)
  • Variant Effect Prediction for genomics
  • Clinical Trial Optimization

🌍 Geoscience & Climate

  • Weather Forecasting with GenCast
  • Ocean Dynamics modeling
  • Climate Projection with surrogates
  • Extreme Event Prediction

βš›οΈ Physics & Engineering

  • Physics-Informed Neural Networks (PINNs)
  • Neural Network Surrogates for simulations
  • Uncertainty Quantification

📊 Key Technologies Covered

| Architecture | Use Cases | Scientific Applications |
| --- | --- | --- |
| Transformers & LLMs | Text, code, sequences | Literature synthesis, protein sequences, code generation |
| Diffusion Models | Structured outputs, images | Molecular structures, protein folding, climate data |
| VAEs & GANs | Latent space learning | Synthetic data, anomaly detection, compression |
| Graph Neural Networks | Molecular graphs | Property prediction, reaction prediction |
| Physics-Informed NNs | PDEs, conservation laws | Fluid dynamics, heat transfer, wave propagation |
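As a taste of the first architecture listed above, Transformers are built around scaled dot-product attention, softmax(QKᵀ/√d)·V. The sketch below implements that formula in plain Python so it runs without any of the listed dependencies. It is a teaching illustration under that assumption, not code taken from the book's notebooks, and the function names and toy vectors are made up for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each argument is a list of equal-length vectors (lists of floats).
    Returns the attended outputs and the attention weights per query.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy example: three "tokens" with 2-dimensional embeddings (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row sums to 1 and shows how strongly one token attends to the others. Real models add learned projections for Q, K, and V and run many attention heads in parallel.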

πŸ› οΈ Installation (Optional Local Setup)

While all notebooks run in Google Colab, you can also set up locally:

# Create virtual environment
python -m venv genai-science
source genai-science/bin/activate  # Linux/Mac
# or: genai-science\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

Core Dependencies

torch>=2.0
transformers>=4.30
rdkit
numpy
pandas
matplotlib
scikit-learn
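After a local install, it can help to confirm that the packages above are actually present in the active environment. The sketch below uses only the standard library. It assumes each package is registered under the distribution name shown in the list (for example, `rdkit` installed from PyPI); `installed_versions` is a hypothetical helper, not something shipped with the repository.

```python
from importlib import metadata

# Distribution names copied from the dependency list above.
CORE_PACKAGES = [
    "torch", "transformers", "rdkit", "numpy",
    "pandas", "matplotlib", "scikit-learn",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(CORE_PACKAGES)
for name, version in report.items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

The script only reports versions. Comparing them against the minimums (`torch>=2.0`, `transformers>=4.30`) is left to `pip`, which enforces them when installing from `requirements.txt`.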

📈 Computational Requirements

| Model Type | GPU Memory | Recommended Platform |
| --- | --- | --- |
| Small models (GNNs, VAEs) | < 4 GB | Colab Free Tier |
| Medium models (Diffusion) | 4-8 GB | Colab Pro |
| Large models (LLMs, ESMFold) | 16+ GB | Colab Pro+, A100 |
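To see where the current runtime falls in the table above, the GPU memory can be queried from PyTorch and compared against those thresholds. This is a rough sketch: the cut-offs (at least 4 GB for medium models, at least 16 GB for large ones) are one reading of the table, both helper functions are hypothetical, and the check degrades to "no GPU detected" when PyTorch or CUDA is unavailable.

```python
def gpu_memory_gb():
    """Total memory of GPU 0 in GB, or None if PyTorch or a GPU is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def largest_model_tier(memory_gb):
    """Largest model class from the table that should fit in the given memory."""
    if memory_gb is None:
        return "no GPU detected"
    if memory_gb >= 16:  # assumed cut-off for the "16+ GB" row
        return "Large models (LLMs, ESMFold)"
    if memory_gb >= 4:  # assumed cut-off for the "4-8 GB" row
        return "Medium models (Diffusion)"
    return "Small models (GNNs, VAEs)"


print(largest_model_tier(gpu_memory_gb()))
```

Actual memory use depends on batch size, precision, and model size, so treat the result as a starting point and not a guarantee.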

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Ways to Contribute

  • πŸ› Report bugs or issues
  • πŸ’‘ Suggest new examples or applications
  • πŸ“ Improve documentation
  • πŸ”§ Submit code improvements
  • 🌍 Translate content

📜 Citation

If you use this book or code in your research, please cite:

@book{liu2026generativeai,
  title     = {Generative AI for Science},
  author    = {Liu, J. Paul},
  year      = {2026},
  publisher = {Leanpub},
  url       = {https://leanpub.com/generativeaiforscience}
}

Or simply:

J. Paul Liu, 2026. Generative AI for Science. Leanpub, https://leanpub.com/generativeaiforscience


📬 Contact & Community

| Platform | Link |
| --- | --- |
| 📧 Email | Contact through Leanpub |
| 🐦 Twitter / X | @jpliu168 (follow for updates) |
| 💼 LinkedIn | Paul Liu (connect for professional updates) |
| 💬 Discussions | Use GitHub Discussions for Q&A |
| 🐛 Issues | Report bugs via GitHub Issues |

πŸ™ Acknowledgements

This book was developed through:

  • Graduate courses at the Data Science and AI Academy
  • Bioinformatics Research Center workshops
  • Cross-campus AI for Research training programs
  • Research Triangle AI Society–LLM intensive bootcamps
  • Collaborations in oceanography, materials science, protein engineering, and literature mining

Special thanks to all students and colleagues who provided feedback and helped refine the material.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

The book content is © 2026 J. Paul Liu. Code examples are provided under the MIT License for educational use.


⭐ Support This Project

If you find this resource helpful:

  • ⭐ Star this repository to help others discover it
  • 🐦 Share on Twitter/LinkedIn to spread the word
  • 📖 Get the book to support continued development
  • 💬 Leave feedback to help improve future editions

🚀 Ready to accelerate your scientific discovery with AI?

Get the Book

Get the Book on Amazon


"Combine human creativity with machine assistance, and new discoveries become possible."
- Dr. J. Paul Liu


Other related book:
How to Build and Fine-Tune a Small Language Model:
https://leanpub.com/howtobuildandfine-tuneasmalllanguagemodel
https://www.amazon.com/dp/B0G3MYWTJK
