awesome-mlops
A curated list of references for MLOps
Stars: 11726
README:
An awesome list of references for MLOps - Machine Learning Operations 👉 ml-ops.org
Join us at the "Women+ in Data and AI Summer Festival". Featuring 100% female+ speakers. Everyone is welcome. 2024 September 27th at Berlin Radialsystem. Tickets 👉 women-in-data-ai.tech
- Machine Learning Operations: You Design It, You Train It, You Run It!
- MLOps SIG Specification
- ML in Production
- Awesome production machine learning: State of MLOps Tools and Frameworks
- Udemy “Deployment of ML Models”
- Full Stack Deep Learning
- Engineering best practices for Machine Learning
- 🚀 Putting ML in Production
- Stanford MLSys Seminar Series
- IBM ML Operationalization Starter Kit
- Productize ML. A self-study guide for Developers and Product Managers building Machine Learning products.
- MLOps (Machine Learning Operations) Fundamentals on GCP
- ML full Stack preparation
- MLOps Guide: Theory and Implementation
- Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning.
- MLOps maturity assessment
- MLOps Zoomcamp (free)
- Coursera's Machine Learning Engineering for Production (MLOps) Specialization
- Udacity Machine Learning DevOps Engineer
- Made with ML
- Udacity LLMOps: Building Real-World Applications With Large Language Models
- “Machine Learning Engineering” by Andriy Burkov, 2020
- "ML Ops: Operationalizing Data Science" by David Sweenor, Steven Hillion, Dan Rope, Dev Kannabiran, Thomas Hill, Michael O'Connell
- "Building Machine Learning Powered Applications" by Emmanuel Ameisen
- "Building Machine Learning Pipelines" by Hannes Hapke, Catherine Nelson, 2020, O’Reilly
- "Managing Data Science" by Kirill Dubovikov
- "Accelerated DevOps with AI, ML & RPA: Non-Programmer's Guide to AIOPS & MLOPS" by Stephen Fleming
- "Evaluating Machine Learning Models" by Alice Zheng
- Agile AI. 2020. By Carlo Appugliese, Paco Nathan, William S. Roberts. O'Reilly Media, Inc.
- "Machine Learning Logistics". 2017. By T. Dunning et al. O'Reilly Media Inc.
- "Machine Learning Design Patterns" by Valliappa Lakshmanan, Sara Robinson, Michael Munn. O'Reilly 2020
- "Serving Machine Learning Models: A Guide to Architecture, Stream Processing Engines, and Frameworks" by Boris Lublinsky, O'Reilly Media, Inc. 2017
- "Kubeflow for Machine Learning" by Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, Boris Lublinsky
- "Clean Machine Learning Code" by Moussa Taifi. Leanpub. 2020
- E-Book "Practical MLOps. How to Get Ready for Production Models"
- "Introducing MLOps" by Mark Treveil, et al. O'Reilly Media, Inc. 2020
- "Machine Learning for Data Streams with Practical Examples in MOA", Bifet, Albert and Gavaldà, Ricard and Holmes, Geoff and Pfahringer, Bernhard, MIT Press, 2018
- "Machine Learning Product Manual" by Laszlo Sragner, Chris Kelly
- "Data Science Bootstrap Notes" by Eric J. Ma
- "Data Teams" by Jesse Anderson, 2020
- "Data Science on AWS" by Chris Fregly, Antje Barth, 2021
- “Engineering MLOps” by Emmanuel Raj, 2021
- Machine Learning Engineering in Action
- Practical MLOps
- "Effective Data Science Infrastructure" by Ville Tuulos, 2021
- AI and Machine Learning for On-Device Development, 2021, By Laurence Moroney. O'Reilly
- Designing Machine Learning Systems, 2022, by Chip Huyen, O'Reilly
- Reliable Machine Learning. 2022. By Cathy Chen, Niall Richard Murphy, Kranti Parisa, D. Sculley, Todd Underwood. O'Reilly
- MLOps Lifecycle Toolkit. 2023. By Dayne Sorvisto. Apress
- Implementing MLOps in the Enterprise. 2023. By Yaron Haviv, Noah Gift. O'Reilly
- Continuous Delivery for Machine Learning (by Thoughtworks)
- What is MLOps? NVIDIA Blog
- MLSpec: A project to standardize the intercomponent schemas for a multi-stage ML Pipeline.
- The 2021 State of Enterprise Machine Learning | State of Enterprise ML 2020: PDF and Interactive
- Organizing machine learning projects: project management guidelines.
- Rules for ML Project (Best practices)
- ML Pipeline Template
- Data Science Project Structure
- Reproducible ML
- ML project template facilitating both research and production phases.
- Machine learning requires a fundamentally different deployment approach. As organizations embrace machine learning, the need for new deployment tools and strategies grows.
- Introducing Flyte: A Cloud Native Machine Learning and Data Processing Platform
- Why is DevOps for Machine Learning so Different?
- Lessons learned turning machine learning models into real products and services – O’Reilly
- MLOps: Model management, deployment and monitoring with Azure Machine Learning
- Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store
- Architecting a Machine Learning Pipeline: How to build scalable Machine Learning systems
- Why Machine Learning Models Degrade In Production
- Concept Drift and Model Decay in Machine Learning
- Machine Learning in Production: Why You Should Care About Data and Concept Drift
- Bringing ML to Production
- A Tour of End-to-End Machine Learning Platforms
- MLOps: Continuous delivery and automation pipelines in machine learning
- AI meets operations
- What would machine learning look like if you mixed in DevOps? Wonder no more, we lift the lid on MLOps
- Forbes: The Emergence Of ML Ops
- Cognilytica Report "ML Model Management and Operations 2020 (MLOps)"
- Introducing Cloud AI Platform Pipelines
- A Guide to Production Level Deep Learning
- The 5 Components Towards Building Production-Ready Machine Learning Systems
- Deep Learning in Production (references about deploying deep learning-based models in production)
- Machine Learning Experiment Tracking
- The Team Data Science Process (TDSP)
- MLOps Solutions (Azure based)
- Monitoring ML pipelines
- Deployment & Explainability of Machine Learning COVID-19 Solutions at Scale with Seldon Core and Alibi
- Demystifying AI Infrastructure
- The Checklist for Machine Learning Projects (from Aurélien Géron, "Hands-On Machine Learning with Scikit-Learn and TensorFlow")
- Data Project Checklist by Jeremy Howard
- MLOps: not as Boring as it Sounds
- 10 Steps to Making Machine Learning Operational. Cloudera White Paper
- MLOps is Not Enough. The Need for an End-to-End Data Science Lifecycle Process.
- Data Science Lifecycle Repository Template
- Template: code and pipeline definition for a machine learning project demonstrating how to automate an end to end ML/AI workflow.
- Nitpicking Machine Learning Technical Debt
- The Best Tools, Libraries, Frameworks and Methodologies that Machine Learning Teams Actually Use – Things We Learned from 41 ML Startups
- Software Engineering for AI/ML - An Annotated Bibliography
- Intelligent System. Machine Learning in Practice
- CMU 17-445/645: Software Engineering for AI-Enabled Systems (SE4AI)
- Machine Learning is Requirements Engineering
- Machine Learning Reproducibility Checklist
- Machine Learning Ops. A collection of resources on how to facilitate Machine Learning Ops with GitHub.
- Task Cheatsheet for Almost Every Machine Learning Project. A checklist of tasks for building End-to-End ML projects
- Web services vs. streaming for real-time machine learning endpoints
- How PyTorch Lightning became the first ML framework to run continuous integration on TPUs
- The ultimate guide to building maintainable Machine Learning pipelines using DVC
- Continuous Machine Learning (CML) is CI/CD for Machine Learning Projects (DVC)
- What I learned from looking at 200 machine learning tools | Update: MLOps Tooling Landscape v2 (+84 new tools) - Dec '20
- Big Data & AI Landscape
- Deploying Machine Learning Models as Data, not Code — A better match?
- “Thou shalt always scale” — 10 commandments of MLOps
- Three Risks in Building Machine Learning Systems
- Blog about ML in production (by maiot.io)
- Back to the Machine Learning fundamentals: How to write code for Model deployment. Part 1, Part 2, Part 3
- MLOps: Machine Learning as an Engineering Discipline
- ML Engineering on Google Cloud Platform (hands-on labs and code samples)
- Deep Reinforcement Learning in Production. The use of Reinforcement Learning to Personalize User Experience at Zynga
- What is Data Observability?
- A Practical Guide to Maintaining Machine Learning in Production
- Continuous Machine Learning. Part 1, Part 2. Part 3 is coming soon.
- The Agile approach in data science explained by an ML expert
- Here is what you need to look for in a model server to build ML-powered services
- The problem with AI developer tools for enterprises (and what IKEA has to do with it)
- Streaming Machine Learning with Tiered Storage
- Best practices for performance and cost optimization for machine learning (Google Cloud)
- Lean Data and Machine Learning Operations
- A Brief Guide to Running ML Systems in Production: Best Practices for Site Reliability Engineers
- AI engineering practices in the wild - SIG | Getting software right for a healthier digital world
- SE-ML | The 2020 State of Engineering Practices for Machine Learning
- Awesome Software Engineering for Machine Learning (GitHub repository)
- Sampling isn’t enough, profile your ML data instead
- Reproducibility in ML: why it matters and how to achieve it
- 12 Factors of reproducible Machine Learning in production
- MLOps: More Than Automation
- Lean Data Science
- Engineering Skills for Data Scientists
- DAGsHub Blog. Read about data science and machine learning workflows, MLOps, and open source data science
- Data Science Project Flow for Startups
- Data Science Engineering at Shopify
- Building state-of-the-art machine learning technology with efficient execution for the crypto economy
- Completing the Machine Learning Loop
- Deploying Machine Learning Models: A Checklist
- Global MLOps and ML tools landscape (by MLReef)
- Why all Data Science teams need to get serious about MLOps
- MLOps Values (by Bart Grasza)
- Machine Learning Systems Design (by Chip Huyen)
- Designing an ML system (Stanford | CS 329 | Chip Huyen)
- How COVID-19 Has Infected AI Models (about the data drift or model drift concept)
- Microkernel Architecture for Machine Learning Library. An Example of Microkernel Architecture with Python Metaclass
- Machine Learning in production: the Booking.com approach
- What I Learned From Attending TWIMLcon 2021 (by James Le)
- Designing ML Orchestration Systems for Startups. A case study in building a lightweight production-grade ML orchestration system
- Towards MLOps: Technical capabilities of a Machine Learning platform | Prosus AI Tech Blog
- Get started with MLOps. A comprehensive MLOps tutorial with open source tools
- From DevOps to MLOPS: Integrate Machine Learning Models using Jenkins and Docker
- Example code for a basic ML Platform based on Pulumi, FastAPI, DVC, MLFlow and more
- Software Engineering for Machine Learning: Characterizing and Detecting Mismatch in Machine-Learning Systems
- TWIML Solutions Guide
- How Well Do You Leverage Machine Learning at Scale? Six Questions to Ask
- Getting started with MLOps: Selecting the right capabilities for your use case
- The Latest Work from the SEI: Artificial Intelligence, DevSecOps, and Security Incident Response
- MLOps: The Ultimate Guide. A handbook on MLOps and how to think about it
- Enterprise Readiness of Cloud MLOps
- Should I Train a Model for Each Customer or Use One Model for All of My Customers?
- MLOps-Basics (GitHub repo) by raviraja
- Another tool won’t fix your MLOps problems
- Best MLOps Tools: What to Look for and How to Evaluate Them (by NimbleBox.ai)
- MLOps vs. DevOps: A Detailed Comparison (by NimbleBox.ai)
- A Guide To Setting Up Your MLOps Team (by NimbleBox.ai)
- Open-source Workflow Management Tools: A Survey by Ploomber
- How to Compare ML Experiment Tracking Tools to Fit Your Data Science Workflow (by dagshub)
- 15 Best Tools for Tracking Machine Learning Experiments
- Feature Stores for Machine Learning Medium Blog
- MLOps with a Feature Store
- Feature Stores for ML
- Hopsworks: Data-Intensive AI with a Feature Store
- Feast: An open-source Feature Store for Machine Learning
- What is a Feature Store?
- ML Feature Stores: A Casual Tour
- Comprehensive List of Feature Store Architectures for Data Scientists and Big Data Professionals
- ML Engineer Guide: Feature Store vs Data Warehouse (vendor blog)
- Building a Gigascale ML Feature Store with Redis, Binary Serialization, String Hashing, and Compression (DoorDash blog)
- Feature Stores: Variety of benefits for Enterprise AI.
- Feature Store as a Foundation for Machine Learning
- ML Feature Serving Infrastructure at Lyft
- Feature Stores for Self-Service Machine Learning
- The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models.
- Is There a Feature Store Over the Rainbow? How to select the right feature store for your use case
- The state of data quality in 2020 – O’Reilly
- Why We Need DevOps for ML Data
- Data Preparation for Machine Learning (7-Day Mini-Course)
- Best practices in data cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data.
- 17 Strategies for Dealing with Data, Big Data, and Even Bigger Data
- DataOps Data Architecture
- Data Orchestration — A Primer
- 4 Data Trends to Watch in 2020
- CSE 291D / 234: Data Systems for Machine Learning
- A complete picture of the modern data engineering landscape
- Continuous Integration for your data with GitHub Actions and Great Expectations. One step closer to CI/CD for your data pipelines
- Emerging Architectures for Modern Data Infrastructure
- Awesome Data Engineering. Learning path and resources to become a data engineer
- Data Quality at Airbnb Part 1 | Part 2
- DataHub: Popular metadata architectures explained
- Financial Times Data Platform: From zero to hero. An in-depth walkthrough of the evolution of our Data Platform
- Alki, or how we learned to stop worrying and love cold metadata (Dropbox)
- A Beginner's Guide to Clean Data. Practical advice to spot and avoid data quality problems (by Benjamin Greve)
- ML Lake: Building Salesforce’s Data Platform for Machine Learning
- Data Catalog 3.0: Modern Metadata for the Modern Data Stack
- Metadata Management Systems
- Essential resources for data engineers (a curated recommended read and watch list for scalable data processing)
- Comprehensive and Comprehensible Data Catalogs: The What, Who, Where, When, Why, and How of Metadata Management (Paper)
- What I Learned From Attending DataOps Unleashed 2021 (by James Le)
- Uber's Journey Toward Better Data Culture From First Principles
- Cerberus - lightweight and extensible data validation library for Python
- Design a data mesh architecture using AWS Lake Formation and AWS Glue. AWS Big Data Blog
- Data Management Challenges in Production Machine Learning (slides)
- The Missing Piece of Data Discovery and Observability Platforms: Open Standard for Metadata
- Automating Data Protection at Scale
- A curated list of awesome pipeline toolkits
- Data Mesh Architecture
- The Essential Guide to Data Exploration in Machine Learning (by NimbleBox.ai)
- Finding millions of label errors with Cleanlab
- AI Infrastructure for Everyone: DeterminedAI
- Deploying R Models with MLflow and Docker
- What Does it Mean to Deploy a Machine Learning Model?
- Software Interfaces for Machine Learning Deployment
- Batch Inference for Machine Learning Deployment
- AWS Cost Optimization for ML Infrastructure - EC2 spend
- CI/CD for Machine Learning & AI
- Itaú Unibanco: How we built a CI/CD Pipeline for machine learning with online training in Kubeflow
- 101 For Serving ML Models
- Deploying Machine Learning models to production — Inference service architecture patterns
- Serverless ML: Deploying Lightweight Models at Scale
- ML Model Rollout To Production. Part 1 | Part 2
- Deploying Python ML Models with Flask, Docker and Kubernetes
- Deploying Python ML Models with Bodywork
- Framework for a successful Continuous Training Strategy. When should the model be retrained? What data should be used? What should be retrained? A data-driven approach
- Efficient Machine Learning Inference. The benefits of multi-model serving where latency matters
- Deploying Hugging Face ML Models in the Cloud with Infrastructure as Code
- Building dashboards for operational visibility (AWS)
- Monitoring Machine Learning Models in Production
- Effective testing for machine learning systems
- Unit Testing Data: What is it and how do you do it?
- How to Test Machine Learning Code and Systems (Accompanying code)
- Wu, T., Dong, Y., Dong, Z., Singa, A., Chen, X. and Zhang, Y., 2020. Testing Artificial Intelligence System Towards Safety and Robustness: State of the Art. IAENG International Journal of Computer Science, 47(3).
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform
- A/B Testing Machine Learning Models
- Data validation for machine learning. Polyzotis, N., Zinkevich, M., Roy, S., Breck, E. and Whang, S., 2019. Proceedings of Machine Learning and Systems
- Testing machine learning based systems: a systematic mapping
- Explainable Monitoring: Stop flying blind and monitor your AI
- WhyLogs: Embrace Data Logging Across Your ML Systems
- Evidently AI. Insights on doing machine learning in production. (Vendor blog.)
- The definitive guide to comprehensively monitoring your AI
- Introduction to Unit Testing for Machine Learning
- Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance
- Test-Driven Development in MLOps Part 1
- Domain-Specific Machine Learning Monitoring
- Introducing ML Model Performance Management (Blog by fiddler)
- What is ML Observability? (Arize AI)
- Beyond Monitoring: The Rise of Observability (Arize AI & Monte Carlo Data)
- Model Failure Modes (Arize AI)
- Quick Start to Data Quality Monitoring for ML (Arize AI)
- Playbook to Monitoring Model Performance in Production (Arize AI)
- Robust ML by Property Based Domain Coverage Testing (Blog by Efemarai)
- Monitoring and explainability of models in production
- Beyond Monitoring: The Rise of Observability
- ML Model Monitoring – 9 Tips From the Trenches. (by NU bank)
- Model health assurance at LinkedIn. By LinkedIn Engineering
- How to Trust Your Deep Learning Code (Accompanying code)
- Estimating Performance of Regression Models Without Ground-Truth (Using NannyML)
- How Hyperparameter Tuning in Machine Learning Works (by NimbleBox.ai)
- MLOps Infrastructure Stack Canvas
- Rise of the Canonical Stack in Machine Learning. How a Dominant New Software Stack Will Unlock the Next Generation of Cutting Edge AI Apps
- AI Infrastructure Alliance. Building the canonical stack for AI/ML
- Linux Foundation AI Foundation
- ML Infrastructure Tools for Production | Part 1 — Production ML — The Final Stage of the Model Workflow | Part 2 — Model Deployment and Serving
- The MLOps Stack Template (by valohai)
- Navigating the MLOps tooling landscape
- MLOps.toys curated list of MLOps projects (by Aporia)
- Comparing Cloud MLOps platforms, From a former AWS SageMaker PM
- Machine Learning Ecosystem 101 (whitepaper by Arize AI)
- Selecting your optimal MLOps stack: advantages and challenges. By Intellerts
- Infrastructure Design for Real-time Machine Learning Inference. The Databricks Blog
- The 2021 State of AI Infrastructure Survey
- AI infrastructure Maturity matrix
- A Curated Collection of the Best Open-source MLOps Tools. By Censius
- Best MLOps Tools to Manage the ML Lifecycle (by NimbleBox.ai)
- The minimum set of must-haves for MLOps
A list of scientific and industrial papers and resources about Machine Learning operationalization since 2015. See more.
Click to expand!
- "MLOps: Automated Machine Learning" by Emmanuel Raj
- DeliveryConf 2020. "Continuous Delivery For Machine Learning: Patterns And Pains" by Emily Gorcenski
- MLOps Conference: Talks from 2019
- Kubecon 2019: Flyte: Cloud Native Machine Learning and Data Processing Platform
- Kubecon 2019: Running LargeScale Stateful workloads on Kubernetes at Lyft
- A CI/CD Framework for Production Machine Learning at Massive Scale (using Jenkins X and Seldon Core)
- MLOps Virtual Event (Databricks)
- MLOps NY conference 2019
- MLOps.community YouTube Channel
- MLinProduction YouTube Channel
- Introducing MLflow for End-to-End Machine Learning on Databricks. Spark+AI Summit 2020. Sean Owen
- MLOps Tutorial #1: Intro to Continuous Integration for ML
- Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams (2019)
- Damian Brady - The emerging field of MLops
- MLOps - Entwurf, Entwicklung, Betrieb (INNOQ Podcast in German)
- Instrumentation, Observability & Monitoring of Machine Learning Models
- Efficient ML engineering: Tools and best practices
- Beyond the jupyter notebook: how to build data science products
- An introduction to MLOps on Google Cloud (First 19 min are vendor-, language-, and framework-agnostic. @visenger)
- How ML Breaks: A Decade of Outages for One Large ML Pipeline
- Clean Machine Learning Code: Practical Software Engineering
- Machine Learning Engineering: 10 Fundamentale Praktiken
- Architecture of machine learning systems (3-part series)
- Machine Learning Design Patterns
- The playlist that covers techniques and approaches for model deployment to production
- ML Observability: A Critical Piece in Ensuring Responsible AI (Arize AI at Re-Work)
- ML Engineering vs. Data Science (Arize AI Un/Summit)
- SRE for ML: The First 10 Years and the Next 10
- Demystifying Machine Learning in Production: Reasoning about a Large-Scale ML Platform
- Apply Conf 2022
- Databricks' Data + AI Summit 2022
- RE•WORK MLOps Summit 2022
- Annual MLOps World Conference
- Introducing FBLearner Flow: Facebook’s AI backbone
- TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
- Accelerate your ML and Data workflows to production: Flyte
- Getting started with Kubeflow Pipelines
- Meet Michelangelo: Uber’s Machine Learning Platform
- Meson: Workflow Orchestration for Netflix Recommendations
- What are Azure Machine Learning pipelines?
- Uber ATG’s Machine Learning Infrastructure for Self-Driving Vehicles
- An overview of ML development platforms
- Snorkel AI: Putting Data First in ML Development
- A Tour of End-to-End Machine Learning Platforms
- Introducing WhyLabs, a Leap Forward in AI Reliability
- Project: Ease.ml (ETH Zürich)
- Bodywork: model-training and deployment automation
- Lessons on ML Platforms — from Netflix, DoorDash, Spotify, and more
- Papers & tech blogs by companies sharing their work on data science & machine learning in production. By Eugene Yan
- How do different tech companies approach building internal ML platforms? (tweet)
- Declarative Machine Learning Systems
- StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Runtime with Apache Flink
- Book: Aurélien Géron, "Hands-On Machine Learning with Scikit-Learn and TensorFlow"
- Foundations of Machine Learning
- Best Resources to Learn Machine Learning
- Awesome TensorFlow
- "Papers with Code" - Browse the State-of-the-Art in Machine Learning
- Zhi-Hua Zhou. 2012. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC.
- Feature Engineering for Machine Learning. Principles and Techniques for Data Scientists. By Alice Zheng, Amanda Casari
- Google Research: Looking Back at 2019, and Forward to 2020 and Beyond
- O’Reilly: The road to Software 2.0
- Machine Learning and Data Science Applications in Industry
- Deep Learning for Anomaly Detection
- Federated Learning for Mobile Keyboard Prediction
- Federated Learning. Building better products with on-device data and privacy by default
- Federated Learning: Collaborative Machine Learning without Centralized Training Data
- Yang, Q., Liu, Y., Cheng, Y., Kang, Y., Chen, T. and Yu, H., 2019. Federated learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 13(3). Chapters 1 and 2.
- Federated Learning by FastForward
- THE FEDERATED & DISTRIBUTED MACHINE LEARNING CONFERENCE
- Federated Learning: Challenges, Methods, and Future Directions
- Book: Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019
- Book: Hutter, Frank, Lars Kotthoff, and Joaquin Vanschoren. "Automated Machine Learning". Springer, 2019.
- ML resources by topic, curated by the community.
- An Introduction to Machine Learning Interpretability, by Patrick Hall, Navdeep Gill, 2nd Edition. O'Reilly 2019
- Examples of techniques for training interpretable machine learning (ML) models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
- Paper: "Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence", by Sebastian Raschka, Joshua Patterson, and Corey Nolet. 2020
- Distill: Machine Learning Research
- AtHomeWithAI: Curated Resource List by DeepMind
- Awesome Data Science
- Intro to probabilistic programming. A use case using Tensorflow-Probability (TFP)
- Dive into Snorkel: Weak-Supervision on German Texts. inovex Blog
- Dive into Deep Learning. An interactive deep learning book with code, math, and discussions. Provides NumPy/MXNet, PyTorch, and TensorFlow implementations
- Data Science Collected Resources (GitHub repository)
- Set of illustrated Machine Learning cheatsheets
- "Machine Learning Bookcamp" by Alexey Grigorev
- 130 Machine Learning Projects Solved and Explained
- Machine learning cheat sheet
- Stateoftheart AI. An open-data and free platform built by the research community to facilitate the collaborative development of AI
- Online Machine Learning Courses: 2020 Edition
- End-to-End Machine Learning Library
- Machine Learning Toolbox (by Amit Chaudhary)
- Causality for Machine Learning
- Causal Inference for the Brave and True
- Causal Inference
- A resource list for causality in statistics, data science and physics
- Learning from data. Caltech
- Machine Learning Glossary
- Book: "Distributed Machine Learning Patterns". 2022. By Yuan Tang. Manning
- Machine Learning for Beginners - A Curriculum
- Making Friends with Machine Learning. By Cassie Kozyrkov
- Machine Learning Workflow - A Complete Guide (by NimbleBox.ai)
- Performance Metrics to Monitor in Machine Learning Projects (by NimbleBox.ai)
- The Twelve Factors
- Book "Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations", 2018, by Nicole Forsgren et al.
- Book "The DevOps Handbook" by Gene Kim, et al. 2016
- State of DevOps 2019
- Clean Code concepts adapted for machine learning and data science.
- School of SRE
- 10 Laws of Software Engineering That People Ignore
- The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
- The Book of Secret Knowledge
- SHADES OF CONWAY'S LAW
- Engineering Practices for Data Scientists
- What you need to know about product management for AI. A product manager for AI does everything a traditional PM does, and much more.
- Bringing an AI Product to Market. Previous articles have gone through the basics of AI product management. Here we get to the meat: how do you bring a product to market?
- The People + AI Guidebook
- User Needs + Defining Success
- Building machine learning products: a problem well-defined is a problem half-solved.
- Talk: Designing Great ML Experiences (Apple)
- Machine Learning for Product Managers
- Understanding the Data Landscape and Strategic Play Through Wardley Mapping
- Techniques for prototyping machine learning systems across products and features
- Machine Learning and User Experience: A Few Resources
- AI ideation canvas
- Ideation in AI
- 5 Steps for Building Machine Learning Models for Business. By Shopify Engineering
- Metric Design for Data Scientists and Business Leaders
- Book: "Prediction Machines: The Simple Economics of Artificial Intelligence"
- Book: "The AI Organization" by David Carmona
- Book: "Succeeding with AI". 2020. By Veljko Krunic. Manning Publications
- A list of articles about AI and the economy
- Gartner AI Trends 2019
- Global AI Survey: AI proves its worth, but few scale impact
- Getting started with AI? Start here! Everything you need to know to dive into your project
- 11 questions to ask before starting a successful Machine Learning project
- What AI still can’t do
- Demystifying AI Part 4: What is an AI Canvas and how do you use it?
- A Data Science Workflow Canvas to Kickstart Your Projects
- Is your AI project a nonstarter? Here’s a reality check(list) to help you avoid the pain of learning the hard way
- What is THE main reason most ML projects fail?
- Designing great data products. The Drivetrain Approach: A four-step process for building data products.
- The New Business of AI (and How It’s Different From Traditional Software)
- The idea maze for AI startups
- The Enterprise AI Challenge: Common Misconceptions
- Misconception 1 (of 5): Enterprise AI Is Primarily About The Technology
- Misconception 2 (of 5): Automated Machine Learning Will Unlock Enterprise AI
- Three Principles for Designing ML-Powered Products
- A Step-by-Step Guide to Machine Learning Problem Framing
- AI adoption in the enterprise 2020
- How Adopting MLOps can Help Companies With ML Culture?
- Weaving AI into Your Organization
- What to Do When AI Fails
- Introduction to Machine Learning Problem Framing
- Structured Approach for Identifying AI Use Cases
- Book: "Machine Learning for Business" by Doug Hudgeon, Richard Nichol, O'Reilly
- Why Commercial Artificial Intelligence Products Do Not Scale (FemTech)
- Google Cloud’s AI Adoption Framework (White Paper)
- Data Science Project Management
- Book: "Competing in the Age of AI" by Marco Iansiti, Karim R. Lakhani. Harvard Business Review Press. 2020
- The Three Questions about AI that Startups Need to Ask. The first is: Are you sure you need AI?
- Taming the Tail: Adventures in Improving AI Economics
- Managing the Risks of Adopting AI Engineering
- Get rid of AI Saviorism
- Collection of articles listing reasons why data science projects fail
- How to Choose Your First AI Project by Andrew Ng
- How to Set AI Goals
- Expanding AI's Impact With Organizational Learning
- Potemkin Data Science
- When Should You Not Invest in AI?
- Why 90% of machine learning models never hit the market. Most companies lack leadership support, effective communication between teams, and accessible data
This topic is extracted into our new Awesome ML Model Governance repository.
Click to expand!
- Scaling An ML Team (0–10 People)
- The Knowledge Repo project is focused on facilitating the sharing of knowledge between data scientists and other technical roles.
- Scaling Knowledge at Airbnb
- Models for integrating data science teams within companies: A comparative analysis
- How to Write Better with The Why, What, How Framework. How to write design documents for data science/machine learning projects? (by Eugene Yan)
- Technical Writing Courses
- Building a data team at a mid-stage startup: a short story. By Erik Bernhardsson
- The Cultural Benefits of Artificial Intelligence in the Enterprise. by Sam Ransbotham, François Candelon, David Kiron, Burt LaFountain, and Shervin Khodabandeh
- ML in Production newsletter
- MLOps.community
- Andriy Burkov newsletter
- Decision Intelligence by Cassie Kozyrkov
- Laszlo's Newsletter about Data Science
- Data Elixir newsletter for a weekly dose of the top data science picks from around the web. Covering machine learning, data visualization, analytics, and strategy.
- The Data Science Roundup by Tristan Handy
- Vicki Boykis Newsletter about Data Science
- KDnuggets News
- Analytics Vidhya, Any questions on business analytics, data science, big data, data visualizations tools and techniques
- Data Science Weekly Newsletter: A free weekly newsletter featuring curated news, articles and jobs related to Data Science
- The Machine Learning Engineer Newsletter
- Gradient Flow helps you stay ahead of the latest technology trends and tools with in-depth coverage, analysis and insights. See the latest on data, technology and business, with a focus on machine learning and AI
- Your guide to AI by Nathan Benaich. Monthly analysis of AI technology, geopolitics, research, and startups.
- O'Reilly Data & AI Newsletter
- deeplearning.ai’s newsletter by Andrew Ng
- Deep Learning Weekly
- Import AI is a weekly newsletter about artificial intelligence, read by more than ten thousand experts. By Jack Clark.
- AI Ethics Weekly
- Announcing Projects To Know, a weekly machine intelligence and data science newsletter
- TWIML: This Week in Machine Learning and AI newsletter
- featurestore.org: Monthly Newsletter on Feature Stores for ML
- DataTalks.Club Community: Slack, Newsletter, Podcast, Weekly Events
- Machine Learning Ops Roundup
- Data Science Programming Newsletter by Eric Ma
- Marginally Interesting by Mikio L. Braun
- Synced
- The Ground Truth: Newsletter for Computer Vision Practitioners
- SwirlAI: Data Engineering, MLOps and overall Data focused Newsletter by Aurimas Griciūnas
- Marvelous MLOps
- Made with ML
- MLOps Insights Newsletter - 8 episodes covering topics like Model Feedback Vacuums, Deployment Reproducibility and Serverless in the context of MLOps
Alternative AI tools for awesome-mlops
Similar Open Source Tools
TensorRT-LLM
TensorRT-LLM is an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM contains components to create Python and C++ runtimes that execute those TensorRT engines. It also includes a backend for integration with the NVIDIA Triton Inference Server, a production-quality system to serve LLMs. Models built with TensorRT-LLM can be executed on a wide range of configurations, from a single GPU to multiple nodes with multiple GPUs (using Tensor Parallelism and/or Pipeline Parallelism).
RAG-Survey
This repository is dedicated to collecting and categorizing papers related to Retrieval-Augmented Generation (RAG) for AI-generated content. It serves as a survey repository based on the paper 'Retrieval-Augmented Generation for AI-Generated Content: A Survey'. The repository is continuously updated to keep up with the rapid growth in the field of RAG.
Weekly-Top-LLM-Papers
This repository provides a curated list of weekly published Large Language Model (LLM) papers. It includes top important LLM papers for each week, organized by month and year. The papers are categorized into different time periods, making it easy to find the most recent and relevant research in the field of LLM.
Awesome-AI-Agents
Awesome-AI-Agents is a curated list of projects, frameworks, benchmarks, platforms, and related resources focused on autonomous AI agents powered by Large Language Models (LLMs). The repository showcases a wide range of applications, multi-agent task solver projects, agent society simulations, and advanced components for building and customizing AI agents. It also includes frameworks for orchestrating role-playing, evaluating LLM-as-Agent performance, and connecting LLMs with real-world applications through platforms and APIs. Additionally, the repository features surveys, paper lists, and blogs related to LLM-based autonomous agents, making it a valuable resource for researchers, developers, and enthusiasts in the field of AI.
auto-round
AutoRound is an advanced weight-only quantization algorithm for low-bit LLM inference. It competes impressively against recent methods without introducing any additional inference overhead. The method adopts sign gradient descent to fine-tune rounding values and minmax values of weights in just 200 steps, often significantly outperforming SignRound at the cost of more tuning time for quantization. AutoRound is tailored for a wide range of models and consistently delivers noticeable improvements.
Awesome-Colorful-LLM
Awesome-Colorful-LLM is a meticulously assembled anthology of vibrant multimodal research focusing on advancements propelled by large language models (LLMs) in domains such as Vision, Audio, Agent, Robotics, and Fundamental Sciences like Mathematics. The repository contains curated collections of works, datasets, benchmarks, projects, and tools related to LLMs and multimodal learning. It serves as a comprehensive resource for researchers and practitioners interested in exploring the intersection of language models and various modalities for tasks like image understanding, video pretraining, 3D modeling, document understanding, audio analysis, agent learning, robotic applications, and mathematical research.
ST-LLM
ST-LLM is a temporal-sensitive video large language model that incorporates joint spatial-temporal modeling, dynamic masking strategy, and global-local input module for effective video understanding. It has achieved state-of-the-art results on various video benchmarks. The repository provides code and weights for the model, along with demo scripts for easy usage. Users can train, validate, and use the model for tasks like video description, action identification, and reasoning.
speakeasy
Speakeasy is a tool that helps developers create production-quality SDKs, Terraform providers, documentation, and more from OpenAPI specifications. It supports a wide range of languages, including Go, Python, TypeScript, Java, and C#, and provides features such as automatic maintenance, type safety, and fault tolerance. Speakeasy also integrates with popular package managers like npm, PyPI, Maven, and Terraform Registry for easy distribution.
UMOE-Scaling-Unified-Multimodal-LLMs
Uni-MoE is a MoE-based unified multimodal model that can handle diverse modalities including audio, speech, image, text, and video. The project focuses on scaling Unified Multimodal LLMs with a Mixture of Experts framework. It offers enhanced functionality for training across multiple nodes and GPUs, as well as parallel processing at both the expert and modality levels. The model architecture involves three training stages: building connectors for multimodal understanding, developing modality-specific experts, and incorporating multiple trained experts into LLMs using the LoRA technique on mixed multimodal data. The tool provides instructions for installation, weights organization, inference, training, and evaluation on various datasets.
cuckoo
Cuckoo is a Decentralized AI Platform that focuses on GPU-sharing for text-to-image generation and LLM inference. It provides a platform for users to generate images using Telegram or Discord.
VideoRefer
VideoRefer Suite is a tool designed to enhance the fine-grained spatial-temporal understanding capabilities of Video Large Language Models (Video LLMs). It consists of three primary components: a Model (VideoRefer) for perception, reasoning, and retrieval over user-defined regions at any specified timestamp; a Dataset (VideoRefer-700K) of high-quality object-level video instruction data; and a Benchmark (VideoRefer-Bench) to evaluate object-level video understanding capabilities. The tool can understand any object within a video.
LLM-FineTuning-Large-Language-Models
This repository contains projects and notes on common practical techniques for fine-tuning Large Language Models (LLMs). It includes fine-tuning LLM notebooks, Colab links, LLM techniques and utils, and other smaller language models. The repository also provides links to YouTube videos explaining the concepts and techniques discussed in the notebooks.
nttu-chatbot
NTTU Chatbot is a student support chatbot developed using LLM + Document Retriever (RAG) technology in Vietnamese. It provides assistance to students by answering their queries and retrieving relevant documents. The chatbot aims to enhance the student support system by offering quick and accurate responses to user inquiries. It utilizes advanced language models and document retrieval techniques to deliver efficient and effective support to users.
AiTreasureBox
AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.
AITreasureBox
AITreasureBox is a comprehensive collection of AI tools and resources designed to simplify and accelerate the development of AI projects. It provides a wide range of pre-trained models, datasets, and utilities that can be easily integrated into various AI applications. With AITreasureBox, developers can quickly prototype, test, and deploy AI solutions without having to build everything from scratch. Whether you are working on computer vision, natural language processing, or reinforcement learning projects, AITreasureBox has something to offer for everyone. The repository is regularly updated with new tools and resources to keep up with the latest advancements in the field of artificial intelligence.