ml-road-map

The most streamlined road map to learn ML fundamentals for free.

Stars: 253


The Machine Learning Road Map is a comprehensive guide designed to take individuals from various levels of machine learning knowledge to a basic understanding of machine learning principles using high-quality, free resources. It aims to simplify the complex and rapidly growing field of machine learning by providing a structured roadmap for learning. The guide emphasizes the importance of understanding AI for everyone, the need for patience in learning machine learning due to its complexity, and the value of learning from experts in the field. It covers five different paths to learning about machine learning, catering to consumers, aspiring AI researchers, ML engineers, developers interested in building ML applications, and companies looking to implement AI solutions.

README:

Machine Learning Road Map

Welcome to the Machine Learning Road Map. This is the fastest high-quality road map for getting up to speed on machine learning fundamentals. It teaches the prerequisites and fundamentals you need to understand how machine learning works and to build with it. The goal is to get you quickly to a point where you can comfortably explore machine learning topics on your own. Many other road maps are more comprehensive; this one is purposefully streamlined.

These resources are aggregated from the best ML educators. I've linked to the authors as much as possible. Please support them. Feedback/suggestions/corrections are always welcome and appreciated.

If you're less interested in the technical details of machine learning and want to know more about how machine learning will affect you as a consumer, I've written an article just for that. You can also check out the Google AI Essentials Course to learn how to use generative AI to boost your productivity.

To support this project and stay updated:

⭐ Star this repository
🐦 Follow me on X (Twitter): @loganthorneloe
👨‍💻 Follow me on GitHub: loganthorneloe

This road map will be updated as new learning resources are created and new ML topics emerge. Let's get started!


Things to Know Before You Begin

Machine learning will impact everyone's life. It's a new paradigm of computing that will completely change the way consumers expect their devices to work.
Machine learning is a rapidly developing field with many complex subfields. Take it slow and don't expect to become an expert in all of them.
The best way to understand machine learning is to learn from others who understand the topics you want to know more about. I've created a list of accounts to follow on X. I've also aggregated a list of newsletters, blogs, and channels I find helpful to stay updated.

Machine Learning Prerequisites

These prerequisites contain a mixture of math and programming concepts. Feel free to skip things you already understand.

Topic	Source	Author
Programming
General Programming	CS50	Harvard
Python	Intro to Python (For Beginners)	Harvard
	Google's Python Class (Refresher)	Google
NumPy	NumPy Tutorial	NumPy Team
Pandas	Pandas Course	Kaggle
Math
Algebra	Algebra Curriculum	Khan Academy
Linear Algebra	Linear Algebra Curriculum	Khan Academy
Probability	Uncertainty Section of CS50	Harvard
Calculus	Derivatives/Partial Derivatives	Khan Academy
	Gradients	Khan Academy
	Backpropagation Visualization	Google
Tools
Version Control	Learn How to Use Git	Open Source Git Community
	GitHub Tutorial	GitHub
Terminal	Learn Shell	learnshell.org
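
The calculus prerequisites above (partial derivatives and gradients) are what make model training work. As a minimal sketch, assuming NumPy is installed and using made-up, noise-free data, the following fits a line by repeatedly stepping against the partial derivatives of mean squared error. The function name `fit_line` and the learning-rate and step-count values are illustrative choices, not taken from any of the listed courses.

```python
import numpy as np


def fit_line(x: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 2000) -> tuple[float, float]:
    """Fit y ~ w*x + b with plain gradient descent on mean squared error (MSE)."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        error = (w * x + b) - y  # prediction minus target, one value per example
        # Partial derivatives of MSE = mean(error**2) with respect to w and b
        grad_w = 2.0 / n * np.dot(error, x)
        grad_b = 2.0 / n * error.sum()
        # Move each parameter a small step against its gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return float(w), float(b)


# Hypothetical toy data: points on the line y = 3x + 0.5
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 0.5
w, b = fit_line(x, y)
print(f"w={w:.3f}, b={b:.3f}")
```

Backpropagation generalizes the same idea: it computes these partial derivatives automatically for every parameter in a neural network, layer by layer.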

Machine Learning Fundamentals

This is the main material. Complete these to understand machine learning fundamentals:

Topic	Source	Author
Intro	20 Min Introduction to Machine Learning	Google
Fundamentals	Machine Learning Crash Course	Google

Advanced ML Topics

High-quality resources to explore more advanced topics that are helpful for machine learning:

Topic	Source	Author	Type
General Advanced ML Topics	Machine Learning Q and AI	Sebastian Raschka	Book
Large Language Models	Intro to LLMs	Andrej Karpathy	Video
	Developing, Building, and Fine-tuning a LLM	Sebastian Raschka	Video
	Build a Large Language Model (From Scratch)	Sebastian Raschka	Book/Repo
	Quantization Section of LLM Course	Maxime Labonne	Course/Repo
	LLM Tools	Maxime Labonne	Course/Repo
	LLM Engineering	Maxime Labonne	Course/Repo
	LLM Engineer's Handbook	Paul Iusztin, Maxime Labonne, Alex Vesa	Book
Generative AI	Generative AI For Beginners	Microsoft	Course/Repo
Natural Language Processing (NLP)	NLP Course	Hugging Face	Course
Transformers	Start of NLP Course	Hugging Face	Course
Deep Learning	Deep Learning Fundamentals	LightningAI	Course
	Deep Learning Book	Ian Goodfellow, Yoshua Bengio, and Aaron Courville	Book
	The Engineer's Guide To Deep Learning	Hironobu Suzuki	Book
Reinforcement Learning (RL)	Spinning Up	OpenAI	Course
Computer Vision	Computer Vision	Kaggle	Course
Unsupervised Learning	Second Half of CS229	Andrew Ng/Stanford	Lecture
Supervised Learning	Supervised Machine Learning for Science	Christoph Molnar & Timo Freiesleben	Book
ML for Video Games	Machine Learning for Games	Hugging Face	Course
Feature Engineering	Data Prep	Google	Course
AI Ethics	Intro to AI Ethics	Kaggle	Course
ML Explainability	Machine Learning Explainability	Kaggle	Course
ML Ops	Made with ML	Goku Mohandas	Course
Virtual Classroom for Building LLMs	ML School	Santiago	Interactive Course
More Python	The Python Coding Place	Stephen Gruppetta	Website/Book
SQL	Intro to SQL	Kaggle	Course
	Advanced SQL	Kaggle	Course
Studying for ML Interviews	Study Plan for ML Interviews	Khang Pham	Repo
Machine Learning Math	Mathematics of Machine Learning	Tivadar Danka	Book
Machine Learning Efficiency	EfficientML.ai Lecture	MIT	Course
Knowledge Distillation	Awesome Knowledge Distillation	Dmitry Kozlov	Repo
System Design	System Design Interview Volume 1 and Volume 2	Alex Xu	Book

Your support helps keep this resource up-to-date and valuable for the ML community!

Job Skills and Where to Learn Them

This section lists the technologies and skills I see most often in real machine learning job descriptions, along with resources for learning each.

Topic	Source	Author
TensorFlow	TensorFlow 2.0 Complete Course	freeCodeCamp
PyTorch	PyTorch for Deep Learning	Daniel Bourke
Scikit-learn	Scikit-learn Tutorials	Scikit-learn Developers
Keras	Keras Tutorial	TutorialsPoint
NumPy	NumPy Tutorial	NumPy Team
Pandas	Pandas Course	Kaggle
SQL	Intro to SQL	Kaggle
Python	Intro to Python (For Beginners)	Harvard
C++	C++ Tutorial for Beginners	freeCodeCamp
Rust	The Rust Programming Language	Rust Team
JAX	JAX Quickstart	Google
Linear Algebra	Linear Algebra Curriculum	Khan Academy
Calculus	Essence of Calculus	3Blue1Brown
Deep Learning	Deep Learning Fundamentals	LightningAI
Computer Vision	Computer Vision	Kaggle
Natural Language Processing	NLP Course	Hugging Face
ONNX	ONNX Tutorial	ONNX Team
TensorRT	TensorRT Developer Guide	NVIDIA
LangChain	LangChain Crash Course	Patrick Loeber
AWS	AWS Machine Learning	Amazon Web Services
Azure	Azure AI Fundamentals	Microsoft
GCP	Machine Learning on Google Cloud	Google Cloud
XGBoost	XGBoost Documentation	XGBoost Team
Transformers	Transformers Course	Hugging Face
CUDA	CUDA C++ Programming Guide	NVIDIA
Java	Java Programming	University of Helsinki
LLMs	Building LLMs from the Ground Up	Sebastian Raschka
RAG	Building RAG-based LLM Applications for Production	DeepLearning.AI
Kubernetes	Kubernetes Tutorial for Beginners	TechWorld with Nana
Docker	Docker Tutorial for Beginners	freeCodeCamp

Newsletters, Blogs, and Channels for Machine Learning

All of these are must-subscribes:

Resource	Author
Blogs/Newsletters
Ahead of AI	Sebastian Raschka
AI Made Simple	Devansh
Society's Backend	Logan Thorneloe
The Batch	Andrew Ng
Interconnects	Nathan Lambert
Deep (Learning) Focus	Cameron R. Wolfe
ML Spring	Akshay Pachaar
Spatial Intelligence	Bilawal Sidhu
The AIEdge	Damien Benveniste
Google DeepMind Blog	Multiple
OpenAI Blog	Multiple
Meta AI Blog	Multiple
QiuByte	Hesam Sheikh
NLP Newsletter	Elvis
The Palindrome	Tivadar Danka
YouTube
Andrej Karpathy	Andrej Karpathy
Spatial Intelligence	Bilawal Sidhu
Jay Alammar	Jay Alammar
Mervin Praison	Mervin Praison
Nicholas Renotte	Nicholas Renotte
Jeremy Howard	Jeremy Howard
Logan Thorneloe	Logan Thorneloe
3Blue1Brown	Grant Sanderson
RohanPaulAI	Rohan Paul

For a list of almost all available ML courses on YouTube, check out this repo by DAIR.AI.

Free GPUs for Training

I've aggregated a list of cloud providers that offer a free tier for training machine learning models. Anyone can get started with ML; you don't need a powerful local machine. If anything is incorrect, reach out to me on X so I can fix it. If I've missed a cloud computing platform, let me know as well.

Resource	Details
Top Choices
Google Colab	Offers free access to GPUs (usually NVIDIA T4 or P100) and TPUs with limited usage time and resources. Excellent for small projects and experimentation.
Kaggle Notebooks	Provides 30 hours/week of GPU usage (NVIDIA Tesla P100 or T4) for free. It's a good option with access to Kaggle's datasets and community.
Other Options
Lightning AI	Offers one free studio with 22 GPU hours and is pay-as-you-go after that.
Google Cloud Platform	Offers $300 in free credits to new users.
Amazon SageMaker	Provides a free tier with limited access to various machine learning resources.
Paperspace Gradient	Offers a free community tier with access to limited GPU resources for experimentation and learning.
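
Once a notebook is running on one of these platforms, it's worth confirming that a GPU is actually attached before starting a long training run. This is a minimal Python sketch assuming an NVIDIA GPU runtime where the `nvidia-smi` tool is on the PATH (typical for GPU sessions on Colab and Kaggle); it will not detect TPUs, and the helper name `list_nvidia_gpus` is hypothetical.

```python
import shutil
import subprocess


def list_nvidia_gpus() -> list[str]:
    """Return one 'name, total memory' string per visible NVIDIA GPU, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # NVIDIA driver tools are not installed on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # nvidia-smi failed, e.g. no GPU attached to this session
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus if gpus else "No NVIDIA GPU detected; check the notebook's accelerator setting.")
```

An empty result usually means the notebook is still on a CPU runtime, so switch the accelerator in the platform's settings before training.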

Support This Guide

Don't forget to star this repo and follow me on X to support this guide. Please support the authors of these resources by following them at the links I included. You can also find them in my ML on X list.

If any information is missing, if you're the author of a resource and would like it removed, or if you have any other feedback, send me a message to let me know.

For Tasks:


understand ML impact on consumers, develop ML models, acquire ML engineer skills, build ML applications, implement AI solutions

For Jobs:

data scientist, machine learning engineer, AI researcher, software developer, data analyst

Alternative AI tools for ml-road-map

Similar Open Source Tools


llm-engineer-toolkit

The LLM Engineer Toolkit is a curated repository of over 120 LLM libraries, categorized by task: training and fine-tuning, application development, inference, serving, data extraction, data generation, agents, evaluation, monitoring, prompts, structured outputs, safety, security, embedding models, and other miscellaneous tools. The toolkit covers a wide range of tools and frameworks to streamline the development, deployment, and optimization of large language models.

GitHub stars: 2.6k

redis-ai-resources

A curated repository of code recipes, demos, and resources for basic and advanced Redis use cases in the AI ecosystem. It includes demos for ArxivChatGuru, Redis VSS, Vertex AI & Redis, Agentic RAG, ArXiv Search, and Product Search. Recipes cover topics like Getting started with RAG, Semantic Cache, Advanced RAG, and Recommendation systems. The repository also provides integrations/tools like RedisVL, AWS Bedrock, LangChain Python, LangChain JS, LlamaIndex, Semantic Kernel, RelevanceAI, and DocArray. Additional content includes blog posts, talks, reviews, and documentation related to Vector Similarity Search, AI-Powered Document Search, Vector Databases, Real-Time Product Recommendations, and more. Benchmarks compare Redis against other Vector Databases and ANN benchmarks. Documentation includes QuickStart guides, official literature for Vector Similarity Search, Redis-py client library docs, Redis Stack documentation, and Redis client list.

GitHub stars: 170

awesome-generative-ai-data-scientist

A curated list of 50+ resources to help you become a Generative AI Data Scientist. This repository includes resources on building GenAI applications with Large Language Models (LLMs), and deploying LLMs and GenAI with Cloud-based solutions.

GitHub stars: 425

watsonx-ai-samples

Sample notebooks for IBM Watsonx.ai for IBM Cloud and the IBM Watsonx.ai software product. The notebooks demonstrate capabilities such as running experiments on model building using AutoAI or Deep Learning, deploying third-party models as web services or batch jobs, monitoring deployments with OpenScale, managing model lifecycles, inferencing Watsonx.ai foundation models, and integrating LangChain with Watsonx.ai. Notebooks with Python code and the Python SDK can be found in the `python_sdk` folder. The REST API examples are organized in the `rest_api` folder.

GitHub stars: 128

Model-References

The 'Model-References' repository contains examples for training and inference using Intel Gaudi AI Accelerator. It includes models for computer vision, natural language processing, audio, generative models, MLPerf™ training, and MLPerf™ inference. The repository provides performance data and model validation information for various frameworks like PyTorch. Users can find examples of popular models like ResNet, BERT, and Stable Diffusion optimized for Intel Gaudi AI accelerator.

GitHub stars: 138

tamingLLMs

The 'Taming LLMs' repository provides a practical guide to the pitfalls and challenges associated with Large Language Models (LLMs) when building applications. It focuses on key limitations and implementation pitfalls, offering practical Python examples and open source solutions to help engineers and technical leaders navigate these challenges. The repository aims to equip readers with the knowledge to harness the power of LLMs while avoiding their inherent limitations.

GitHub stars: 233

ai-samples

AI Samples for .NET is a repository containing various samples demonstrating how to use AI in .NET applications. It provides quickstarts using Semantic Kernel and Azure OpenAI SDK, covers LLM Core Concepts, End to End Examples, Local Models, Local Embedding Models, Tokenizers, Vector Databases, and Reference Examples. The repository showcases different AI-related projects and tools for developers to explore and learn from.

GitHub stars: 229

Awesome-LLM-Safety

Welcome to our Awesome-llm-safety repository! We've curated a collection of the latest, most comprehensive, and most valuable resources on large language model safety (llm-safety). But we don't stop there; included are also relevant talks, tutorials, conferences, news, and articles. Our repository is constantly updated to ensure you have the most current information at your fingertips.

GitHub stars: 1.1k

data-prep-kit

Data Prep Kit is a community project aimed at democratizing and speeding up unstructured data preparation for LLM app developers. It provides high-level APIs and modules for transforming data (code, language, speech, visual) to optimize LLM performance across different use cases. The toolkit supports Python, Ray, Spark, and Kubeflow Pipelines runtimes, offering scalability from laptop to datacenter-scale processing. Developers can contribute new custom modules and leverage the data processing library for building data pipelines. Automation features include workflow automation with Kubeflow Pipelines for transform execution.

GitHub stars: 530

RAGHub

RAGHub is a community-driven project focused on cataloging new and emerging frameworks, projects, and resources in the Retrieval-Augmented Generation (RAG) ecosystem. It aims to help users stay ahead of changes in the field by providing a platform for the latest innovations in RAG. The repository includes information on RAG frameworks, evaluation frameworks, optimization frameworks, citation frameworks, engines, search reranker frameworks, projects, resources, and real-world use cases across industries and professions.

GitHub stars: 465

Awesome-LLM-Large-Language-Models-Notes

Awesome-LLM-Large-Language-Models-Notes is a repository that provides a comprehensive collection of information on various Large Language Models (LLMs) classified by year, size, and name. It includes details on known LLM models, their papers, implementations, and specific characteristics. The repository also covers LLM models classified by architecture, must-read papers, blog articles, tutorials, and implementations from scratch. It serves as a valuable resource for individuals interested in understanding and working with LLMs in the field of Natural Language Processing (NLP).

GitHub stars: 156

llm-compression-intelligence

This repository presents the findings of the paper "Compression Represents Intelligence Linearly". The study reveals a strong linear correlation between the intelligence of LLMs, as measured by benchmark scores, and their ability to compress external text corpora. Compression efficiency, derived from raw text corpora, serves as a reliable evaluation metric that is linearly associated with model capabilities. The repository includes the compression corpora used in the paper, code for computing compression efficiency, and data collection and processing pipelines.

GitHub stars: 98
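
To make the compression-efficiency idea above concrete, here is a minimal sketch assuming the metric is bits per character (BPC): the total negative log-likelihood a language model assigns to a text, converted from nats to bits and divided by the number of characters, where lower means better compression. The helper `bits_per_character` and the toy log-probabilities are hypothetical; this is not the repository's code, and real per-token log-probabilities would come from whatever model API you use.

```python
import math


def bits_per_character(token_logprobs: list[float], text: str) -> float:
    """Compression efficiency as bits per character (lower is better).

    token_logprobs: natural-log probabilities a model assigned to each token of `text`
    (hypothetical input; how you obtain them depends on your model API).
    """
    if not text:
        raise ValueError("text must be non-empty")
    total_nats = -sum(token_logprobs)      # total negative log-likelihood in nats
    total_bits = total_nats / math.log(2)  # convert nats to bits
    return total_bits / len(text)          # normalize by character count


# Toy numbers for illustration only: 4 tokens, each given probability 0.5,
# covering a 16-character string, i.e. 4 bits over 16 characters (about 0.25 BPC).
toy_logprobs = [math.log(0.5)] * 4
print(bits_per_character(toy_logprobs, "abcdefghijklmnop"))
```

Normalizing by characters rather than tokens keeps the score comparable across models that use different tokenizers.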

OpenGPT-4o

OpenGPT 4o is a free alternative to OpenAI's GPT-4o. It offers free pricing, image and video generation, image Q&A, voice and video chat, multilingual support, high customization, and continuous learning capability.

GitHub stars: 144

CogVLM2

CogVLM2 is a new generation of open-source models with significant improvements on benchmarks such as TextVQA and DocVQA. It supports an 8K context length, image resolutions up to 1344 × 1344, and both Chinese and English. The project provides basic calling methods, fine-tuning examples, and OpenAI API-format calling examples to help developers get started with the model quickly.

GitHub stars: 83

are-copilots-local-yet

Current trends and state of the art for using open & local LLM models as copilots to complete code, generate projects, act as shell assistants, automatically fix bugs, and more. This document is a curated list of local Copilots, shell assistants, and related projects, intended to be a resource for those interested in a survey of the existing tools and to help developers discover the state of the art for projects like these.

GitHub stars: 511

For similar tasks


God-Level-AI

A drill of scientific methods, processes, algorithms, and systems to build stories & models. An in-depth learning resource for humans. This repository is designed for individuals aiming to excel in the field of Data and AI, providing video sessions and text content for learning. It caters to those in leadership positions, professionals, and students, emphasizing the need for dedicated effort to achieve excellence in the tech field. The content covers various topics with a focus on practical application.

GitHub stars: 3.5k

ai_igu

AI-IGU is a GitHub repository focused on Artificial Intelligence (AI) concepts, technology, software development, and algorithm improvement for all ages and professions. It emphasizes the importance of future software for future scientists and the increasing need for software developers in the industry. The repository covers various topics related to AI, including machine learning, deep learning, data mining, data science, big data, and more. It provides educational materials, practical examples, and hands-on projects to enhance software development skills and create awareness in the field of AI.

GitHub stars: 74

AI-PhD-S24

AI-PhD-S24 is a mono-repo for the PhD course 'AI for Business Research' at CUHK Business School in Spring 2024. The course aims to provide a basic understanding of machine learning and artificial intelligence concepts/methods used in business research, showcase how ML/AI is utilized in business research, and introduce state-of-the-art AI/ML technologies. The course includes scribed lecture notes, class recordings, and covers topics like AI/ML fundamentals, DL, NLP, CV, unsupervised learning, and diffusion models.

GitHub stars: 90

Dispider

Dispider is an implementation enabling real-time interactions with streaming videos, providing continuous feedback in live scenarios. It separates perception, decision-making, and reaction into asynchronous modules, ensuring timely interactions. Dispider outperforms VideoLLM-online on benchmarks like StreamingBench and excels in temporal reasoning. The tool requires CUDA 11.8 and specific library versions for optimal performance.

GitHub stars: 92

AI-PhD-S25

AI-PhD-S25 is a mono-repo for the DOTE 6635 course on AI for Business Research at CUHK Business School. The course aims to provide a fundamental understanding of ML/AI concepts and methods relevant to business research, explore applications of ML/AI in business research, and discover cutting-edge AI/ML technologies. The course resources include Google Colab for code distribution, Jupyter Notebooks, Google Sheets for group tasks, an Overleaf template for lecture notes, replication projects, and access to HPC server compute resources. The course covers topics like AI/ML in business research, deep learning basics, attention mechanisms, transformer models, LLM pretraining, posttraining, causal inference fundamentals, and more.

GitHub stars: 64

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

GitHub stars: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models, covering everything from images to sound.

GitHub stars: 94

kaito

Kaito is an operator that automates AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, provides preset configurations so you don't have to tune deployment parameters to fit GPU hardware, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Kaito greatly simplifies the workflow of onboarding large AI inference models in Kubernetes.

GitHub stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

GitHub stars: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
* Self-contained, with no need for a DBMS or cloud service.
* OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
* Supports consumer-grade GPUs.

GitHub stars: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

GitHub stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

GitHub stars: 675