data-scientist-roadmap2024

Resources and topics needed for Data Scientist and Machine Learning roles.

Stars: 254


The Data Scientist Roadmap2024 is a guide to the tools and concepts used in data science, categorized by difficulty. It covers programming languages, machine learning libraries, cloud platforms, web development frameworks, data visualization tools such as Tableau and Matplotlib, and DevOps/MLOps tools such as Airflow and MLflow. The concepts include supervised and unsupervised learning, NLP, deep learning, reinforcement learning, and statistics, along with ETL processes, optimization algorithms, and financial modeling.

README:

Data Scientist Roadmap2024

Description

Mastering the tools in this guide — including programming languages, machine learning libraries, and cloud platforms — is crucial for data science success.

I've categorized them based on difficulty:

  • Green text: Mandatory and easiest
  • Yellow text: Moderately difficult
  • Red text: Toughest, for pros (color codes are present here)

Structure:

List of tools, libraries and concepts


Programming Languages:

  • Python
    • GRIND 75 - Questions and multiple solutions.
  • R

Frameworks & Libraries:


Cloud Platforms & Services:

  • Docker (Containerization platform)
  • Learn any one of the following:
    • GCP (Google Cloud Platform)
      • Cloud Storage
      • Compute Engine
      • Cloud SQL
      • Cloud Functions
      • BigQuery
      • AI Platform (includes Vertex AI)
    • Azure (Microsoft Azure)
      • Blob Storage
      • Virtual Machines
      • SQL Database / Azure Database for PostgreSQL/MySQL
      • Azure Functions
      • Azure Synapse Analytics
      • Azure Machine Learning
    • AWS (Amazon Web Services)
      • AWS S3
      • AWS EC2
      • AWS RDS
      • AWS Lambda
      • AWS Redshift
      • AWS SageMaker
  • Kubeflow (Cloud-native machine learning platform)
  • Kubernetes (Container orchestration platform)

Data Tools & Libraries:

  • SQL (including OLAP & OLTP variations)
    • SQLBolt, a simple and interactive tutorial. [2H]
  • Pandas
  • Elasticsearch
  • Dask (Parallel computing library for big data)
  • Spark (Large-scale data processing framework)
  • Airbyte (Open-source data integration platform)
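The SQL entry above mentions OLTP and OLAP variations. As a rough illustration of the difference, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and its rows are hypothetical and exist only for this example; a real OLAP workload would normally run on a dedicated warehouse rather than SQLite.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 15.0)],
)

# OLTP-style access: update and read a single row by key in a short transaction.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 1))
single_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (1,)
).fetchone()

# OLAP-style access: scan many rows and aggregate them per group.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(single_order)
print(totals)
```

The first query touches one row identified by its key, which is typical of transactional (OLTP) systems. The second reads the whole table to produce a summary, which is the pattern analytical (OLAP) systems are optimized for.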

Web Development Frameworks:

  • FastAPI
  • Uvicorn (ASGI server commonly used to run FastAPI apps)
  • Streamlit (Machine learning app development framework)

Machine Learning Concepts:

  • Supervised Learning
    • Regression
    • Classification
  • Unsupervised Learning
    • Clustering
    • Dimensionality Reduction
  • Recommendation Systems
  • Time Series Forecasting
  • Natural Language Processing (NLP)
    • Text Mining
    • Natural Language Understanding (NLU)
      • Sentiment Analysis
      • Named Entity Recognition (NER)
      • Question Answering (QA)
    • Natural Language Generation (NLG)
  • Deep Learning Techniques
    • Convolutional Neural Networks (CNNs)
    • Long Short-Term Memory networks (LSTMs)
    • Generative AI
  • Reinforcement Learning
  • Bayesian Optimization
  • Statistics

DevOps & MLOps Tools:

  • Airflow (Workflow orchestration tool)
  • MLflow (Machine learning lifecycle management)
  • Prometheus (Monitoring and alerting system)
  • Grafana (Data visualization and analytics tool)
  • Git version control (e.g., GitLab, GitHub)

Data Visualization Tools:

  • Tableau
  • Matplotlib (Python plotting library)
  • Seaborn (Statistical data visualization library built on top of Matplotlib)
  • Power BI (Microsoft business intelligence platform)

Other:

  • ETL (Extract, Transform, Load) processes

  • Optimisation algorithms (can be broader than just machine learning)

  • Distributed training

  • Curse of dimensionality

  • Financial modeling

    • MIT Course: Mathematics With Applications In Finance
      • The purpose of the class is to expose undergraduate and graduate students to the mathematical concepts and techniques used in the financial industry. Mathematics lectures are mixed with lectures illustrating the corresponding application in the financial industry.
  • LLMs

    • LangChain agents
    • Prompt engineering
    • RAG
    • Fine-tuning
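The "Curse of dimensionality" entry in the list above can be made concrete with a small experiment. The sketch below uses only the standard library and measures how the gap between the nearest and farthest random point shrinks, relative to the nearest distance, as the number of dimensions grows. The function name `distance_contrast`, the point count, and the chosen dimensions are assumptions made for illustration.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min over distances from one random query point
    to n_points other random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

low = distance_contrast(2)     # low-dimensional space
high = distance_contrast(500)  # high-dimensional space
print(low, high)
```

In this setup the contrast for 500 dimensions comes out much smaller than for 2 dimensions: all points end up at roughly the same distance from the query, which is one reason nearest-neighbour methods degrade in high-dimensional spaces.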

Interviews

Notes and Study Material

  • Neural Networks
    • Part 1: Basics, Gradient Descent, Backpropagation, Learning Rate, Activation Functions.
    • Part 2: Primitive systems, RNN, GRU and LSTM, Transformers, BERT.
    • A quick recap, designed for last review before any interview.
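Part 1 of the notes above covers gradient descent and the learning rate. As a quick refresher, here is a minimal sketch of batch gradient descent fitting a straight line, written with only the standard library. The toy data, the function name `fit_line`, and the hyperparameter values are assumptions chosen for illustration and are not taken from the notes themselves.

```python
# Toy data generated from y = 2x + 1 (an illustrative assumption).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this data the fitted `w` and `b` should approach the slope 2 and intercept 1 used to generate it. A much larger learning rate would make the updates overshoot and diverge, while a much smaller one would need more steps to converge.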

Work in progress:

  1. Updating the PyTorch material with notebooks containing code and concepts (3/20 done).
  2. Updating notes for Neural Networks covering basics, RNN, GRU, LSTM, Transformers, etc.
