
Ultimate-Data-Science-Toolkit---From-Python-Basics-to-GenerativeAI
None
Stars: 897

Ultimate Data Science Toolkit is a comprehensive repository covering Python basics to Generative AI. It includes modules on Python programming, data analysis, statistics, machine learning, MLOps, case studies, and deep learning. The repository provides detailed tutorials on various topics such as Python data structures, control statements, functions, modules, object-oriented programming, exception handling, file handling, web API, databases, list comprehension, lambda functions, Pandas, Numpy, data visualization, statistical analysis, supervised and unsupervised machine learning algorithms, model serialization, ML pipeline orchestration, case studies, and deep learning concepts like neural networks and autoencoders.
README:
Topic Name | What's Covered |
---|---|
Intro to Python | Applications and Features of Python, Hello World Program, Identifiers and Rules to define identifiers, Data Types (numeric, boolean, strings, list, tuple, set and dict), Comments, Input and Output, Operators - Arithmatic, Reltaional, Equality, Logical, Bitwise, Assignment, Ternary, Identity and Membership |
Data Structures in Python (Strings, List, Tuple, Set, Dictionary) | Strings - Creating a string, Indexing, Slicing, Split, Join, etc, List - Initialization, Indexing, Slicing, Sorting, Appending, etc, Tuple - Initialization, Indexing, Slicing, Count, Index, etc, Set - Initialization, Unordered Sequence, Set Opertaions, etc, Dictionary - Initialization, Updating, Keys, Values, Items, etc |
Control Statements (Conditionals and Loops) | Conditional Statements - Introducing Indentation, if statement, if...else statement, if..elif...else statement, Nested if else statement, Loops - while loops, while...else loop, Membership operator, for loop, for...else loop, Nested Loops, Break and Continue Statement, Why else? |
Functions and Modules | Functions - Introduction to Python Functions, Function Definition and Calling, Functions with Arguments/Parameters, Return Statement, Scope of a Variable, Global Variables, Modules - Introduction to Modules, Importing a Module, Aliasing, from...import statement, import everything, Some important modules - math, platform, random, webbrowser, etc |
Object Oriented Programming | Classes and Objects - Creating a class, Instantiating an Object, Constructor, Class Members - Variables and Mentods, Types of Variables - Instance, Static and Local Variables, Types of Methods - Instance, Class and Static Methods, Access Modifiers - Public, Private and Protected, Pillars of Object Oriented Programming - Inheritance, Polymorphism, Abstraction and Encapsulation, Setters and Getters, Inheritance vs Association |
Exception Handling | Errors vs Exception, Syntax and Indentation Errors, try...except block, Control Flow in try...except block, try with multiple except, finally block, try...except...else, Nested try...except...finally, User Defined Exception |
File Handling | Introduction to File Handling, Opening and Closing a File, File Object Properties, Read Data from Text Files, Write Data to Text Files, with statement, Renaming and Deleting Files |
Web API | Application Programming Interface, Indian Space Station API, API Request, Status Code, Query Parameters, Getting JSON from an API Request, Working with JSON - dump and load, Working with Twitter API |
Databases | Introduction to Databases, SQLite3 - Connecting Python with SQLite3, Performing CRUD Opertations, MySQL - Connecting Python with MySQL, Performing CRUD Opertations, MongoDB - Connecting Python with MongoDB, Performing CRUD Opertations, Object Relation Mapping - SQLAlchemy ORM, CRUD operations and Complex DB operations |
List Comprehension, Lambda, Filter, Map, Reduce | List Comprehension, Anonymous Functions, Filter, Map, Reduce, Function Aliasing |
Problem Solving for Interviews | Swapping two numbers, Factorial of a number, Prime Number, Fibbonnacci Sequence, Armstrong Number, Palindrome Number, etc |
Topic Name | What's Covered |
---|---|
Data Analytics Framework | Data Collection, Business Understanding, Exploratory Data Analysis, Data Preparation, Model Building, Model Evaluation, Deployment, Understanding Cross Industry Standard Process for Data Mining (CRISP-DM) and Microsoft's Team Data Science Process (TDSP) |
Numpy | Array Oriented Numerical Computations using Numpy, Creating a Numpy Array, Basic Operations on Numpy Array - Check Dimensions, Shape, Datatypes and ItemSize, Why Numpy, Various ways to create Numpy Array, Numpy arange() function, Numpy Random Module - rand(), randn(), randint(), uniform(), etc, Indexing and Slicing in Numpy Arrays, Applying Mathematical Operations on Numpy Array - add(), subtract(), multiply(), divide(), dot(), matmul(), sum(), log(), exp(), etc, Statistical Operations on Numpy Array - min(), max(), mean(), median(), var(), std(), corrcoef(), etc, Reshaping a Numpy Array, Miscellaneous Topics - Linspace, Sorting, Stacking, Concatenation, Append, Where and Numpy Broadcasting |
Pandas for Beginners | Pandas Data Structures - Series, Dataframe and Panel, Creating a Series, Data Access, Creating a Dataframe using Tuples and Dictionaries, DataFrame Attributes - columns, shape, dtypes, axes, values, etc, DataFrame Methods - head(), tail(), info(), describe(), Working with .csv and .xlsx - read_csv() and read_excel(), DataFrame to .csv and .xlsx - to_csv() and to_excel() |
Advance Pandas Operations | What's Covered |
Case Study - Pandas Manipulation | What's Covered |
Missing Value Treatment | What's Covered |
Visuallization Basics - Matplotlib and Seaborn | What's Covered |
Case Study - Covid_19_TimeSeries | What's Covered |
Plotly and Express | What's Covered |
Outliers - Coming Soon | What's Covered |
Topic Name | What's Covered |
---|---|
Normal Distribution | What's Covered |
Central Limit Theorem | What's Covered |
Hypothesis Testing | What's Covered |
Chi Square Testing | What's Covered |
Performing Statistical Test | What's Covered |
- Data Preparation and Modelling with SKLearn
- Working with Text Data
- Working with Image Data
-
Supervised ML Algorithms
- K - Nearest Neighbours
- Linear Regression
- Logistic Regression
- Gradient Descent
- Decision Trees
- Support Vector Machines
- Models with Feature Engineering
- Hyperparameter Tuning
- Ensembles -
Unsupervised ML Algorithms
- Clustering
- Principal Component Analysis
Topic Name | What's Covered |
---|---|
Model Serialization and Deserialization | What's Covered |
Application Integration | What's Covered |
MLFlow - Experiment Tracking and Model Management | What's Covered |
Prefect - Orchestrate ML Pipeline | What's Covered |
Topic Name | What's Covered |
---|---|
Car Price Prediction (Regression) | What's Covered |
Airline Sentiment Analysis (NLP - Classification) | What's Covered |
Adult Income Prediction (Classification) | What's Covered |
Web App Development + Serialization and Deserialization | What's Covered |
AWS Deployment | What's Covered |
Streamlit Heroku Deployment | What's Covered |
Customer Segmentation | What's Covered |
Web Scrapping | What's Covered |
Topic Name | What's Covered |
---|---|
Introduction to Deep Learning | What's Covered |
Training a Deep Neural Network + TensorFlow.Keras | What's Covered |
Convolutional Neural Network + TensorFlow.Keras | What's Covered |
Auto Encoders for Image Compression | What's Covered |
Recurrent Neural Network (Coming Soon) | What's Covered |
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Ultimate-Data-Science-Toolkit---From-Python-Basics-to-GenerativeAI
Similar Open Source Tools

Ultimate-Data-Science-Toolkit---From-Python-Basics-to-GenerativeAI
Ultimate Data Science Toolkit is a comprehensive repository covering Python basics to Generative AI. It includes modules on Python programming, data analysis, statistics, machine learning, MLOps, case studies, and deep learning. The repository provides detailed tutorials on various topics such as Python data structures, control statements, functions, modules, object-oriented programming, exception handling, file handling, web API, databases, list comprehension, lambda functions, Pandas, Numpy, data visualization, statistical analysis, supervised and unsupervised machine learning algorithms, model serialization, ML pipeline orchestration, case studies, and deep learning concepts like neural networks and autoencoders.

PocketFlow
Pocket Flow is a 100-line minimalist LLM framework designed for (Multi-)Agents, Workflow, RAG, etc. It provides a core abstraction for LLM projects by focusing on computation and communication through a graph structure and shared store. The framework aims to support the development of LLM Agents, such as Cursor AI, by offering a minimal and low-level approach that is well-suited for understanding and usage. Users can install Pocket Flow via pip or by copying the source code, and detailed documentation is available on the project website.

helicone
Helicone is an open-source observability platform designed for Language Learning Models (LLMs). It logs requests to OpenAI in a user-friendly UI, offers caching, rate limits, and retries, tracks costs and latencies, provides a playground for iterating on prompts and chat conversations, supports collaboration, and will soon have APIs for feedback and evaluation. The platform is deployed on Cloudflare and consists of services like Web (NextJs), Worker (Cloudflare Workers), Jawn (Express), Supabase, and ClickHouse. Users can interact with Helicone locally by setting up the required services and environment variables. The platform encourages contributions and provides resources for learning, documentation, and integrations.

YaneuraOu
YaneuraOu is the World's Strongest Shogi engine (AI player), winner of WCSC29 and other prestigious competitions. It is an educational and USI compliant engine that supports various features such as Ponder, MultiPV, and ultra-parallel search. The engine is known for its compatibility with different platforms like Windows, Ubuntu, macOS, and ARM. Additionally, YaneuraOu offers a standard opening book format, on-the-fly opening book support, and various maintenance commands for opening books. With a massive transposition table size of up to 33TB, YaneuraOu is a powerful and versatile tool for Shogi enthusiasts and developers.

X-AnyLabeling
X-AnyLabeling is a robust annotation tool that seamlessly incorporates an AI inference engine alongside an array of sophisticated features. Tailored for practical applications, it is committed to delivering comprehensive, industrial-grade solutions for image data engineers. This tool excels in swiftly and automatically executing annotations across diverse and intricate tasks.

LLaMA-Factory
LLaMA Factory is a unified framework for fine-tuning 100+ large language models (LLMs) with various methods, including pre-training, supervised fine-tuning, reward modeling, PPO, DPO and ORPO. It features integrated algorithms like GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning, as well as practical tricks like FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA. LLaMA Factory provides experiment monitors like LlamaBoard, TensorBoard, Wandb, MLflow, etc., and supports faster inference with OpenAI-style API, Gradio UI and CLI with vLLM worker. Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA Factory's QLoRA further improves the efficiency regarding the GPU memory.

WristAssist
WristAssist is the first app for all WearOS watches that fully brings the classic ChatGPT and DALL-E features to your wrist. It allows users to chat, save chats, edit chats, view galleries, create images, and share images directly from their wrist. The app provides a seamless user experience with easy installation from Google Play and free lifetime updates. Users can refer to the Wiki page for detailed setup instructions. WristAssist is licensed under the terms of the Apache 2.0 license.

vectordb-recipes
This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects. * These are built using LanceDB, a free, open-source, serverless vectorDB that **requires no setup**. * It **integrates into python data ecosystem** so you can simply start using these in your existing data pipelines in pandas, arrow, pydantic etc. * LanceDB has **native Typescript SDK** using which you can **run vector search** in serverless functions! This repository is divided into 3 sections: - Examples - Get right into the code with minimal introduction, aimed at getting you from an idea to PoC within minutes! - Applications - Ready to use Python and web apps using applied LLMs, VectorDB and GenAI tools - Tutorials - A curated list of tutorials, blogs, Colabs and courses to get you started with GenAI in greater depth.

neural-compressor
Intel® Neural Compressor is an open-source Python library that supports popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks such as TensorFlow, PyTorch, ONNX Runtime, and MXNet. It provides key features, typical examples, and open collaborations, including support for a wide range of Intel hardware, validation of popular LLMs, and collaboration with cloud marketplaces, software platforms, and open AI ecosystems.

Steel-LLM
Steel-LLM is a project to pre-train a large Chinese language model from scratch using over 1T of data to achieve a parameter size of around 1B, similar to TinyLlama. The project aims to share the entire process including data collection, data processing, pre-training framework selection, model design, and open-source all the code. The goal is to enable reproducibility of the work even with limited resources. The name 'Steel' is inspired by a band '万能青年旅店' and signifies the desire to create a strong model despite limited conditions. The project involves continuous data collection of various cultural elements, trivia, lyrics, niche literature, and personal secrets to train the LLM. The ultimate aim is to fill the model with diverse data and leave room for individual input, fostering collaboration among users.

bytedesk
Bytedesk is an AI-powered customer service and team instant messaging tool that offers features like enterprise instant messaging, online customer service, large model AI assistant, and local area network file transfer. It supports multi-level organizational structure, role management, permission management, chat record management, seating workbench, work order system, seat management, data dashboard, manual knowledge base, skill group management, real-time monitoring, announcements, sensitive words, CRM, report function, and integrated customer service workbench services. The tool is designed for team use with easy configuration throughout the company, and it allows file transfer across platforms using WiFi/hotspots without the need for internet connection.

fastapi-admin
智元 Fast API is a one-stop API management system that unifies various LLM APIs in terms of format, standards, and management to achieve the ultimate in functionality, performance, and user experience. It includes features such as model management with intelligent and regex matching, backup model functionality, key management, proxy management, company management, user management, and chat management for both admin and user ends. The project supports cluster deployment, multi-site deployment, and cross-region deployment. It also provides a public API site for registration with a contact to the author for a 10 million quota. The tool offers a comprehensive dashboard, model management, application management, key management, and chat management functionalities for users.

ipex-llm
IPEX-LLM is a PyTorch library for running Large Language Models (LLMs) on Intel CPUs and GPUs with very low latency. It provides seamless integration with various LLM frameworks and tools, including llama.cpp, ollama, Text-Generation-WebUI, HuggingFace transformers, and more. IPEX-LLM has been optimized and verified on over 50 LLM models, including LLaMA, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM, Baichuan, Qwen, and RWKV. It supports a range of low-bit inference formats, including INT4, FP8, FP4, INT8, INT2, FP16, and BF16, as well as finetuning capabilities for LoRA, QLoRA, DPO, QA-LoRA, and ReLoRA. IPEX-LLM is actively maintained and updated with new features and optimizations, making it a valuable tool for researchers, developers, and anyone interested in exploring and utilizing LLMs.

LynxHub
LynxHub is a platform that allows users to seamlessly install, configure, launch, and manage all their AI interfaces from a single, intuitive dashboard. It offers features like AI interface management, arguments manager, custom run commands, pre-launch actions, extension management, in-app tools like terminal and web browser, AI information dashboard, Discord integration, and additional features like theme options and favorite interface pinning. The platform supports modular design for custom AI modules and upcoming extensions system for complete customization. LynxHub aims to streamline AI workflow and enhance user experience with a user-friendly interface and comprehensive functionalities.

ASTRA.ai
Astra.ai is a multimodal agent powered by TEN, showcasing its capabilities in speech, vision, and reasoning through RAG from local documentation. It provides a platform for developing AI agents with features like RTC transportation, extension store, workflow builder, and local deployment. Users can build and test agents locally using Docker and Node.js, with prerequisites including Agora App ID, Azure's speech-to-text and text-to-speech API keys, and OpenAI API key. The platform offers advanced customization options through config files and API keys setup, enabling users to create and deploy their AI agents for various tasks.
For similar tasks

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

sorrentum
Sorrentum is an open-source project that aims to combine open-source development, startups, and brilliant students to build machine learning, AI, and Web3 / DeFi protocols geared towards finance and economics. The project provides opportunities for internships, research assistantships, and development grants, as well as the chance to work on cutting-edge problems, learn about startups, write academic papers, and get internships and full-time positions at companies working on Sorrentum applications.

tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

zep-python
Zep is an open-source platform for building and deploying large language model (LLM) applications. It provides a suite of tools and services that make it easy to integrate LLMs into your applications, including chat history memory, embedding, vector search, and data enrichment. Zep is designed to be scalable, reliable, and easy to use, making it a great choice for developers who want to build LLM-powered applications quickly and easily.

telemetry-airflow
This repository codifies the Airflow cluster that is deployed at workflow.telemetry.mozilla.org (behind SSO) and commonly referred to as "WTMO" or simply "Airflow". Some links relevant to users and developers of WTMO: * The `dags` directory in this repository contains some custom DAG definitions * Many of the DAGs registered with WTMO don't live in this repository, but are instead generated from ETL task definitions in bigquery-etl * The Data SRE team maintains a WTMO Developer Guide (behind SSO)

mojo
Mojo is a new programming language that bridges the gap between research and production by combining Python syntax and ecosystem with systems programming and metaprogramming features. Mojo is still young, but it is designed to become a superset of Python over time.

pandas-ai
PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.

databend
Databend is an open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake. With its focus on fast query execution and data ingestion, it's designed for complex analysis of the world's largest datasets.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.