# python-tutorial-notebooks

Python tutorials as Jupyter Notebooks for NLP, ML, and AI

Stars: 121
This repository contains Jupyter-based tutorials for NLP, ML, AI in Python for classes in Computational Linguistics, Natural Language Processing (NLP), Machine Learning (ML), and Artificial Intelligence (AI) at Indiana University.
README:
(C) 2016-2024 by Damir Cavar
NLP-Lab at Indiana University.
- Anthropic / VoyageAI Embeddings
- OpenAI Embeddings
- Claude 3 Interaction using the Anthropic API
- GPT-4 interaction using the OpenAI API
- Simple Transformer-based Text Classification
- Stanza Tutorial
- Converting SEC CIKs to a Knowledge Graph
- Allegro Graph example
- Extracting Abbreviations
- Bayesian Classification for Machine Learning for Computational Linguistics
- Python Tutorial 1: Part-of-Speech Tagging 1
- Lexical Clustering
- Linear Algebra
- Neural Network Example with Keras
- Computing Finite State Automata
- Parallel Processing on Multiple Threads
- Perceptron Learning in Python
- Clustering with Scikit-learn
- Simple Language ID with N-grams
- Support Vector Machine (SVM) Classifier Example
- Scikit-Learn for Computational Linguists
- Tutorial: Tokens and N-grams
- Tutorial 1: Part-of-Speech Tagging 1
- Tutorial 2: Hidden Markov Models
- Word Sense Disambiguation
- Python examples and notes for Machine Learning for Computational Linguistics
- RDFlib Graphs
- Scikit-learn Logistic Regression
- Convert the Stanford Sentiment Treebank Data to CSV
- TextRank Example
- NLTK: Texts and Frequencies - N-gram models and frequency profiles
- Parsing with NLTK
- Parsing with NLTK and Foma
- Categorial Grammar Parsing in NLTK
- Dependency Grammar in NLTK
- Document Classification Tutorial 1 - Amazon Reviews
- WordNet using NLTK
- WordNet and NLTK
- Framenet in NLTK
- FrameNet Examples using NLTK
- PropBank in NLTK
- Machine Translation in Python 3 with NLTK
- N-gram Models from Text for Language Models
- Probabilistic Context-free Grammar (PCFG) Parsing using NLTK
- Python for Text Similarities 1
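Several of the notebooks listed above (tokens and n-grams, n-gram language models, language ID with n-grams) build on n-gram frequency profiles. As a minimal pure-Python sketch of that idea, not the notebooks' own NLTK code:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# A toy corpus; the notebooks work with real texts via NLTK
tokens = "the cat sat on the mat".split()

# Frequency profile of bigrams, the kind of model the n-gram notebooks build
bigram_freq = Counter(ngrams(tokens, 2))
print(bigram_freq.most_common(3))
```

The same `Counter` profile generalizes to any n, which is all a simple n-gram language ID system needs: compare a document's profile against per-language reference profiles.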
See the licensing details on the individual documents and in the LICENSE file in the code folder.
The files in this folder are Jupyter-based tutorials for NLP, ML, AI in Python for classes I teach in Computational Linguistics, Natural Language Processing (NLP), Machine Learning (ML), and Artificial Intelligence (AI) at Indiana University.
If you find this material useful, please cite the author and source (that is, Damir Cavar, as well as the sources cited in the relevant notebooks). Please let me know if you have suggestions for correcting or improving the notebooks, or for adding material and explanations.
The instructions below are somewhat outdated; I now use JupyterLab exclusively. Follow the instructions here to set it up on different machine types and operating systems.
To run this material in Jupyter you need to have Python 3.x and Jupyter installed. You can save yourself some trouble by using the Anaconda Python 3.x distribution.
Clone the project folder using:
git clone https://github.com/dcavar/python-tutorial-for-ipython.git
Some of the notebooks contain code that requires specific versions of various Python modules. Some of these installations can be complicated and problematic. I am working on a more detailed description of the installation procedures and dependencies for each notebook. Stay tuned, this is coming soon.
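Until that description exists, a quick way to see which modules are already available is a version probe like the following. The package names are illustrative, picked from the topics above, not a pinned requirements list for any particular notebook:

```python
from importlib import metadata

def installed_versions(packages):
    """Map each package name to its installed version string, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Example probe; adjust the list to the notebook you want to run
print(installed_versions(["nltk", "scikit-learn", "plotly"]))
```

Running this before opening a notebook tells you at a glance what still needs a `pip install`.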
Jupyter is a great tool for computational publications, tutorials, and exercises. I set up my favorite components for Jupyter on Linux (for example, Ubuntu) this way:
Assuming that some development tools, such as gcc and make, are already installed, I install the packages python3-pip and python3-dev:
sudo apt install python3-pip python3-dev
After that I update the global system version of pip to the newest version:
sudo -H pip3 install -U pip
Then I install the newest Jupyter and Jupyterlab modules globally, updating any previously installed version:
sudo -H pip3 install -U jupyter jupyterlab
One module we should not forget is plotly:
sudo -H pip3 install -U plotly
Scala, Clojure, and Groovy are extremely interesting languages as well, and I love working with Apache Spark, so I install BeakerX too. This requires two other Python modules, py4j and pandas, and it presupposes an existing Java JDK version 8 or newer on the system. I install all the BeakerX-related packages:
sudo -H pip3 install -U py4j
sudo -H pip3 install -U pandas
sudo -H pip3 install -U beakerx
To configure and install all BeakerX components I run:
sudo -H beakerx install
Some of the components I like to use require Node.js. On Ubuntu I usually add the newest Node.js as a PPA rather than via Ubuntu Snap. Instructions on how to achieve that can be found here. To install Node.js on Ubuntu, simply run:
sudo apt install nodejs
The following commands will add plugins and extensions to Jupyter globally:
sudo -H jupyter labextension install @jupyter-widgets/jupyterlab-manager
sudo -H jupyter labextension install @jupyterlab/plotly-extension
sudo -H jupyter labextension install beakerx-jupyterlab
Another useful package is Voilà, which allows you to turn Jupyter notebooks into standalone web applications. I install it using:
sudo -H pip3 install voila
Now the initial version of the platform is ready to go.
To start the Jupyter notebook viewer/editor on your local machine change into the notebooks folder within the cloned project folder and run the following command:
jupyter notebook
A browser window should open up that allows you full access to the notebooks.
Alternatively, check out the instructions on how to launch JupyterLab, BeakerX, etc.
Enjoy!