learnopencv

Learn OpenCV: C++ and Python Examples

Stars: 22319


LearnOpenCV is a repository containing code for the Computer Vision, Deep Learning, and AI research articles published on the blog LearnOpenCV.com. It also points readers who want to build AI expertise to the courses offered by OpenCV. The repository covers a wide range of topics, including image inpainting, instance segmentation, robotics, and deep learning models, with practical implementations and code examples for readers to explore and learn from.

README:

LearnOpenCV

This repository contains code for Computer Vision, Deep learning, and AI research articles shared on our blog LearnOpenCV.com.

Want to become an expert in AI? AI Courses by OpenCV is a great place to start.

List of Blog Posts

Blog Post	Code
Top VLM Evaluation Metrics for Optimal Performance Analysis	Code
Getting Started with VLM on Jetson Nano	Code
VLM on Edge: Worth the Hype or Just a Novelty?	Code
AnomalyCLIP: Harnessing CLIP for Weakly-Supervised Video Anomaly Recognition	Code
AI for Video Understanding: From Content Moderation to Summarization	Code
Video-RAG: Training-Free Retrieval for Long-Video LVLMs	Code
Object Detection and Spatial Understanding with VLMs ft. Qwen2.5-VL	Code
LangGraph: Building Self-Correcting RAG Agent for Code Generation	Code
Inside Sinusoidal Position Embeddings: A Sense of Order	Code
Inside RoPE: Rotary Magic into Position Embeddings	Code
SimLingo: Vision-Language-Action Model for Autonomous Driving	Code
Fine-Tuning Gemma 3n for Medical VQA on ROCOv2	Code
SmolLM3 Blueprint: SOTA 3B-Parameter LLM
LangGraph: A Visual Automation and Summarization Pipeline	Code
Fine-Tuning AnomalyCLIP: Class-Agnostic Zero-Shot Anomaly Detection	Code
SigLIP 2: DeepMind’s Multilingual Vision-Language Model
MedGemma: Google’s Medico VLM for Clinical QA, Imaging, and More	Code
Nanonets-OCR-s: Enabling Rich, Structured Markdown for Document Understanding
Optimizing V-JEPA 2: Tackling Latency & Context in Real-Time Video Classification Scripts	Code
V-JEPA 2: Meta’s Breakthrough in AI for the Physical World	Code
NVIDIA Cosmos Reason1: Video Understanding	Code
GR00T N1.5 Explained
LLaVA	Code
SmolVLA: Affordable & Efficient VLA Robotics on Consumer GPUs	Code
Fine-Tuning Grounding DINO: Open-Vocabulary Object Detection	Code
Getting Started with Qwen3 – The Thinking Expert	Code
Inside the GPU: A Comprehensive Guide to Modern Graphics Architecture
Distributed Parallel Training: PyTorch	Code
MONAI: The Definitive Framework for Medical Imaging Powered by PyTorch
SANA-Sprint: The One-Step Revolution in High-Quality AI Image Synthesis
FramePack: Video Diffusion but Feels Like Image Diffusion	Code
Model Weights File Formats in Machine Learning
Unsloth: A Guide from Basics to Fine-Tuning Vision Models	Code
Iterative Closest Point (ICP) Algorithm Explained	Code
MedSAM2 Explained: One Prompt to Segment Anything in Medical Imaging	Code
Batch Normalization and Dropout as Regularizers
DINOv2 by Meta: A Self-Supervised Foundational Vision Model	Code
Beginner's Guide to Embedding Models
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors	Code
Google's A2A Protocol
NVIDIA SANA: Faster Image Generation
Fine-tuning RF-DETR	Code
Qwen2.5-Omni: A Real-Time Multimodal AI
Vision Language Action Models: Robotic Control	Code
Fine-Tuning Gemma 3 VLM using QLoRA for LaTeX-OCR Dataset	Code
ComfyUI	Code
Gemma-3: A Comprehensive Introduction
YOLO11 on Raspberry Pi: Optimizing Object Detection for Edge Devices	Code
VGGT: Visual Geometry Grounded Transformer – For Dense 3D Reconstruction	Code
DDIM: The Faster, Improved Version of DDPM for Efficient AI Image Generation	Code
Introduction to Model Context Protocol (MCP)
MASt3R and MASt3R-SfM Explanation: Image Matching and 3D Reconstruction	Code
MatAnyone Explained: Consistent Memory for Better Video Matting	Code
GraphRAG: For Medical Document Analysis	Code
OmniParser: Vision Based GUI Agent
Fine-Tuning YOLOv12: Comparison With YOLOv11 and YOLOv7-Based Darknet	Code
Fine-Tuning RetinaNet for Wildlife Detection with PyTorch: A Step-by-Step Tutorial	Code
DUSt3R: Geometric 3D Vision Made Easy: Explanation and Results	Code
YOLOv12: Attention Meets Speed	Code
Video Generation: A Diffusion based approach	Code
Agentic AI: A Comprehensive Introduction	Code
Finetuning SAM2 for Leaf Disease Segmentation	Code
Object Insertion in Gaussian Splatting: Paper Explained and Training Code for MCMC and Bilateral Grid	Code
Depth Pro: Sharp Monocular Metric Depth	Code
Fine-Tuning Stable Diffusion 3.5 on UI Images	Code
SimSiam: Streamlining SSL with Stop-Gradient Mechanism	Code
Image Captioning using ResNet and LSTM	Code
Molmo VLM: Paper Explanation and Demo	Code
3D Gaussian Splatting Paper Explanation: Training Custom Datasets with NeRF-Studio Gsplats	Code
FLUX Image Generation: Experimenting with the Parameters	Code
Contrastive Learning: SimCLR and BYOL (With Code Example)	Code
The Annotated NeRF: Training on Custom Dataset from Scratch in PyTorch	Code
Stable Diffusion 3 and 3.5: Paper Explanation and Inference	Code
LightRAG - Legal Document Analysis	Code
NVIDIA AI Summit 2024 – India Overview
Introduction to Speech to Speech: Most Efficient Form of NLP	Code
Training 3D U-Net for Brain Tumor Segmentation (BraTS-GLI)	Code
DETR: Overview and Inference	Code
YOLO11: Faster Than You Can Imagine!	Code
Exploring DINO: Self-Supervised Transformers for Road Segmentation with ResNet50 and U-Net	Code
Sapiens: Foundation for Human Vision Models by Meta	Code
Multimodal RAG with ColPali and Gemini	Code
Building Autonomous Vehicle in Carla: Path Following with PID Control & ROS 2	Code
Handwritten Text Recognition using OCR	Code
Training CLIP from Scratch for Image Retrieval	Code
Introduction to LiDAR SLAM: LOAM and LeGO-LOAM Paper and Code Explanation with ROS 2 Implementation	Code
Recommendation System using Vector Search	Code
Fine Tuning Whisper on Custom Dataset	Code
SAM 2 – Promptable Segmentation for Images and Videos	Code
Introduction to Feature Matching Using Neural Networks	Code
Introduction to ROS2 (Robot Operating System 2): Tutorial on ROS2 Working, DDS, ROS1 RMW, Topics, Nodes, Publisher, Subscriber in Python	Code
CVPR 2024 Research Papers - Part 2	Code
CVPR 2024: An Overview and Key Papers	Code
Object Detection on Edge Device - OAK-D-Lite	Code
Fine-Tuning YOLOv10 Models on Custom Dataset	Code
ROS2 and Carla Setup Guide for Ubuntu 22.04
Understanding Visual SLAM for Robotics Perception: Building Monocular SLAM from Scratch in Python	Code
Enhancing Image Segmentation using U2-Net: An Approach to Efficient Background Removal	Code
YOLOv10: The Dual-Head OG of YOLO Series	Code
Fine-tuning Faster R-CNN on Sea Rescue Dataset	Code
Mastering Recommendation System: A Complete Guide
Automatic Speech Recognition with Diarization: Speech-to-Text	Code
Building MobileViT Image Classification Model from Scratch In Keras 3	Code
SDXL Inpainting: Fusing Image Inpainting with Stable Diffusion	Code
YOLOv9 Instance Segmentation on Medical Dataset	Code
A Comprehensive Guide to Robotics
Integrating Gradio with OpenCV DNN	Code
Fine-Tuning YOLOv9 on Custom Dataset	Code
Dreambooth using Diffusers	Code
Introduction to Hugging Face Diffusers	Code
Introduction to Ultralytics Explorer API	Code
YOLOv9: Advancing the YOLO Legacy	Code
Fine-Tuning LLMs using PEFT	Code
Depth Anything: Accelerating Monocular Depth Perception	Code
Deciphering LLMs: From Transformers to Quantization	Code
YOLO Loss Function Part 2: GFL and VFL Loss	Code
YOLOv8 Object Tracking and Counting with OpenCV	Code
Stereo Vision in ADAS: Pioneering Depth Perception Beyond LiDAR	Code
YOLO Loss Function Part 1: SIoU and Focal Loss	Code
Moving Object Detection with OpenCV	Code
Integrating ADAS with Keypoint Feature Pyramid Network for 3D LiDAR Object Detection	Code
Mastering All YOLO Models from YOLOv1 to YOLO-NAS: Papers Explained (2024)
GradCAM: Enhancing Neural Network Interpretability in the Realm of Explainable AI	Code
Text Summarization using T5: Fine-Tuning and Building Gradio App	Code
3D LiDAR Visualization using Open3D: A Case Study on 2D KITTI Depth Frames for Autonomous Driving	Code
Fine Tuning T5: Text2Text Transfer Transformer for Building a Stack Overflow Tag Generator	Code
SegFormer 🤗 : Fine-Tuning for Improved Lane Detection in Autonomous Vehicles	Code
Fine-Tuning BERT using Hugging Face Transformers	Code
YOLO-NAS Pose	Code
BERT: Bidirectional Encoder Representations from Transformers	Code
Comparing KerasCV YOLOv8 Models on the Global Wheat Data 2020	Code
Top 5 AI papers of September 2023
Empowering Drivers: The Rise and Role of Advanced Driver Assistance Systems
Semantic Segmentation using KerasCV DeepLabv3+	Code
Object Detection using KerasCV YOLOv8	Code
Fine-tuning YOLOv8 Pose Models for Animal Pose Estimation	Code
Top 5 AI papers of August 2023
Fine Tuning TrOCR - Training TrOCR to Recognize Curved Text	Code
TrOCR - Getting Started with Transformer Based OCR	Code
Facial Emotion Recognition	Code
Object Keypoint Similarity in Keypoint Detection	Code
Real Time Deep SORT with Torchvision Detectors	Code
Top 5 AI papers of July 2023
Medical Image Segmentation	Code
Weighted Boxes Fusion in Object Detection: A Comparison with Non-Maximum Suppression	Code
Medical Multi-label Classification with PyTorch & Lightning	Code
Getting Started with PaddlePaddle: Exploring Object Detection, Segmentation, and Keypoints	Code
Drone Programming With Computer Vision: A Beginner's Guide	Code
How to Build a Pip Installable Package & Upload to PyPI
IoU Loss Functions for Faster & More Accurate Object Detection
Exploring Slicing Aided Hyper Inference for Small Object Detection	Code
Advancements in Face Recognition Models, Toolkit and Datasets
Train YOLO NAS on Custom Dataset	Code
Train YOLOv8 Instance Segmentation on Custom Data	Code
YOLO-NAS: New Object Detection Model Beats YOLOv6 & YOLOv8	Code
Segment Anything – A Foundation Model for Image Segmentation	Code
Build a Video to Slides Converter Application using the Power of Background Estimation and Frame Differencing in OpenCV	Code
A Closer Look at CVAT: Perfecting Your Annotations	YouTube
ControlNet - Achieving Superior Image Generation Results	Code
InstructPix2Pix - Edit Images With Prompts	Code
NVIDIA Spring GTC 2023 Day 4: Ending on a High Note with Top Moments from the Finale!
NVIDIA Spring GTC 2023 Day 3: Digging deeper into Deep Learning, Semiconductors & more!
NVIDIA Spring GTC 2023 Day 2: Jensen’s keynote & the iPhone moment of AI is here!
NVIDIA Spring GTC 2023 Day 1: Welcome to the future!
NVIDIA GTC Spring 2023 Curtain Raiser
Stable Diffusion - A New Paradigm in Generative AI	Code
OpenCV Face Recognition – Does Face Recognition Work on AI-Generated Images?
An In-Depth Guide to Denoising Diffusion Probabilistic Models – From Theory to Implementation	Code
From Pixels to Paintings: The Rise of Midjourney AI Art
Mastering DALL·E 2: A Breakthrough in AI Art Generation
Top 10 AI Art Generation Tools using Diffusion Models
The Future of Image Recognition is Here: PyTorch Vision Transformer	Code
Understanding Attention Mechanism in Transformer Neural Networks	Code
Deploying a Deep Learning Model using Hugging Face Spaces and Gradio	Code
Train YOLOv8 on Custom Dataset – A Complete Tutorial	Code
Introduction to Diffusion Models for Image Generation	Code
Building An Automated Image Annotation Tool: PyOpenAnnotate	Code
Ultralytics YOLOv8: State-of-the-Art YOLO Models	Code
Getting Started with YOLOv5 Instance Segmentation	Code
The Ultimate Guide To DeepLabv3 - With PyTorch Inference	Code
AI Fitness Trainer using MediaPipe: Squats Analysis	Code
YOLOR - Paper Explanation & Inference - An In-Depth Analysis	Code
Roadmap To an Automated Image Annotation Tool Using Python	Code
Performance Comparison of YOLO Object Detection Models – An Intensive Study
FCOS - Anchor Free Object Detection Explained	Code
YOLOv6 Custom Dataset Training – Underwater Trash Detection	Code
What is EXIF Data in Images?	Code
t-SNE: T-Distributed Stochastic Neighbor Embedding Explained	Code
CenterNet: Objects as Points – Anchor-free Object Detection Explained	Code
YOLOv7 Pose vs MediaPipe in Human Pose Estimation	Code
YOLOv6 Object Detection – Paper Explanation and Inference	Code
YOLOX Object Detector Paper Explanation and Custom Training	Code
Driver Drowsiness Detection Using Mediapipe In Python	Code
GTC 2022 Big Bang AI announcements: Everything you need to know
NVIDIA GTC 2022 : The most important AI event this Fall
Object Tracking and Reidentification with FairMOT	Code
What is Face Detection? – The Ultimate Guide for 2022	Code
Document Scanner: Custom Semantic Segmentation using PyTorch-DeepLabV3	Code
Fine Tuning YOLOv7 on Custom Dataset	Code
Center Stage for Zoom Calls using MediaPipe	Code
Mean Average Precision (mAP) in Object Detection
YOLOv7 Object Detection Paper Explanation and Inference	Code
Pothole Detection using YOLOv4 and Darknet	Code
Automatic Document Scanner using OpenCV	Code
Demystifying GPU architectures for deep learning: Part 2	Code
Demystifying GPU Architectures For Deep Learning	Code
Intersection over Union (IoU) in Object Detection and Segmentation	Code
Understanding Multiple Object Tracking using DeepSORT	Code
Optical Character Recognition using PaddleOCR	Code
Gesture Control in Zoom Call using Mediapipe	Code
A Deep Dive into TensorFlow Model Optimization	Code
DepthAI Pipeline Overview: Creating a Complex Pipeline	Code
TensorFlow Lite Model Maker: Create Models for On-Device Machine Learning	Code
TensorFlow Lite: Model Optimization for On Device Machine Learning	Code
Object detection with depth measurement using pre-trained models with OAK-D	Code
Custom Object Detection Training using YOLOv5	Code
Object Detection using YOLOv5 and OpenCV DNN (C++/Python)	Code
Create Snapchat/Instagram filters using Mediapipe	Code
AUTOSAR C++ compliant deep learning inference with TensorRT	Code
NVIDIA GTC 2022 Day 4 Highlights: Meet the new Jetson Orin
NVIDIA GTC 2022 Day 3 Highlights: Deep Dive into Hopper architecture
NVIDIA GTC 2022 Day 2 Highlights: Jensen’s Keynote
NVIDIA GTC 2022 Day 1 Highlights: Brilliant Start
Automatic License Plate Recognition using Python	Code
Building a Poor Body Posture Detection and Alert System using MediaPipe	Code
Introduction to MediaPipe	Code
Disparity Estimation using Deep Learning	Code
How to build Chrome Dino game bot using OpenCV Feature Matching	Code
Top 10 Sources to Find Computer Vision and AI Models
Multi-Attribute and Graph-based Object Detection
Plastic Waste Detection with Deep Learning	Code
Ensemble Deep Learning-based Defect Classification and Detection in SEM Images
Building Industrial embedded deep learning inference pipelines with TensorRT	Code
Transfer Learning for Medical Images
Stereo Vision and Depth Estimation using OpenCV AI Kit	Code
Introduction to OpenCV AI Kit and DepthAI	Code
WeChat QR Code Scanner in OpenCV	Code
AI behind the Diwali 2021 ‘Not just a Cadbury ad’
Model Selection and Benchmarking with Modelplace.AI	Model Zoo
Real-time style transfer in a zoom meeting	Code
Introduction to OpenVINO Deep Learning Workbench	Code
Running OpenVINO Models on Intel Integrated GPU	Code
Post Training Quantization with OpenVINO Toolkit	Code
Introduction to Intel OpenVINO Toolkit
Human Action Recognition using Detectron2 and LSTM	Code
Pix2Pix: Image-to-Image Translation in PyTorch & TensorFlow	Code
Conditional GAN (cGAN) in PyTorch and TensorFlow	Code
Deep Convolutional GAN in PyTorch and TensorFlow	Code
Introduction to Generative Adversarial Networks (GANs)	Code
Human Pose Estimation using Keypoint RCNN in PyTorch	Code
Non Maximum Suppression: Theory and Implementation in PyTorch	Code
MRNet – The Multi-Task Approach	Code
Generative and Discriminative Models
Playing Chrome's T-Rex Game with Facial Gestures	Code
Variational Autoencoder in TensorFlow	Code
Autoencoder in TensorFlow 2: Beginner’s Guide	Code
Deep Learning with OpenCV DNN Module: A Definitive Guide	Code
Depth perception using stereo camera (Python/C++)	Code
Contour Detection using OpenCV (Python/C++)	Code
Super Resolution in OpenCV	Code
Improving Illumination in Night Time Images	Code
Video Classification and Human Activity Recognition	Code
How to use OpenCV DNN Module with Nvidia GPU on Windows	Code
How to use OpenCV DNN Module with NVIDIA GPUs	Code
Code OpenCV in Visual Studio
Install OpenCV on Windows – C++ / Python	Code
Face Recognition with ArcFace	Code
Background Subtraction with OpenCV and BGS Libraries	Code
RAFT: Optical Flow estimation using Deep Learning	Code
Making A Low-Cost Stereo Camera Using OpenCV	Code
Optical Flow in OpenCV (C++/Python)	Code
Introduction to Epipolar Geometry and Stereo Vision	Code
Classification With Localization: Convert any Keras Classifier to a Detector	Code
Photoshop Filters in OpenCV	Code
Tetris Game using OpenCV Python	Code
Image Classification with OpenCV for Android	Code
Image Classification with OpenCV Java	Code
PyTorch to Tensorflow Model Conversion	Code
Snake Game with OpenCV Python	Code
Stanford MRNet Challenge: Classifying Knee MRIs	Code
Experiment Logging with TensorBoard and wandb	Code
Understanding Lens Distortion	Code
Image Matting with state-of-the-art Method “F, B, Alpha Matting”	Code
Bag Of Tricks For Image Classification - Let's check if it is working or not	Code
Getting Started with OpenCV CUDA Module	Code
Training a Custom Object Detector with DLIB & Making Gesture Controlled Applications	Code
How To Run Inference Using TensorRT C++ API	Code
Using Facial Landmarks for Overlaying Faces with Medical Masks	Code
Tensorboard with PyTorch Lightning	Code
Otsu's Thresholding with OpenCV	Code
PyTorch to CoreML Model Conversion	Code
Playing Rock, Paper, Scissors with AI	Code
CNN Receptive Field Computation Using Backprop with TensorFlow	Code
CNN Fully Convolutional Image Classification with TensorFlow	Code
How to convert a model from PyTorch to TensorRT and speed up inference	Code
Efficient image loading	Code
Graph Convolutional Networks: Model Relations In Data	Code
Getting Started with Federated Learning with PyTorch and PySyft	Code
Creating a Virtual Pen & Eraser	Code
Getting Started with PyTorch Lightning	Code
Multi-Label Image Classification with PyTorch: Image Tagging	Code
Funny Mirrors Using OpenCV	Code
t-SNE for ResNet feature visualization	Code
Multi-Label Image Classification with PyTorch	Code
CNN Receptive Field Computation Using Backprop	Code
Augmented Reality using ArUco Markers in OpenCV (C++ and Python)	Code
Fully Convolutional Image Classification on Arbitrary Sized Image	Code
Camera Calibration using OpenCV	Code
Geometry of Image Formation
Ensuring Training Reproducibility in PyTorch
Gaze Tracking
Simple Background Estimation in Videos Using OpenCV	Code
Applications of Foreground-Background separation with Semantic Segmentation	Code
EfficientNet: Theory + Code	Code
PyTorch for Beginners: Mask R-CNN Instance Segmentation with PyTorch	Code
PyTorch for Beginners: Faster R-CNN Object Detection with PyTorch	Code
PyTorch for Beginners: Semantic Segmentation using torchvision	Code
PyTorch for Beginners: Comparison of pre-trained models for Image Classification	Code
PyTorch for Beginners: Basics	Code
PyTorch Model Inference using ONNX and Caffe2	Code
Image Classification Using Transfer Learning in PyTorch	Code
Hangman: Creating games in OpenCV	Code
Image Inpainting with OpenCV (C++/Python)	Code
Hough Transform with OpenCV (C++/Python)	Code
Xeus-Cling: Run C++ code in Jupyter Notebook	Code
Gender & Age Classification using OpenCV Deep Learning ( C++/Python )	Code
Invisibility Cloak using Color Detection and Segmentation with OpenCV	Code
Fast Image Downloader for Open Images V4 (Python)	Code
Deep Learning based Text Detection Using OpenCV (C++/Python)	Code
Video Stabilization Using Point Feature Matching in OpenCV	Code
Training YOLOv3 : Deep Learning based Custom Object Detector	Code
Using OpenVINO with OpenCV	Code
Duplicate Search on Quora Dataset	Code
Shape Matching using Hu Moments (C++/Python)	Code
Install OpenCV 4 on CentOS (C++ and Python)	Code
Install OpenCV 3.4.4 on CentOS (C++ and Python)	Code
Install OpenCV 3.4.4 on Red Hat (C++ and Python)	Code
Install OpenCV 4 on Red Hat (C++ and Python)	Code
Install OpenCV 4 on macOS (C++ and Python)	Code
Install OpenCV 3.4.4 on Raspberry Pi	Code
Install OpenCV 3.4.4 on macOS (C++ and Python)	Code
OpenCV QR Code Scanner (C++ and Python)	Code
Install OpenCV 3.4.4 on Windows (C++ and Python)	Code
Install OpenCV 3.4.4 on Ubuntu 16.04 (C++ and Python)	Code
Install OpenCV 3.4.4 on Ubuntu 18.04 (C++ and Python)	Code
Universal Sentence Encoder	Code
Install OpenCV 4 on Raspberry Pi	Code
Install OpenCV 4 on Windows (C++ and Python)	Code
Face Detection – Dlib, OpenCV, and Deep Learning ( C++ / Python )	Code
Hand Keypoint Detection using Deep Learning and OpenCV	Code
Deep learning based Object Detection and Instance Segmentation using Mask R-CNN in OpenCV (Python / C++)	Code
Install OpenCV 4 on Ubuntu 18.04 (C++ and Python)	Code
Install OpenCV 4 on Ubuntu 16.04 (C++ and Python)	Code
Multi-Person Pose Estimation in OpenCV using OpenPose	Code
Heatmap for Logo Detection using OpenCV (Python)	Code
Deep Learning based Object Detection using YOLOv3 with OpenCV ( Python / C++ )	Code
Convex Hull using OpenCV in Python and C++	Code
MultiTracker : Multiple Object Tracking using OpenCV (C++/Python)	Code
Convolutional Neural Network based Image Colorization using OpenCV	Code
SVM using scikit-learn	Code
GOTURN: Deep Learning based Object Tracking	Code
Find the Center of a Blob (Centroid) using OpenCV (C++/Python)	Code
Support Vector Machines (SVM)	Code
Batch Normalization in Deep Networks	Code
Deep Learning based Character Classification using Synthetic Dataset	Code
Image Quality Assessment : BRISQUE	Code
Understanding AlexNet
Deep Learning based Text Recognition (OCR) using Tesseract and OpenCV	Code
Deep Learning based Human Pose Estimation using OpenCV ( C++ / Python )	Code
Number of Parameters and Tensor Sizes in a Convolutional Neural Network (CNN)
How to convert your OpenCV C++ code into a Python module	Code
CV4Faces : Best Project Award 2018
Facemark : Facial Landmark Detection using OpenCV	Code
Image Alignment (Feature Based) using OpenCV (C++/Python)	Code
Barcode and QR code Scanner using ZBar and OpenCV	Code
Keras Tutorial : Fine-tuning using pre-trained models	Code
OpenCV Transparent API
Face Reconstruction using EigenFaces (C++/Python)	Code
Eigenface using OpenCV (C++/Python)	Code
Principal Component Analysis
Keras Tutorial : Transfer Learning using pre-trained models	Code
Keras Tutorial : Using pre-trained Imagenet models	Code
Technical Aspects of a Digital SLR
Using Harry Potter interactive wand with OpenCV to create magic
Install OpenCV 3 and Dlib on Windows ( Python only )
Image Classification using Convolutional Neural Networks in Keras	Code
Understanding Autoencoders using TensorFlow (Python)	Code
Best Project Award : Computer Vision for Faces
Understanding Activation Functions in Deep Learning
Image Classification using Feedforward Neural Network in Keras	Code
Exposure Fusion using OpenCV (C++/Python)	Code
Understanding Feedforward Neural Networks
High Dynamic Range (HDR) Imaging using OpenCV (C++/Python)	Code
Deep learning using Keras – The Basics	Code
Selective Search for Object Detection (C++ / Python)	Code
Installing Deep Learning Frameworks on Ubuntu with CUDA support
Parallel Pixel Access in OpenCV using forEach	Code
cvui: A GUI lib built on top of OpenCV drawing primitives	Code
Install Dlib on Windows
Install Dlib on Ubuntu
Install OpenCV 3 on Ubuntu
Read, Write and Display a video using OpenCV ( C++/ Python )	Code
Install Dlib on MacOS
Install OpenCV 3 on MacOS
Install OpenCV 3 on Windows
Get OpenCV Build Information ( getBuildInformation )
Color spaces in OpenCV (C++ / Python)	Code
Neural Networks : A 30,000 Feet View for Beginners
Alpha Blending using OpenCV (C++ / Python)	Code
User stories : How readers of this blog are applying their knowledge to build applications
How to select a bounding box ( ROI ) in OpenCV (C++/Python) ?
Automatic Red Eye Remover using OpenCV (C++ / Python)	Code
Bias-Variance Tradeoff in Machine Learning
Embedded Computer Vision: Which device should you choose?
Object Tracking using OpenCV (C++/Python)	Code
Handwritten Digits Classification : An OpenCV ( C++ / Python ) Tutorial	Code
Training a better Haar and LBP cascade based Eye Detector using OpenCV
Deep Learning Book Gift Recipients
Minified OpenCV Haar and LBP Cascades	Code
Deep Learning Book Gift
Histogram of Oriented Gradients
Image Recognition and Object Detection : Part 1
Head Pose Estimation using OpenCV and Dlib	Code
Live CV : A Computer Vision Coding Application
Approximate Focal Length for Webcams and Cell Phone Cameras
Configuring Qt for OpenCV on OSX	Code
Rotation Matrix To Euler Angles	Code
Speeding up Dlib’s Facial Landmark Detector
Warp one triangle to another using OpenCV ( C++ / Python )	Code
Average Face : OpenCV ( C++ / Python ) Tutorial	Code
Face Swap using OpenCV ( C++ / Python )	Code
Face Morph Using OpenCV — C++ / Python	Code
Deep Learning Example using NVIDIA DIGITS 3 on EC2
NVIDIA DIGITS 3 on EC2
Homography Examples using OpenCV ( Python / C++ )	Code
Filling holes in an image using OpenCV ( Python / C++ )	Code
How to find frame rate or frames per second (fps) in OpenCV ( Python / C++ ) ?	Code
Delaunay Triangulation and Voronoi Diagram using OpenCV ( C++ / Python)	Code
OpenCV (C++ vs Python) vs MATLAB for Computer Vision
Facial Landmark Detection
Why does OpenCV use BGR color format ?
Computer Vision for Predicting Facial Attractiveness	Code
applyColorMap for pseudocoloring in OpenCV ( C++ / Python )	Code
Image Alignment (ECC) in OpenCV ( C++ / Python )	Code
How to find OpenCV version in Python and C++ ?
Baidu banned from ILSVRC 2015
How Computer Vision Solved the Greatest Soccer Mystery of All Time
Embedded Vision Summit 2015
Read an Image in OpenCV ( Python, C++ )	Code
Non-Photorealistic Rendering using OpenCV ( Python, C++ )	Code
Seamless Cloning using OpenCV ( Python , C++ )	Code
OpenCV Threshold ( Python , C++ )	Code
Blob Detection Using OpenCV ( Python, C++ )	Code
Turn your OpenCV Code into a Web API in under 10 minutes — Part 1
How to compile OpenCV sample Code ?
Install OpenCV 3 on Yosemite ( OSX 10.10.x )

For Tasks:


analyze images, detect objects, train models, implement algorithms, explore AI concepts

For Jobs:

computer vision engineer, deep learning researcher, AI developer, robotics engineer, data scientist

Alternative AI tools for learnopencv

Similar Open Source Tools

Awesome-LLM-Constrained-Decoding

Awesome-LLM-Constrained-Decoding is a curated list of papers, code, and resources related to constrained decoding of Large Language Models (LLMs). The repository aims to facilitate reliable, controllable, and efficient generation with LLMs by providing a comprehensive collection of materials in this domain.

Stars: 180

watsonx-ai-samples

Sample notebooks for IBM Watsonx.ai for IBM Cloud and IBM Watsonx.ai software product. The notebooks demonstrate capabilities such as running experiments on model building using AutoAI or Deep Learning, deploying third-party models as web services or batch jobs, monitoring deployments with OpenScale, managing model lifecycles, inferencing Watsonx.ai foundation models, and integrating LangChain with Watsonx.ai. Notebooks with Python code and the Python SDK can be found in the `python_sdk` folder. The REST API examples are organized in the `rest_api` folder.

Stars: 128

LLM4EC

LLM4EC is an interdisciplinary research repository focusing on the intersection of Large Language Models (LLM) and Evolutionary Computation (EC). It provides a comprehensive collection of papers and resources exploring various applications, enhancements, and synergies between LLM and EC. The repository covers topics such as LLM-assisted optimization, EA-based LLM architecture search, and applications in code generation, software engineering, neural architecture search, and other generative tasks. The goal is to facilitate research and development in leveraging LLM and EC for innovative solutions in diverse domains.

Stars: 78

nntrainer

NNtrainer is a software framework for training neural network models on devices with limited resources. It enables on-device fine-tuning of neural networks using user data for personalization. NNtrainer supports various machine learning algorithms and provides examples for tasks such as few-shot learning, ResNet, VGG, and product rating. It is optimized for embedded devices and utilizes CBLAS and CUBLAS for accelerated calculations. NNtrainer is open source and released under the Apache License version 2.0.

Stars: 135

LLM4Opt

LLM4Opt is a collection of references and papers focusing on applying Large Language Models (LLMs) to diverse optimization tasks. The repository includes research papers, tutorials, workshops, competitions, and related collections on LLMs in optimization. It covers a wide range of topics such as algorithm search, code generation, machine learning, science, industry, and more. The goal is to provide a comprehensive resource for researchers and practitioners interested in leveraging LLMs for optimization tasks.

Stars: 125

Awesome-Resource-Efficient-LLM-Papers

A curated list of high-quality papers on resource-efficient Large Language Models (LLMs) with a focus on various aspects such as architecture design, pre-training, fine-tuning, inference, system design, and evaluation metrics. The repository covers topics like efficient transformer architectures, non-transformer architectures, memory efficiency, data efficiency, model compression, dynamic acceleration, deployment optimization, support infrastructure, and other related systems. It also provides detailed information on computation metrics, memory metrics, energy metrics, financial cost metrics, network communication metrics, and other metrics relevant to resource-efficient LLMs. The repository includes benchmarks for evaluating the efficiency of NLP models and references for further reading.

Stars: 105

LLM-KG4QA

LLM-KG4QA is a repository focused on the integration of Large Language Models (LLMs) and Knowledge Graphs (KGs) for Question Answering (QA). It covers various aspects such as using KGs as background knowledge, reasoning guideline, and refiner/filter. The repository provides detailed information on pre-training, fine-tuning, and Retrieval Augmented Generation (RAG) techniques for enhancing QA performance. It also explores complex QA tasks like Explainable QA, Multi-Modal QA, Multi-Document QA, Multi-Hop QA, Multi-run and Conversational QA, Temporal QA, Multi-domain and Multilingual QA, along with advanced topics like Optimization and Data Management. Additionally, it includes benchmark datasets, industrial and scientific applications, demos, and related surveys in the field.

Stars: 80

Model-References

The 'Model-References' repository contains examples for training and inference using Intel Gaudi AI Accelerator. It includes models for computer vision, natural language processing, audio, generative models, MLPerf™ training, and MLPerf™ inference. The repository provides performance data and model validation information for various frameworks like PyTorch. Users can find examples of popular models like ResNet, BERT, and Stable Diffusion optimized for Intel Gaudi AI accelerator.

Stars: 138

awesome-open-data-annotation

At ZenML, we believe in the importance of annotation and labeling workflows in the machine learning lifecycle. This repository showcases a curated list of open-source data annotation and labeling tools that are actively maintained and fit for purpose. The tools cover various domains such as multi-modal, text, images, audio, video, time series, and other data types. Users can contribute to the list and discover tools for tasks like named entity recognition, data annotation for machine learning, image and video annotation, text classification, sequence labeling, object detection, and more. The repository aims to help users enhance their data-centric workflows by leveraging these tools.

GitHub stars: 425

awesome-llm-planning-reasoning

The 'Awesome LLMs Planning Reasoning' repository is a curated collection focusing on exploring the capabilities of Large Language Models (LLMs) in planning and reasoning tasks. It includes research papers, code repositories, and benchmarks that delve into innovative techniques, reasoning limitations, and standardized evaluations related to LLMs' performance in complex cognitive tasks. The repository serves as a comprehensive resource for researchers, developers, and enthusiasts interested in understanding the advancements and challenges in leveraging LLMs for planning and reasoning in real-world scenarios.

GitHub stars: 117

Azure-AIGEN-demos

Microsoft Foundry is a unified Azure platform-as-a-service offering for enterprise AI operations, model builders, and application development. This foundation combines production-grade infrastructure with friendly interfaces, enabling developers to focus on building applications rather than managing infrastructure. Microsoft Foundry unifies agents, models, and tools under a single management grouping with built-in enterprise-readiness capabilities, including tracing, monitoring, evaluations, and customizable enterprise setup configurations. The platform streamlines management by placing role-based access control (RBAC), networking, and policies under one Azure resource provider namespace.

GitHub stars: 746

are-copilots-local-yet

Current trends and state of the art for using open and local LLMs as copilots to complete code, generate projects, act as shell assistants, automatically fix bugs, and more. This document is a curated list of local copilots, shell assistants, and related projects, intended as a survey of existing tools and to help developers discover the state of the art for projects like these.

GitHub stars: 511

ai-game-devtools

GitHub stars: 735

models

The Intel® AI Reference Models repository contains links to pre-trained models, sample scripts, best practices, and tutorials for popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs. It aims to replicate the best-known performance of target model/dataset combinations in optimally-configured hardware environments. The repository will be deprecated upon the publication of v3.2.0 and will no longer be maintained or published.

GitHub stars: 669

Awesome-Model-Merging-Methods-Theories-Applications

A comprehensive repository focusing on 'Model Merging in LLMs, MLLMs, and Beyond', providing an exhaustive overview of model merging methods, theories, applications, and future research directions. The repository covers various advanced methods, applications in foundation models, different machine learning subfields, and tasks like pre-merging methods, architecture transformation, weight alignment, basic merging methods, and more.

GitHub stars: 519

For similar tasks

HPT

Hyper-Pretrained Transformers (HPT) is a multimodal LLM framework from HyperGAI for vision-language models that understand both textual and visual inputs. The repository contains the open-source inference code needed to reproduce the evaluation results of HPT Air on different benchmarks. HPT has achieved results competitive with state-of-the-art models on various multimodal LLM benchmarks. It offers models such as HPT 1.5 Air and HPT 1.0 Air, providing efficient solutions for vision-and-language tasks.

GitHub stars: 236

learnopencv

GitHub stars: 22.3k

spark-free-api

The Spark AI Free service provides high-speed streaming output, multi-turn dialogue, AI drawing, long-document interpretation, and image parsing. It offers zero-configuration deployment, multi-token support, and automatic cleanup of session traces. It is fully compatible with the ChatGPT interface. The repository belongs to a family of free-api projects for various AI services. Users can access the API for tasks such as chat completions, AI drawing, document interpretation, image analysis, and ssoSessionId liveness checks. The project also provides deployment guidelines for Docker, Docker Compose, Render, Vercel, and native deployment. It recommends using custom clients for faster and simpler access to the free-api series of projects.

GitHub stars: 57

mlx-vlm

MLX-VLM is a package for running Vision LLMs (vision-language models) on Macs using MLX. It is straightforward to install and use for running large language models on vision tasks. The tool simplifies running these models on Mac computers and suits users who want to use MLX for vision-related projects.

GitHub stars: 2.1k

clarifai-python-grpc

This is the official Clarifai gRPC Python client for interacting with their recognition API. Clarifai offers a platform for data scientists, developers, researchers, and enterprises to utilize artificial intelligence for image, video, and text analysis through computer vision and natural language processing. The client allows users to authenticate, predict concepts in images, and access various functionalities provided by the Clarifai API. It follows a versioning scheme that aligns with the backend API updates and includes specific instructions for installation and troubleshooting. Users can explore the Clarifai demo, sign up for an account, and refer to the documentation for detailed information.

GitHub stars: 56

horde-worker-reGen

This repository provides the latest implementation for the AI Horde Worker, allowing users to utilize their graphics card(s) to generate, post-process, or analyze images for others. It offers a platform where users can create images and earn 'kudos' in return, granting priority for their own image generations. The repository includes important details for setup, recommendations for system configurations, instructions for installation on Windows and Linux, basic usage guidelines, and information on updating the AI Horde Worker. Users can also run the worker with multiple GPUs and receive notifications for updates through Discord. Additionally, the repository contains models that are licensed under the CreativeML OpenRAIL License.

GitHub stars: 109

geospy

Geospy is a Python tool that utilizes Graylark's AI-powered geolocation service to determine the location where photos were taken. It allows users to analyze images and retrieve information such as country, city, explanation, coordinates, and Google Maps links. The tool provides a seamless way to integrate geolocation services into various projects and applications.

GitHub stars: 71

Awesome-Colorful-LLM

Awesome-Colorful-LLM is a meticulously assembled anthology of vibrant multimodal research focusing on advancements propelled by large language models (LLMs) in domains such as Vision, Audio, Agent, Robotics, and Fundamental Sciences like Mathematics. The repository contains curated collections of works, datasets, benchmarks, projects, and tools related to LLMs and multimodal learning. It serves as a comprehensive resource for researchers and practitioners interested in exploring the intersection of language models and various modalities for tasks like image understanding, video pretraining, 3D modeling, document understanding, audio analysis, agent learning, robotic applications, and mathematical research.

GitHub stars: 106

For similar jobs

promptflow

**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

GitHub stars: 9.2k

deepeval

DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates metrics such as G-Eval, hallucination, answer relevancy, and RAGAS, and runs locally on your machine. It provides a wide range of ready-to-use evaluation metrics, supports custom metrics, integrates with any CI/CD environment, and can benchmark LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drift, and move from OpenAI to self-hosted Llama 2 with confidence.

GitHub stars: 13.7k

MegaDetector

MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). The model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aim to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our overview of the field, affectionately titled "Everything I know about machine learning and camera traps".

GitHub stars: 186

leapfrogai

LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped, resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI uses Chainguard's apko to harden base Python images, ensuring that the other components of the stack use the latest supported Python versions. The LeapfrogAI SDK provides a standard set of protobufs and Python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

GitHub stars: 255

llava-docker

This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

GitHub stars: 59

carrot

The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

GitHub stars: 17.1k

TrustLLM

TrustLLM is a comprehensive study of trustworthiness in LLMs. It includes principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM on over 30 datasets. The document explains how to use the trustllm Python package to assess your LLM's trustworthiness more quickly. For more details about TrustLLM, please refer to the project website.

GitHub stars: 535

AI-YinMei

AI-YinMei is a development tool for AI virtual anchors (VTubers), in a version for NVIDIA GPUs. It supports knowledge-base chat through FastGPT, with a complete LLM solution built on [fastgpt] + [one-api] + [Xinference]. It can reply to bilibili live-stream bullet comments and greet viewers entering the stream. For speech synthesis, it supports Microsoft edge-tts, Bert-VITS2, and GPT-SoVITS. It also supports:

- expression control through VTube Studio;
- drawing with stable-diffusion-webui, with output to the OBS live-stream scene and NSFW screening of generated images via public-NSFW-y-distinguish;
- web and image search through DuckDuckGo (requires a VPN or proxy) and Baidu image search (no VPN or proxy needed);
- an AI reply chat box and a playlist (both HTML plug-ins);
- AI singing via Auto-Convert-Music;
- dancing, expression video playback, and head-pat and gift-throwing actions;
- automatically starting to dance while singing, and automatic swaying motions during chat and singing;
- multi-scene switching, background music switching, and automatic day/night scene switching;
- open-ended singing and drawing requests, with the AI automatically judging the content.

GitHub stars: 529