human
Stars: 2005
README:
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
- Compatible with most server-side and client-side environments and frameworks
- Combines multiple machine learning models which can be switched on-demand depending on the use-case
- Related models are executed in an attention pipeline to provide details when needed
- Optimized input pre-processing that can enhance image quality of any type of input
- Detection of frame changes to trigger only required models for improved performance
- Intelligent temporal interpolation to provide smooth results regardless of processing performance
- Simple unified API
- Built-in Image, Video and WebCam handling
- Browser:
Compatible with both desktop and mobile platforms
Compatible with CPU, WebGL, WASM backends
Compatible with WebWorker execution
Compatible with WebView
- NodeJS:
Compatible with WASM backend for execution on architectures where tensorflow binaries are not available
Compatible with tfjs-node using software execution via tensorflow shared libraries
Compatible with tfjs-node using GPU-accelerated execution via tensorflow shared libraries and nVidia CUDA
Check out Simple Live Demo fully annotated app as a good starting point (html)(code)
Check out Main Live Demo app for advanced processing of webcam, video stream or static images with all possible tunable options
- To start video detection, simply press Play
- To process images, simply drag & drop in your Browser window
- Note: For optimal performance, select only the models you'd like to use (see the configuration sketch after these notes)
- Note: If you have a modern GPU, the WebGL (default) backend is preferred, otherwise select the WASM backend
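For example, a minimal configuration sketch that keeps only face and gesture processing enabled could look like this; option names follow the Human configuration schema, see Configuration Details:
// minimal configuration sketch: enable only the face and gesture modules
// option names follow the Human configuration schema; see Configuration Details
const config = {
  backend: 'webgl',           // or 'wasm' on systems without a modern GPU
  face: { enabled: true },    // face detection, mesh, iris, description, emotion
  body: { enabled: false },   // skip body pose tracking
  hand: { enabled: false },   // skip hand & finger tracking
  object: { enabled: false }, // skip object detection
  gesture: { enabled: true }, // gestures are derived from the enabled modules
};
const human = new Human.Human(config);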
All browser demos are self-contained without any external dependencies
- Full [Live] [Details]: Main browser demo app that showcases all Human capabilities
- Simple [Live] [Details]: Simple WebCam processing demo in TypeScript
- Embedded [Live] [Details]: Even simpler demo with tiny code embedded in HTML file
- Face Detect [Live] [Details]: Extracts faces from images and processes details
- Face Match [Live] [Details]: Extracts faces from images, calculates face descriptors and similarities, and matches them to a known database
- Face ID [Live] [Details]: Runs multiple checks to validate webcam input before performing face match to faces in IndexedDB
- Multi-thread [Live] [Details]: Runs each Human module in a separate web worker for highest possible performance
- NextJS [Live] [Details]: Use Human with TypeScript, NextJS and ReactJS
- ElectronJS [Details]: Use Human with TypeScript and ElectronJS to create standalone cross-platform apps
- 3D Analysis with BabylonJS [Live] [Details]: 3D tracking and visualization of head, face, eye, body and hand
- VRM Virtual Model Tracking with Three.JS [Live] [Details]: VR model with head, face, eye, body and hand tracking
- VRM Virtual Model Tracking with BabylonJS [Live] [Details]: VR model with head, face, eye, body and hand tracking
NodeJS demos may require extra dependencies which are used to decode inputs
See the header of each demo for its dependencies, as they are not automatically installed with Human (a minimal NodeJS usage sketch follows the demo list below)
- Main [Details]: Process images from files, folders or URLs using native methods
- Canvas [Details]: Process image from file or URL and draw results to a new image file using node-canvas
- Video [Details]: Processing of video input using ffmpeg
- WebCam [Details]: Processing of webcam screenshots using fswebcam
- Events [Details]: Showcases usage of Human eventing to get notifications on processing
- Similarity [Details]: Compares two input images for similarity of detected faces
- Face Match [Details]: Parallel processing of face match in multiple child worker threads
- Multiple Workers [Details]: Runs multiple parallel human processes by dispatching them to a pool of pre-created worker processes
- Dynamic Load [Details]: Loads Human dynamically with multiple different desired backends
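As a rough sketch of how these NodeJS demos drive Human; the require path, backend name and input file are illustrative and may differ between Human distribution bundles:
// minimal NodeJS sketch, assuming the tfjs-node backend so tf.node.decodeImage is available;
// the require path, backend name and 'input.jpg' are illustrative
const fs = require('fs');
const Human = require('@vladmandic/human'); // resolves to the NodeJS build of Human

async function main() {
  const human = new Human.Human({ backend: 'tensorflow' }); // tfjs-node backend
  const buffer = fs.readFileSync('input.jpg');              // load image from disk
  const tensor = human.tf.node.decodeImage(buffer);         // decode with tfjs-node helper
  const result = await human.detect(tensor);                // run all enabled models
  human.tf.dispose(tensor);                                 // release tensor memory
  console.log('faces:', result.face.length, 'bodies:', result.body.length);
}
main();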
- Code Repository
- NPM Package
- Issues Tracker
- TypeDoc API Specification - Main class
- TypeDoc API Specification - Full
- Change Log
- Current To-do List
- Home
- Installation
- Usage & Functions
- Configuration Details
- Result Details
- Customizing Draw Methods
- Caching & Smoothing
- Input Processing
- Face Recognition & Face Description
- Gesture Recognition
- Common Issues
- Background and Benchmarks
- Comparing Backends
- Development Server
- Build Process
- Adding Custom Modules
- Performance Notes
- Performance Profiling
- Platform Support
- Diagnostic and Performance trace information
- Dockerize Human applications
- List of Models & Credits
- Models Download Repository
- Security & Privacy Policy
- License & Usage Restrictions
See issues and discussions for a list of known limitations and planned enhancements
Suggestions are welcome!
Visit Examples gallery for more examples
All options as presented in the demo application...
demo/index.html
Results Browser:
[ Demo -> Display -> Show Results ]
- Face Similarity Matching: Extracts all faces from provided input images, sorts them by similarity to a selected face, and optionally matches detected faces against a database of known people to guess their names (see the descriptor-similarity sketch after this list)
- Face Detect: Extracts all detected faces from loaded images on-demand and highlights face details on a selected face
- Face ID: Performs validation checks on a webcam input to detect a real face and matches it to known faces stored in a database
- 3D Rendering:
- VR Model Tracking:
- Human as OS native application:
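To illustrate what the similarity and face-match demos do conceptually, the sketch below compares two face descriptors (result.face[n].embedding) with plain cosine similarity; Human ships its own matching helpers, so treat this only as an approximation of the idea:
// conceptual sketch: compare two face descriptors with cosine similarity
// descriptors are available as result.face[n].embedding when face description is enabled
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function compareFaces(imageA, imageB) { // imageA/imageB can be any supported input
  const result1 = await human.detect(imageA);
  const result2 = await human.detect(imageB);
  const similarity = cosineSimilarity(result1.face[0].embedding, result2.face[0].embedding);
  console.log('face similarity:', similarity.toFixed(2));
}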
468-Point Face Mesh Details:
(view in full resolution to see keypoints)
Simply load Human (IIFE version) directly from a cloud CDN in your HTML file:
(pick one: jsdelivr, unpkg or cdnjs)
<!DOCTYPE HTML>
<script src="https://cdn.jsdelivr.net/npm/@vladmandic/human/dist/human.js"></script>
<script src="https://unpkg.dev/@vladmandic/human/dist/human.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/human/3.0.0/human.js"></script>
For details, including how to use the Browser ESM version or the NodeJS version of Human, see Installation
Simple app that uses Human to process video input and
draw output on screen using internal draw helper functions
// create instance of human with simple configuration using default values
const config = { backend: 'webgl' };
const human = new Human.Human(config);
// select input HTMLVideoElement and output HTMLCanvasElement from page
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
function detectVideo() {
// perform processing using default configuration
human.detect(inputVideo).then((result) => {
// result object will contain detected details
// as well as the processed canvas itself
// so let's first draw the processed frame on the canvas
human.draw.canvas(result.canvas, outputCanvas);
// then draw results on the same canvas
human.draw.face(outputCanvas, result.face);
human.draw.body(outputCanvas, result.body);
human.draw.hand(outputCanvas, result.hand);
human.draw.gesture(outputCanvas, result.gesture);
// and loop immediately to the next frame
requestAnimationFrame(detectVideo);
return result;
});
}
detectVideo();
or using async/await:
// create instance of human with simple configuration using default values
const config = { backend: 'webgl' };
const human = new Human(config); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
async function detectVideo() {
const result = await human.detect(inputVideo); // run detection
human.draw.all(outputCanvas, result); // draw all results
requestAnimationFrame(detectVideo); // run loop
}
detectVideo(); // start loop
or using Events:
// create instance of human with simple configuration using default values
const config = { backend: 'webgl' };
const human = new Human(config); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
human.events.addEventListener('detect', () => { // event gets triggered when detect is complete
human.draw.all(outputCanvas, human.result); // draw all results
});
function detectVideo() {
human.detect(inputVideo) // run detection
.then(() => requestAnimationFrame(detectVideo)); // upon detect complete start processing of the next frame
}
detectVideo(); // start loop
or using interpolated results for smooth video processing by separating detection and drawing loops:
const human = new Human(); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
let result;
async function detectVideo() {
result = await human.detect(inputVideo); // run detection
requestAnimationFrame(detectVideo); // run detect loop
}
async function drawVideo() {
if (result) { // check if result is available
const interpolated = human.next(result); // get smoothed result using last-known results
human.draw.all(outputCanvas, interpolated); // draw the frame
}
requestAnimationFrame(drawVideo); // run draw loop
}
detectVideo(); // start detection loop
drawVideo(); // start draw loop
or same, but using built-in full video processing instead of running a manual frame-by-frame loop:
const human = new Human(); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
async function drawResults() {
const interpolated = human.next(); // get smoothed result using last-known results
human.draw.all(outputCanvas, interpolated); // draw the frame
requestAnimationFrame(drawResults); // run draw loop
}
human.video(inputVideo); // start detection loop which continuously updates results
drawResults(); // start draw loop
or using built-in webcam helper methods that take care of video handling completely:
const human = new Human(); // create instance of Human
const outputCanvas = document.getElementById('canvas-id');
async function drawResults() {
const interpolated = human.next(); // get smoothed result using last-known results
human.draw.canvas(outputCanvas, human.webcam.element); // draw current webcam frame
human.draw.all(outputCanvas, interpolated); // draw detection results for the frame
requestAnimationFrame(drawResults); // run draw loop
}
await human.webcam.start({ crop: true });
human.video(human.webcam.element); // start detection loop which continuously updates results
drawResults(); // start draw loop
And for even better results, you can run detection in a separate web worker thread
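A rough sketch of that pattern, where a hypothetical worker.js owns its own Human instance and the main thread only captures frames and reads back plain results:
// ---- worker.js (hypothetical) ----
// importScripts('https://cdn.jsdelivr.net/npm/@vladmandic/human/dist/human.js');
// const human = new Human.Human({ backend: 'webgl' });
// onmessage = async (msg) => {
//   const result = await human.detect(msg.data.image); // ImageBitmap is a supported input
//   postMessage({ faces: result.face.length });        // post back plain data only
// };

// ---- main thread ----
const worker = new Worker('worker.js');                // hypothetical worker script
const inputVideo = document.getElementById('video-id');
worker.onmessage = (msg) => console.log('faces detected in worker:', msg.data.faces);
async function sendFrame() {
  const bitmap = await createImageBitmap(inputVideo);  // snapshot the current video frame
  worker.postMessage({ image: bitmap }, [bitmap]);     // transfer the bitmap, do not copy it
  requestAnimationFrame(sendFrame);
}
sendFrame();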
Human library can process all known input types:
- Image, ImageData, ImageBitmap, Canvas, OffscreenCanvas, Tensor
- HTMLImageElement, HTMLCanvasElement, HTMLVideoElement, HTMLMediaElement
Additionally, HTMLVideoElement and HTMLMediaElement can be a standard <video> tag that links to:
- WebCam on user's system
- Any supported video type, e.g. .mp4, .avi, etc.
- Additional video types supported via HTML5 Media Source Extensions, e.g. HLS (HTTP Live Streaming) using hls.js or DASH (Dynamic Adaptive Streaming over HTTP) using dash.js (see the sketch after this list)
- WebRTC media track using built-in support
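For instance, an HLS stream can be attached to a <video> element with hls.js and then passed to Human like any other video input; this sketch assumes human.js and hls.js are already loaded as globals (e.g. from a CDN), and the stream URL and element ids are placeholders:
const video = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');

const hls = new Hls();                              // hls.js global
hls.loadSource('https://example.com/stream.m3u8');  // placeholder HLS playlist URL
hls.attachMedia(video);
hls.on(Hls.Events.MANIFEST_PARSED, () => video.play());

const human = new Human({ backend: 'webgl' });
async function detectVideo() {
  const result = await human.detect(video);         // HTMLVideoElement is a supported input
  human.draw.all(outputCanvas, result);             // draw all results
  requestAnimationFrame(detectVideo);
}
detectVideo();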
Human is written using TypeScript strong typing and ships with full TypeDefs for all classes defined by the library, bundled in types/human.d.ts and enabled by default
Note: This does not include embedded tfjs
If you want to use embedded tfjs inside Human (human.tf namespace) and still have full typedefs, add this code:
import type * as tfjs from '@vladmandic/human/dist/tfjs.esm';
const tf = human.tf as typeof tfjs;
This is not enabled by default as Human does not ship with full TFJS TypeDefs due to size considerations
Enabling tfjs TypeDefs as above creates additional project dependencies (dev-only, as only types are required) as defined in @vladmandic/human/dist/tfjs.esm.d.ts:
@tensorflow/tfjs-core, @tensorflow/tfjs-converter, @tensorflow/tfjs-backend-wasm, @tensorflow/tfjs-backend-webgl
Default models in Human library are:
- Face Detection: MediaPipe BlazeFace Back variation
- Face Mesh: MediaPipe FaceMesh
- Face Iris Analysis: MediaPipe Iris
- Face Description: HSE FaceRes
- Emotion Detection: Oarriaga Emotion
- Body Analysis: MoveNet Lightning variation
- Hand Analysis: HandTrack & MediaPipe HandLandmarks
- Body Segmentation: Google Selfie
- Object Detection: CenterNet with MobileNet v3
Note that alternative models are provided and can be enabled via configuration
For example, body pose detection by default uses MoveNet Lightning, but can be switched to MoveNet Thunder for higher precision, MoveNet MultiPose for multi-person detection, or even PoseNet, BlazePose or EfficientPose depending on the use case
For more info, see Configuration Details and List of Models
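As a hedged illustration, switching the body model means pointing the body module at a different model file in the configuration; the file name below is an assumption, so check Configuration Details and List of Models for the actual values:
// illustrative sketch: switch body pose detection from the default MoveNet Lightning
// to MoveNet Thunder by overriding the model path; the file name is an assumption,
// see Configuration Details and List of Models for actual values
const human = new Human({
  body: {
    enabled: true,
    modelPath: 'movenet-thunder.json', // assumed file name for the Thunder variation
  },
});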
Human library is written in TypeScript 5.1 using TensorFlow/JS 4.10 and conforms to the latest JavaScript ECMAScript version 2022 standard
Build target for distributables is JavaScript ECMAScript version 2018
For details see Wiki Pages and API Specification
For Tasks:
Click tags to check more tools for each task
For Jobs:
Alternative AI tools for human
Similar Open Source Tools
human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
asktube
AskTube is an AI-powered YouTube video summarizer and QA assistant that utilizes Retrieval Augmented Generation (RAG) technology. It offers a comprehensive solution with Q&A functionality and aims to provide a user-friendly experience for local machine usage. The project integrates various technologies including Python, JS, Sanic, Peewee, Pytubefix, Sentence Transformers, Sqlite, Chroma, and NuxtJs/DaisyUI. AskTube supports multiple providers for analysis, AI services, and speech-to-text conversion. The tool is designed to extract data from YouTube URLs, store embedding chapter subtitles, and facilitate interactive Q&A sessions with enriched questions. It is not intended for production use but rather for end-users on their local machines.
WebAI-to-API
This project implements a web API that offers a unified interface to Google Gemini and Claude 3. It provides a self-hosted, lightweight, and scalable solution for accessing these AI models through a streaming API. The API supports both Claude and Gemini models, allowing users to interact with them in real-time. The project includes a user-friendly web UI for configuration and documentation, making it easy to get started and explore the capabilities of the API.
ort
Ort is an unofficial ONNX Runtime 1.17 wrapper for Rust based on the now inactive onnxruntime-rs. ONNX Runtime accelerates ML inference on both CPU and GPU.
summarize
The 'summarize' tool is designed to transcribe and summarize videos from various sources using AI models. It helps users efficiently summarize lengthy videos, take notes, and extract key insights by providing timestamps, original transcripts, and support for auto-generated captions. Users can utilize different AI models via Groq, OpenAI, or custom local models to generate grammatically correct video transcripts and extract wisdom from video content. The tool simplifies the process of summarizing video content, making it easier to remember and reference important information.
aiotieba
Aiotieba is an asynchronous Python library for interacting with the Tieba API. It provides a comprehensive set of features for working with Tieba, including support for authentication, thread and post management, and image and file uploading. Aiotieba is well-documented and easy to use, making it a great choice for developers who want to build applications that interact with Tieba.
composio
Composio is a production-ready toolset for AI agents that enables users to integrate AI agents with various agentic tools effortlessly. It provides support for over 100 tools across different categories, including popular softwares like GitHub, Notion, Linear, Gmail, Slack, and more. Composio ensures managed authorization with support for six different authentication protocols, offering better agentic accuracy and ease of use. Users can easily extend Composio with additional tools, frameworks, and authorization protocols. The toolset is designed to be embeddable and pluggable, allowing for seamless integration and consistent user experience.
Visionatrix
Visionatrix is a project aimed at providing easy use of ComfyUI workflows. It offers simplified setup and update processes, a minimalistic UI for daily workflow use, stable workflows with versioning and update support, scalability for multiple instances and task workers, multiple user support with integration of different user backends, LLM power for integration with Ollama/Gemini, and seamless integration as a service with backend endpoints and webhook support. The project is approaching version 1.0 release and welcomes new ideas for further implementation.
xtuner
XTuner is an efficient, flexible, and full-featured toolkit for fine-tuning large models. It supports various LLMs (InternLM, Mixtral-8x7B, Llama 2, ChatGLM, Qwen, Baichuan, ...), VLMs (LLaVA), and various training algorithms (QLoRA, LoRA, full-parameter fine-tune). XTuner also provides tools for chatting with pretrained / fine-tuned LLMs and deploying fine-tuned LLMs with any other framework, such as LMDeploy.
cog
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.
rivet
Rivet is a desktop application for creating complex AI agents and prompt chaining, and embedding it in your application. Rivet currently has LLM support for OpenAI GPT-3.5 and GPT-4, Anthropic Claude Instant and Claude 2, [Anthropic Claude 3 Haiku, Sonnet, and Opus](https://www.anthropic.com/news/claude-3-family), and AssemblyAI LeMUR framework for voice data. Rivet has embedding/vector database support for OpenAI Embeddings and Pinecone. Rivet also supports these additional integrations: Audio Transcription from AssemblyAI. Rivet core is a TypeScript library for running graphs created in Rivet. It is used by the Rivet application, but can also be used in your own applications, so that Rivet can call into your own application's code, and your application can call into Rivet graphs.
langchain4j-aideepin
LangChain4j-AIDeepin is an open-source, offline deployable retrieval enhancement generation (RAG) project based on large language models such as ChatGPT and Langchain4j application framework. It offers features like registration & login, multi-session support, image generation, prompt words, quota control, knowledge base, model-based search, model switching, and search engine switching. The project integrates models like ChatGPT 3.5, Tongyi Qianwen, Wenxin Yiyuan, Ollama, and DALL-E 2. The backend uses technologies like JDK 17, Spring Boot 3.0.5, Langchain4j, and PostgreSQL with pgvector extension, while the frontend is built with Vue3, TypeScript, and PNPM.
instill-core
Instill Core is an open-source orchestrator comprising a collection of source-available projects designed to streamline every aspect of building versatile AI features with unstructured data. It includes Instill VDP (Versatile Data Pipeline) for unstructured data, AI, and pipeline orchestration, Instill Model for scalable MLOps and LLMOps for open-source or custom AI models, and Instill Artifact for unified unstructured data management. Instill Core can be used for tasks such as building, testing, and sharing pipelines, importing, serving, fine-tuning, and monitoring ML models, and transforming documents, images, audio, and video into a unified AI-ready format.
polyfire-js
Polyfire is an all-in-one managed backend for AI apps that allows users to build AI apps directly from the frontend, eliminating the need for a separate backend. It simplifies the process by providing most backend services in just a few lines of code. With Polyfire, users can easily create chatbots, transcribe audio files to text, generate simple text, create a long-term memory, and generate images with Dall-E. The tool also offers starter guides and tutorials to help users get started quickly and efficiently.
hugging-llm
HuggingLLM is a project that aims to introduce ChatGPT to a wider audience, particularly those interested in using the technology to create new products or applications. The project focuses on providing practical guidance on how to use ChatGPT-related APIs to create new features and applications. It also includes detailed background information and system design introductions for relevant tasks, as well as example code and implementation processes. The project is designed for individuals with some programming experience who are interested in using ChatGPT for practical applications, and it encourages users to experiment and create their own applications and demos.
anylabeling
AnyLabeling is a tool for effortless data labeling with AI support from YOLO and Segment Anything. It combines features from LabelImg and Labelme with an improved UI and auto-labeling capabilities. Users can annotate images with polygons, rectangles, circles, lines, and points, as well as perform auto-labeling using YOLOv5 and Segment Anything. The tool also supports text detection, recognition, and Key Information Extraction (KIE) labeling, with multiple language options available such as English, Vietnamese, and Chinese.
For similar tasks
human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
Fay
Fay is an open-source digital human framework that offers different versions for various purposes. The full livestream-selling edition (带货完整版) is suitable for online and offline salespersons. The full assistant edition (助理完整版) serves as a human-machine interactive digital assistant that can also control devices upon command. The agent edition (agent版) is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.
hume-api-examples
This repository contains examples of how to use the Hume API with different frameworks and languages. It includes examples for Empathic Voice Interface (EVI) and Expression Measurement API. The EVI examples cover custom language models, modal, Next.js integration, Vue integration, Hume Python SDK, and React integration. The Expression Measurement API examples include models for face, language, burst, and speech, with implementations in Python and Typescript using frameworks like Next.js.
Starmoon
Starmoon is an affordable, compact AI-enabled device that can understand and respond to your emotions with empathy. It offers supportive conversations and personalized learning assistance. The device is cost-effective, voice-enabled, open-source, compact, and aims to reduce screen time. Users can assemble the device themselves using off-the-shelf components and deploy it locally for data privacy. Starmoon integrates various APIs for AI language models, speech-to-text, text-to-speech, and emotion intelligence. The hardware setup involves components like ESP32S3, microphone, amplifier, speaker, LED light, and button, along with software setup instructions for developers. The project also includes a web app, backend API, and background task dashboard for monitoring and management.
MiniAI-Face-Recognition-LivenessDetection-WindowsSDK
This repository contains a C++ application that demonstrates face recognition capabilities using computer vision techniques. The demo utilizes OpenCV and dlib libraries for efficient face detection and recognition with 3D passive face liveness detection (face anti-spoofing). Key Features: Face detection: The SDK utilizes advanced computer vision techniques to detect faces in images or video frames, enabling a wide range of applications. Face recognition: It can recognize known faces by comparing them with a pre-defined database of individuals. Age estimation: It can estimate the age of detected faces. Gender detection: It can determine the gender of detected faces. Liveness detection: It can detect whether a face is from a live person or a static image.
face-api
FaceAPI is an AI-powered tool for face detection, rotation tracking, face description, recognition, age, gender, and emotion prediction. It can be used in both browser and NodeJS environments using TensorFlow/JS. The tool provides live demos for processing images and webcam feeds, along with NodeJS examples for various tasks such as face similarity comparison and multiprocessing. FaceAPI offers different pre-built versions for client-side browser execution and server-side NodeJS execution, with or without TFJS pre-bundled. It is compatible with TFJS 2.0+ and TFJS 3.0+.
MiKaPo
MiKaPo is a web-based tool that allows users to pose MMD models in real-time using video input. It utilizes technologies such as Mediapipe for 3D key points detection, Babylon.js for 3D scene rendering, babylon-mmd for MMD model viewing, and Vite+React for the web framework. Users can upload videos and images, select different environments, and choose models for posing. MiKaPo also supports camera input and Ollama (electron version). The tool is open to feature requests and pull requests, with ongoing development to add VMD export functionality.
mediapipe-rs
MediaPipe-rs is a Rust library designed for MediaPipe tasks on WasmEdge WASI-NN. It offers easy-to-use low-code APIs similar to mediapipe-python, with low overhead and flexibility for custom media input. The library supports various tasks like object detection, image classification, gesture recognition, and more, including TfLite models, TF Hub models, and custom models. Users can create task instances, run sessions for pre-processing, inference, and post-processing, and speed up processing by reusing sessions. The library also provides support for audio tasks using audio data from symphonia, ffmpeg, or raw audio. Users can choose between CPU, GPU, or TPU devices for processing.
For similar jobs
human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
ailia-models
The collection of pre-trained, state-of-the-art AI models. ailia SDK is a self-contained, cross-platform, high-speed inference SDK for AI. The ailia SDK provides a consistent C++ API across Windows, Mac, Linux, iOS, Android, Jetson, and Raspberry Pi platforms. It also supports Unity (C#), Python, Rust, Flutter (Dart) and JNI for efficient AI implementation. The ailia SDK makes extensive use of the GPU through Vulkan and Metal to enable accelerated computing. Supported models: 323 models as of April 8th, 2024.