human
Stars: 2005
README:
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
- Compatible with most server-side and client-side environments and frameworks
- Combines multiple machine learning models which can be switched on-demand depending on the use-case
- Related models are executed in an attention pipeline to provide details when needed
- Optimized input pre-processing that can enhance image quality of any type of input
- Detection of frame changes to trigger only required models for improved performance
- Intelligent temporal interpolation to provide smooth results regardless of processing performance
- Simple unified API
- Built-in Image, Video and WebCam handling
- Browser:
Compatible with both desktop and mobile platforms
Compatible with CPU, WebGL, WASM backends
Compatible with WebWorker execution
Compatible with WebView
- NodeJS:
Compatible with WASM backend for execution on architectures where tensorflow binaries are not available
Compatible with tfjs-node using software execution via tensorflow shared libraries
Compatible with tfjs-node using GPU-accelerated execution via tensorflow shared libraries and nVidia CUDA
Check out Simple Live Demo fully annotated app as a good starting point (html)(code)
Check out Main Live Demo app for advanced processing of webcam, video stream or static images with all possible tunable options
- To start video detection, simply press Play
- To process images, simply drag & drop in your Browser window
- Note: For optimal performance, select only the models you'd like to use (see the configuration sketch after these notes)
- Note: If you have a modern GPU, the WebGL (default) backend is preferred, otherwise select the WASM backend
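For example, a minimal configuration sketch that keeps only face and gesture processing enabled could look like this; option names follow the Human configuration schema, see Configuration Details:
// minimal configuration sketch: enable only the face and gesture modules
// option names follow the Human configuration schema; see Configuration Details
const config = {
  backend: 'webgl',           // or 'wasm' on systems without a modern GPU
  face: { enabled: true },    // face detection, mesh, iris, description, emotion
  body: { enabled: false },   // skip body pose tracking
  hand: { enabled: false },   // skip hand & finger tracking
  object: { enabled: false }, // skip object detection
  gesture: { enabled: true }, // gestures are derived from the enabled modules
};
const human = new Human.Human(config);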
All browser demos are self-contained without any external dependencies
- Full [Live] [Details]: Main browser demo app that showcases all Human capabilities
- Simple [Live] [Details]: Simple WebCam processing demo in TypeScript
- Embedded [Live] [Details]: Even simpler demo with tiny code embedded in HTML file
- Face Detect [Live] [Details]: Extracts faces from images and processes details
- Face Match [Live] [Details]: Extracts faces from images, calculates face descriptors and similarities, and matches them to a known database
- Face ID [Live] [Details]: Runs multiple checks to validate webcam input before performing face match to faces in IndexedDB
- Multi-thread [Live] [Details]: Runs each Human module in a separate web worker for highest possible performance
- NextJS [Live] [Details]: Use Human with TypeScript, NextJS and ReactJS
- ElectronJS [Details]: Use Human with TypeScript and ElectronJS to create standalone cross-platform apps
- 3D Analysis with BabylonJS [Live] [Details]: 3D tracking and visualization of head, face, eye, body and hand
- VRM Virtual Model Tracking with Three.JS [Live] [Details]: VR model with head, face, eye, body and hand tracking
- VRM Virtual Model Tracking with BabylonJS [Live] [Details]: VR model with head, face, eye, body and hand tracking
NodeJS demos may require extra dependencies which are used to decode inputs
See the header of each demo for its dependencies, as they are not automatically installed with Human (a minimal NodeJS usage sketch follows the demo list below)
- Main [Details]: Process images from files, folders or URLs using native methods
- Canvas [Details]: Process image from file or URL and draw results to a new image file using node-canvas
- Video [Details]: Processing of video input using ffmpeg
- WebCam [Details]: Processing of webcam screenshots using fswebcam
- Events [Details]: Showcases usage of Human eventing to get notifications on processing
- Similarity [Details]: Compares two input images for similarity of detected faces
- Face Match [Details]: Parallel processing of face match in multiple child worker threads
- Multiple Workers [Details]: Runs multiple parallel human processes by dispatching them to a pool of pre-created worker processes
- Dynamic Load [Details]: Loads Human dynamically with multiple different desired backends
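As a rough sketch of how these NodeJS demos drive Human; the require path, backend name and input file are illustrative and may differ between Human distribution bundles:
// minimal NodeJS sketch, assuming the tfjs-node backend so tf.node.decodeImage is available;
// the require path, backend name and 'input.jpg' are illustrative
const fs = require('fs');
const Human = require('@vladmandic/human'); // resolves to the NodeJS build of Human

async function main() {
  const human = new Human.Human({ backend: 'tensorflow' }); // tfjs-node backend
  const buffer = fs.readFileSync('input.jpg');              // load image from disk
  const tensor = human.tf.node.decodeImage(buffer);         // decode with tfjs-node helper
  const result = await human.detect(tensor);                // run all enabled models
  human.tf.dispose(tensor);                                 // release tensor memory
  console.log('faces:', result.face.length, 'bodies:', result.body.length);
}
main();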
- Code Repository
- NPM Package
- Issues Tracker
- TypeDoc API Specification - Main class
- TypeDoc API Specification - Full
- Change Log
- Current To-do List
- Home
- Installation
- Usage & Functions
- Configuration Details
- Result Details
- Customizing Draw Methods
- Caching & Smoothing
- Input Processing
- Face Recognition & Face Description
- Gesture Recognition
- Common Issues
- Background and Benchmarks
- Comparing Backends
- Development Server
- Build Process
- Adding Custom Modules
- Performance Notes
- Performance Profiling
- Platform Support
- Diagnostic and Performance trace information
- Dockerize Human applications
- List of Models & Credits
- Models Download Repository
- Security & Privacy Policy
- License & Usage Restrictions
See issues and discussions for a list of known limitations and planned enhancements
Suggestions are welcome!
Visit Examples gallery for more examples
All options as presented in the demo application...
demo/index.html
Results Browser:
[ Demo -> Display -> Show Results ]
- Face Similarity Matching: Extracts all faces from provided input images, sorts them by similarity to a selected face, and optionally matches detected faces against a database of known people to guess their names (see the descriptor-similarity sketch after this list)
- Face Detect: Extracts all detected faces from loaded images on-demand and highlights face details on a selected face
- Face ID: Performs validation checks on a webcam input to detect a real face and matches it to known faces stored in a database
- 3D Rendering:
- VR Model Tracking:
- Human as OS native application:
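To illustrate what the similarity and face-match demos do conceptually, the sketch below compares two face descriptors (result.face[n].embedding) with plain cosine similarity; Human ships its own matching helpers, so treat this only as an approximation of the idea:
// conceptual sketch: compare two face descriptors with cosine similarity
// descriptors are available as result.face[n].embedding when face description is enabled
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function compareFaces(imageA, imageB) { // imageA/imageB can be any supported input
  const result1 = await human.detect(imageA);
  const result2 = await human.detect(imageB);
  const similarity = cosineSimilarity(result1.face[0].embedding, result2.face[0].embedding);
  console.log('face similarity:', similarity.toFixed(2));
}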
468-Point Face Mesh Details:
(view in full resolution to see keypoints)
Simply load Human (IIFE version) directly from a cloud CDN in your HTML file:
(pick one: jsdelivr, unpkg or cdnjs)
<!DOCTYPE HTML>
<script src="https://cdn.jsdelivr.net/npm/@vladmandic/human/dist/human.js"></script>
<script src="https://unpkg.dev/@vladmandic/human/dist/human.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/human/3.0.0/human.js"></script>
For details, including how to use the Browser ESM version or the NodeJS version of Human, see Installation
Simple app that uses Human to process video input and
draw output on screen using internal draw helper functions
// create instance of human with simple configuration using default values
const config = { backend: 'webgl' };
const human = new Human.Human(config);
// select input HTMLVideoElement and output HTMLCanvasElement from page
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
function detectVideo() {
// perform processing using default configuration
human.detect(inputVideo).then((result) => {
// result object will contain detected details
// as well as the processed canvas itself
// so let's first draw the processed frame on the canvas
human.draw.canvas(result.canvas, outputCanvas);
// then draw results on the same canvas
human.draw.face(outputCanvas, result.face);
human.draw.body(outputCanvas, result.body);
human.draw.hand(outputCanvas, result.hand);
human.draw.gesture(outputCanvas, result.gesture);
// and loop immediately to the next frame
requestAnimationFrame(detectVideo);
return result;
});
}
detectVideo();
or using async/await:
// create instance of human with simple configuration using default values
const config = { backend: 'webgl' };
const human = new Human(config); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
async function detectVideo() {
const result = await human.detect(inputVideo); // run detection
human.draw.all(outputCanvas, result); // draw all results
requestAnimationFrame(detectVideo); // run loop
}
detectVideo(); // start loop
or using Events:
// create instance of human with simple configuration using default values
const config = { backend: 'webgl' };
const human = new Human(config); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
human.events.addEventListener('detect', () => { // event gets triggered when detect is complete
human.draw.all(outputCanvas, human.result); // draw all results
});
function detectVideo() {
human.detect(inputVideo) // run detection
.then(() => requestAnimationFrame(detectVideo)); // upon detect complete start processing of the next frame
}
detectVideo(); // start loop
or using interpolated results for smooth video processing by separating detection and drawing loops:
const human = new Human(); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
let result;
async function detectVideo() {
result = await human.detect(inputVideo); // run detection
requestAnimationFrame(detectVideo); // run detect loop
}
async function drawVideo() {
if (result) { // check if result is available
const interpolated = human.next(result); // get smoothed result using last-known results
human.draw.all(outputCanvas, interpolated); // draw the frame
}
requestAnimationFrame(drawVideo); // run draw loop
}
detectVideo(); // start detection loop
drawVideo(); // start draw loop
or same, but using built-in full video processing instead of running a manual frame-by-frame loop:
const human = new Human(); // create instance of Human
const inputVideo = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');
async function drawResults() {
const interpolated = human.next(); // get smoothed result using last-known results
human.draw.all(outputCanvas, interpolated); // draw the frame
requestAnimationFrame(drawResults); // run draw loop
}
human.video(inputVideo); // start detection loop which continuously updates results
drawResults(); // start draw loop
or using built-in webcam helper methods that take care of video handling completely:
const human = new Human(); // create instance of Human
const outputCanvas = document.getElementById('canvas-id');
async function drawResults() {
const interpolated = human.next(); // get smoothed result using last-known results
human.draw.canvas(outputCanvas, human.webcam.element); // draw current webcam frame
human.draw.all(outputCanvas, interpolated); // draw detection results for the frame
requestAnimationFrame(drawResults); // run draw loop
}
await human.webcam.start({ crop: true });
human.video(human.webcam.element); // start detection loop which continuously updates results
drawResults(); // start draw loop
And for even better results, you can run detection in a separate web worker thread
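A rough sketch of that pattern, where a hypothetical worker.js owns its own Human instance and the main thread only captures frames and reads back plain results:
// ---- worker.js (hypothetical) ----
// importScripts('https://cdn.jsdelivr.net/npm/@vladmandic/human/dist/human.js');
// const human = new Human.Human({ backend: 'webgl' });
// onmessage = async (msg) => {
//   const result = await human.detect(msg.data.image); // ImageBitmap is a supported input
//   postMessage({ faces: result.face.length });        // post back plain data only
// };

// ---- main thread ----
const worker = new Worker('worker.js');                // hypothetical worker script
const inputVideo = document.getElementById('video-id');
worker.onmessage = (msg) => console.log('faces detected in worker:', msg.data.faces);
async function sendFrame() {
  const bitmap = await createImageBitmap(inputVideo);  // snapshot the current video frame
  worker.postMessage({ image: bitmap }, [bitmap]);     // transfer the bitmap, do not copy it
  requestAnimationFrame(sendFrame);
}
sendFrame();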
Human library can process all known input types:
- Image, ImageData, ImageBitmap, Canvas, OffscreenCanvas, Tensor
- HTMLImageElement, HTMLCanvasElement, HTMLVideoElement, HTMLMediaElement
Additionally, HTMLVideoElement and HTMLMediaElement can be a standard <video> tag that links to:
- WebCam on user's system
- Any supported video type, e.g. .mp4, .avi, etc.
- Additional video types supported via HTML5 Media Source Extensions, e.g. HLS (HTTP Live Streaming) using hls.js or DASH (Dynamic Adaptive Streaming over HTTP) using dash.js (see the sketch after this list)
- WebRTC media track using built-in support
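For instance, an HLS stream can be attached to a <video> element with hls.js and then passed to Human like any other video input; this sketch assumes human.js and hls.js are already loaded as globals (e.g. from a CDN), and the stream URL and element ids are placeholders:
const video = document.getElementById('video-id');
const outputCanvas = document.getElementById('canvas-id');

const hls = new Hls();                              // hls.js global
hls.loadSource('https://example.com/stream.m3u8');  // placeholder HLS playlist URL
hls.attachMedia(video);
hls.on(Hls.Events.MANIFEST_PARSED, () => video.play());

const human = new Human({ backend: 'webgl' });
async function detectVideo() {
  const result = await human.detect(video);         // HTMLVideoElement is a supported input
  human.draw.all(outputCanvas, result);             // draw all results
  requestAnimationFrame(detectVideo);
}
detectVideo();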
Human is written using TypeScript strong typing and ships with full TypeDefs for all classes defined by the library, bundled in types/human.d.ts and enabled by default
Note: This does not include embedded tfjs
If you want to use embedded tfjs inside Human (human.tf namespace) and still have full typedefs, add this code:
import type * as tfjs from '@vladmandic/human/dist/tfjs.esm';
const tf = human.tf as typeof tfjs;
This is not enabled by default as Human does not ship with full TFJS TypeDefs due to size considerations
Enabling tfjs TypeDefs as above creates additional project dependencies (dev-only, as only types are required) as defined in @vladmandic/human/dist/tfjs.esm.d.ts:
@tensorflow/tfjs-core, @tensorflow/tfjs-converter, @tensorflow/tfjs-backend-wasm, @tensorflow/tfjs-backend-webgl
Default models in Human library are:
- Face Detection: MediaPipe BlazeFace Back variation
- Face Mesh: MediaPipe FaceMesh
- Face Iris Analysis: MediaPipe Iris
- Face Description: HSE FaceRes
- Emotion Detection: Oarriaga Emotion
- Body Analysis: MoveNet Lightning variation
- Hand Analysis: HandTrack & MediaPipe HandLandmarks
- Body Segmentation: Google Selfie
- Object Detection: CenterNet with MobileNet v3
Note that alternative models are provided and can be enabled via configuration
For example, body pose detection by default uses MoveNet Lightning, but can be switched to MoveNet Thunder for higher precision, MoveNet MultiPose for multi-person detection, or even PoseNet, BlazePose or EfficientPose depending on the use case
For more info, see Configuration Details and List of Models
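As a hedged illustration, switching the body model means pointing the body module at a different model file in the configuration; the file name below is an assumption, so check Configuration Details and List of Models for the actual values:
// illustrative sketch: switch body pose detection from the default MoveNet Lightning
// to MoveNet Thunder by overriding the model path; the file name is an assumption,
// see Configuration Details and List of Models for actual values
const human = new Human({
  body: {
    enabled: true,
    modelPath: 'movenet-thunder.json', // assumed file name for the Thunder variation
  },
});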
Human library is written in TypeScript 5.1 using TensorFlow/JS 4.10 and conforms to the latest JavaScript ECMAScript version 2022 standard
Build target for distributables is JavaScript ECMAScript version 2018
For details see Wiki Pages and API Specification
For Tasks:
Click tags to check more tools for each task
For Jobs:
Alternative AI tools for human
Similar Open Source Tools
human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
asktube
AskTube is an AI-powered YouTube video summarizer and QA assistant that utilizes Retrieval Augmented Generation (RAG) technology. It offers a comprehensive solution with Q&A functionality and aims to provide a user-friendly experience for local machine usage. The project integrates various technologies including Python, JS, Sanic, Peewee, Pytubefix, Sentence Transformers, Sqlite, Chroma, and NuxtJs/DaisyUI. AskTube supports multiple providers for analysis, AI services, and speech-to-text conversion. The tool is designed to extract data from YouTube URLs, store embedding chapter subtitles, and facilitate interactive Q&A sessions with enriched questions. It is not intended for production use but rather for end-users on their local machines.
WebAI-to-API
This project implements a web API that offers a unified interface to Google Gemini and Claude 3. It provides a self-hosted, lightweight, and scalable solution for accessing these AI models through a streaming API. The API supports both Claude and Gemini models, allowing users to interact with them in real-time. The project includes a user-friendly web UI for configuration and documentation, making it easy to get started and explore the capabilities of the API.
ort
Ort is an unofficial ONNX Runtime 1.17 wrapper for Rust based on the now inactive onnxruntime-rs. ONNX Runtime accelerates ML inference on both CPU and GPU.
summarize
The 'summarize' tool is designed to transcribe and summarize videos from various sources using AI models. It helps users efficiently summarize lengthy videos, take notes, and extract key insights by providing timestamps, original transcripts, and support for auto-generated captions. Users can utilize different AI models via Groq, OpenAI, or custom local models to generate grammatically correct video transcripts and extract wisdom from video content. The tool simplifies the process of summarizing video content, making it easier to remember and reference important information.
aiotieba
Aiotieba is an asynchronous Python library for interacting with the Tieba API. It provides a comprehensive set of features for working with Tieba, including support for authentication, thread and post management, and image and file uploading. Aiotieba is well-documented and easy to use, making it a great choice for developers who want to build applications that interact with Tieba.
composio
Composio is a production-ready toolset for AI agents that enables users to integrate AI agents with various agentic tools effortlessly. It provides support for over 100 tools across different categories, including popular softwares like GitHub, Notion, Linear, Gmail, Slack, and more. Composio ensures managed authorization with support for six different authentication protocols, offering better agentic accuracy and ease of use. Users can easily extend Composio with additional tools, frameworks, and authorization protocols. The toolset is designed to be embeddable and pluggable, allowing for seamless integration and consistent user experience.
Visionatrix
Visionatrix is a project aimed at providing easy use of ComfyUI workflows. It offers simplified setup and update processes, a minimalistic UI for daily workflow use, stable workflows with versioning and update support, scalability for multiple instances and task workers, multiple user support with integration of different user backends, LLM power for integration with Ollama/Gemini, and seamless integration as a service with backend endpoints and webhook support. The project is approaching version 1.0 release and welcomes new ideas for further implementation.
xtuner
XTuner is an efficient, flexible, and full-featured toolkit for fine-tuning large models. It supports various LLMs (InternLM, Mixtral-8x7B, Llama 2, ChatGLM, Qwen, Baichuan, ...), VLMs (LLaVA), and various training algorithms (QLoRA, LoRA, full-parameter fine-tune). XTuner also provides tools for chatting with pretrained / fine-tuned LLMs and deploying fine-tuned LLMs with any other framework, such as LMDeploy.
cog
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.
rivet
Rivet is a desktop application for creating complex AI agents and prompt chaining, and embedding it in your application. Rivet currently has LLM support for OpenAI GPT-3.5 and GPT-4, Anthropic Claude Instant and Claude 2, [Anthropic Claude 3 Haiku, Sonnet, and Opus](https://www.anthropic.com/news/claude-3-family), and AssemblyAI LeMUR framework for voice data. Rivet has embedding/vector database support for OpenAI Embeddings and Pinecone. Rivet also supports these additional integrations: Audio Transcription from AssemblyAI. Rivet core is a TypeScript library for running graphs created in Rivet. It is used by the Rivet application, but can also be used in your own applications, so that Rivet can call into your own application's code, and your application can call into Rivet graphs.
langchain4j-aideepin
LangChain4j-AIDeepin is an open-source, offline deployable retrieval enhancement generation (RAG) project based on large language models such as ChatGPT and Langchain4j application framework. It offers features like registration & login, multi-session support, image generation, prompt words, quota control, knowledge base, model-based search, model switching, and search engine switching. The project integrates models like ChatGPT 3.5, Tongyi Qianwen, Wenxin Yiyuan, Ollama, and DALL-E 2. The backend uses technologies like JDK 17, Spring Boot 3.0.5, Langchain4j, and PostgreSQL with pgvector extension, while the frontend is built with Vue3, TypeScript, and PNPM.
instill-core
Instill Core is an open-source orchestrator comprising a collection of source-available projects designed to streamline every aspect of building versatile AI features with unstructured data. It includes Instill VDP (Versatile Data Pipeline) for unstructured data, AI, and pipeline orchestration, Instill Model for scalable MLOps and LLMOps for open-source or custom AI models, and Instill Artifact for unified unstructured data management. Instill Core can be used for tasks such as building, testing, and sharing pipelines, importing, serving, fine-tuning, and monitoring ML models, and transforming documents, images, audio, and video into a unified AI-ready format.
polyfire-js
Polyfire is an all-in-one managed backend for AI apps that allows users to build AI apps directly from the frontend, eliminating the need for a separate backend. It simplifies the process by providing most backend services in just a few lines of code. With Polyfire, users can easily create chatbots, transcribe audio files to text, generate simple text, create a long-term memory, and generate images with Dall-E. The tool also offers starter guides and tutorials to help users get started quickly and efficiently.
hugging-llm
HuggingLLM is a project that aims to introduce ChatGPT to a wider audience, particularly those interested in using the technology to create new products or applications. The project focuses on providing practical guidance on how to use ChatGPT-related APIs to create new features and applications. It also includes detailed background information and system design introductions for relevant tasks, as well as example code and implementation processes. The project is designed for individuals with some programming experience who are interested in using ChatGPT for practical applications, and it encourages users to experiment and create their own applications and demos.
anylabeling
AnyLabeling is a tool for effortless data labeling with AI support from YOLO and Segment Anything. It combines features from LabelImg and Labelme with an improved UI and auto-labeling capabilities. Users can annotate images with polygons, rectangles, circles, lines, and points, as well as perform auto-labeling using YOLOv5 and Segment Anything. The tool also supports text detection, recognition, and Key Information Extraction (KIE) labeling, with multiple language options available such as English, Vietnamese, and Chinese.
For similar tasks
human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
Fay
Fay is an open-source digital human framework that offers different versions for various purposes. The full livestream-selling edition (带货完整版) is suitable for online and offline salespersons. The full assistant edition (助理完整版) serves as a human-machine interactive digital assistant that can also control devices upon command. The agent edition (agent版) is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.
hume-api-examples
This repository contains examples of how to use the Hume API with different frameworks and languages. It includes examples for Empathic Voice Interface (EVI) and Expression Measurement API. The EVI examples cover custom language models, modal, Next.js integration, Vue integration, Hume Python SDK, and React integration. The Expression Measurement API examples include models for face, language, burst, and speech, with implementations in Python and Typescript using frameworks like Next.js.
Starmoon
Starmoon is an affordable, compact AI-enabled device that can understand and respond to your emotions with empathy. It offers supportive conversations and personalized learning assistance. The device is cost-effective, voice-enabled, open-source, compact, and aims to reduce screen time. Users can assemble the device themselves using off-the-shelf components and deploy it locally for data privacy. Starmoon integrates various APIs for AI language models, speech-to-text, text-to-speech, and emotion intelligence. The hardware setup involves components like ESP32S3, microphone, amplifier, speaker, LED light, and button, along with software setup instructions for developers. The project also includes a web app, backend API, and background task dashboard for monitoring and management.
MiniAI-Face-Recognition-LivenessDetection-WindowsSDK
This repository contains a C++ application that demonstrates face recognition capabilities using computer vision techniques. The demo utilizes OpenCV and dlib libraries for efficient face detection and recognition with 3D passive face liveness detection (face anti-spoofing). Key Features: Face detection: The SDK utilizes advanced computer vision techniques to detect faces in images or video frames, enabling a wide range of applications. Face recognition: It can recognize known faces by comparing them with a pre-defined database of individuals. Age estimation: It can estimate the age of detected faces. Gender detection: It can determine the gender of detected faces. Liveness detection: It can detect whether a face is from a live person or a static image.
face-api
FaceAPI is an AI-powered tool for face detection, rotation tracking, face description, recognition, age, gender, and emotion prediction. It can be used in both browser and NodeJS environments using TensorFlow/JS. The tool provides live demos for processing images and webcam feeds, along with NodeJS examples for various tasks such as face similarity comparison and multiprocessing. FaceAPI offers different pre-built versions for client-side browser execution and server-side NodeJS execution, with or without TFJS pre-bundled. It is compatible with TFJS 2.0+ and TFJS 3.0+.
MiKaPo
MiKaPo is a web-based tool that allows users to pose MMD models in real-time using video input. It utilizes technologies such as Mediapipe for 3D key points detection, Babylon.js for 3D scene rendering, babylon-mmd for MMD model viewing, and Vite+React for the web framework. Users can upload videos and images, select different environments, and choose models for posing. MiKaPo also supports camera input and Ollama (electron version). The tool is open to feature requests and pull requests, with ongoing development to add VMD export functionality.
mediapipe-rs
MediaPipe-rs is a Rust library designed for MediaPipe tasks on WasmEdge WASI-NN. It offers easy-to-use low-code APIs similar to mediapipe-python, with low overhead and flexibility for custom media input. The library supports various tasks like object detection, image classification, gesture recognition, and more, including TfLite models, TF Hub models, and custom models. Users can create task instances, run sessions for pre-processing, inference, and post-processing, and speed up processing by reusing sessions. The library also provides support for audio tasks using audio data from symphonia, ffmpeg, or raw audio. Users can choose between CPU, GPU, or TPU devices for processing.
For similar jobs
human
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
ailia-models
The collection of pre-trained, state-of-the-art AI models. ailia SDK is a self-contained, cross-platform, high-speed inference SDK for AI. The ailia SDK provides a consistent C++ API across Windows, Mac, Linux, iOS, Android, Jetson, and Raspberry Pi platforms. It also supports Unity (C#), Python, Rust, Flutter (Dart) and JNI for efficient AI implementation. The ailia SDK makes extensive use of the GPU through Vulkan and Metal to enable accelerated computing. Supported models: 323 models as of April 8th, 2024.