
QuestCameraKit
Stars: 169

QuestCameraKit is a collection of template and reference projects demonstrating how to use Meta Quest’s new Passthrough Camera API (PCA) for advanced AR/VR vision, tracking, and shader effects. It includes samples like Color Picker, Object Detection with Unity Sentis, QR Code Tracking with ZXing, Frosted Glass Shader, OpenAI vision model, and WebRTC video streaming. The repository provides detailed instructions on how to run each sample and troubleshoot known issues. Users can explore various functionalities such as converting 3D points to 2D image pixels, detecting objects, tracking QR codes, applying custom shader effects, interacting with OpenAI's vision model, and streaming camera feed over WebRTC.
README:
QuestCameraKit is a collection of template and reference projects demonstrating how to use Meta Quest’s new Passthrough Camera API (PCA)
for advanced AR/VR vision, tracking, and shader effects.
- Overview
- Getting Started with PCA
- Running the Samples
- General Troubleshooting & Known Issues
- Acknowledgements & Credits
- Community Contributions
- News
- License
- Contact
- Purpose: Convert a 3D point in space to its corresponding 2D image pixel.
- Description: This sample shows the mapping between 3D space and 2D image coordinates using the Passthrough Camera API. We use MRUK's EnvironmentRaycastManager to determine a 3D point in our environment and map it to the corresponding location on our WebCamTexture. We then extract the pixel at that point to determine the color of a real-world object (a minimal sketch of this flow follows the sample descriptions below).
- Purpose: Convert 2D screen coordinates into their corresponding 3D points in space.
- Description: Use the Unity Sentis framework to run inference with different ML models to detect and track objects. Learn how to convert detected image coordinates (e.g. bounding boxes) back into 3D points for dynamic interaction within your scenes (see the raycast sketch after the table below). This sample also shows how to filter labels, so you can, for example, detect only humans and pets to create a safer play area for your VR game. The sample video below is filtered to monitor, person, and laptop. The sample runs at around 60 fps.
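For the Color Picker sample, the repository's own component names may differ; the following is a minimal C# sketch of the 3D-point-to-pixel step, assuming the camera pose and intrinsics (focal length, principal point) are obtained from the Passthrough Camera API and that a WebCamTexture is already playing. Image axis conventions may need flipping depending on the camera.

```csharp
using UnityEngine;

// Minimal Color Picker sketch (not the repository's actual component):
// project a raycast hit point into passthrough-camera pixel coordinates
// and read the color from the WebCamTexture at that pixel.
public class ColorPickerSketch : MonoBehaviour
{
    public WebCamTexture cameraTexture;   // passthrough feed (assumed already playing)
    public Transform cameraPose;          // passthrough camera pose in world space (assumed known)
    public Vector2 focalLength;           // fx, fy in pixels (assumed provided by the PCA)
    public Vector2 principalPoint;        // cx, cy in pixels (assumed provided by the PCA)
    public Renderer targetCube;           // cube that should take on the sampled color

    public void PickColorAt(Vector3 worldPoint)
    {
        // Transform the world-space point into the camera's local space.
        Vector3 local = cameraPose.InverseTransformPoint(worldPoint);
        if (local.z <= 0f) return; // behind the camera

        // Pinhole projection: pixel = focal * (x/z, y/z) + principal point.
        int px = Mathf.RoundToInt(focalLength.x * (local.x / local.z) + principalPoint.x);
        int py = Mathf.RoundToInt(focalLength.y * (local.y / local.z) + principalPoint.y);

        if (px < 0 || px >= cameraTexture.width || py < 0 || py >= cameraTexture.height) return;

        // Sample the pixel and apply it to the cube.
        Color picked = cameraTexture.GetPixel(px, py);
        targetCube.material.color = picked;
    }
}
```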
| 1. 🎨 Color Picker | 2. 🍎 Object Detection |
| --- | --- |
| (demo) | (demo) |
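For the object detection sample, going from a 2D detection back to 3D boils down to building a ray from the pixel through the camera and raycasting it against the environment. The repository uses MRUK's EnvironmentRaycastManager for that raycast; the sketch below substitutes a plain Physics.Raycast against scene colliders for illustration, and the intrinsics fields are again assumed to come from the PCA.

```csharp
using UnityEngine;

// Sketch: turn a detection's pixel coordinates back into a 3D point.
// The repository uses MRUK's EnvironmentRaycastManager for the raycast;
// this illustration substitutes Physics.Raycast against scene colliders.
public class DetectionToWorldSketch : MonoBehaviour
{
    public Transform cameraPose;     // passthrough camera pose in world space
    public Vector2 focalLength;      // fx, fy in pixels
    public Vector2 principalPoint;   // cx, cy in pixels

    public bool TryGetWorldPoint(Vector2 pixel, out Vector3 worldPoint)
    {
        // Inverse pinhole projection: pixel -> camera-space viewing direction.
        Vector3 dirCam = new Vector3(
            (pixel.x - principalPoint.x) / focalLength.x,
            (pixel.y - principalPoint.y) / focalLength.y,
            1f);

        Ray ray = new Ray(cameraPose.position, cameraPose.TransformDirection(dirCam.normalized));

        if (Physics.Raycast(ray, out RaycastHit hit, 10f))
        {
            worldPoint = hit.point; // anchor a bounding box or label here
            return true;
        }

        worldPoint = default;
        return false;
    }
}
```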
- Purpose: Detect and track QR codes in real time. Open webviews or log in to third-party services with ease.
- Description: Similar to the object detection sample, this gets QR code coordinates and projects them into 3D space. Detect QR codes and call their URLs. You can select between a multiple or single QR code mode (a decoding sketch follows below). The sample runs at around 70 fps for multiple QR codes and a stable 72 fps for a single code.
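A rough idea of the decoding step with ZXing.Net follows. This is not the repository's actual implementation: class availability depends on the ZXing.Net build, and decoding every frame as shown here is expensive.

```csharp
using UnityEngine;
using ZXing;

// Sketch: decode a QR code from the passthrough WebCamTexture with ZXing.Net.
// RGBLuminanceSource and BarcodeReaderGeneric are assumed to be available in
// the installed ZXing.Net build; no Unity-specific ZXing helpers are used.
public class QrDecodeSketch : MonoBehaviour
{
    public WebCamTexture cameraTexture;
    private readonly BarcodeReaderGeneric _reader = new BarcodeReaderGeneric();

    void Update()
    {
        if (cameraTexture == null || !cameraTexture.isPlaying) return;

        // Copy the frame and convert it to the raw byte layout ZXing expects.
        Color32[] pixels = cameraTexture.GetPixels32();
        byte[] raw = new byte[pixels.Length * 4];
        for (int i = 0; i < pixels.Length; i++)
        {
            raw[i * 4 + 0] = pixels[i].r;
            raw[i * 4 + 1] = pixels[i].g;
            raw[i * 4 + 2] = pixels[i].b;
            raw[i * 4 + 3] = pixels[i].a;
        }

        var source = new RGBLuminanceSource(raw, cameraTexture.width, cameraTexture.height,
            RGBLuminanceSource.BitmapFormat.RGBA32);
        Result result = _reader.Decode(source);
        if (result != null)
        {
            Debug.Log($"QR content: {result.Text}"); // e.g. open the URL or place a 3D marker
        }
    }
}
```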
- Purpose: Apply a custom frosted glass shader effect to virtual surfaces.
- Description: A shader which takes the camera feed as input and blurs the content behind it.
- Todo: We have a shader that correctly maps the camera texture onto a quad, plus one vertical blur shader and one horizontal blur shader. Ideally we would combine all of these into a single shader effect so it can easily be applied to meshes or UI elements.
| 3. 📱 QR Code Tracking | 4. 🪟 Frosted Glass |
| --- | --- |
| (demo) | (demo) |
- Purpose: Ask OpenAI's vision model (or any other multi-modal LLM) about the context of your current scene.
- Description: We use the OpenAI speech-to-text API to create a command. We then send this command together with a screenshot to the vision model. Lastly, we get the response back and use the text-to-speech API to turn the response text into an audio file in Unity so the response can be spoken aloud. The user can select different speakers, models, and speeds. For the command we can add additional instructions for the model, as well as select an image, image & text, or text-only mode (a request sketch follows below). The whole loop takes anywhere from 2-6 seconds, depending on the internet connection.
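For orientation, the vision step of that loop can look roughly like the following UnityWebRequest sketch. The model name, prompt handling, and JSON layout here are illustrative assumptions, not the repository's OpenAI Manager code.

```csharp
using System;
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Sketch of the vision step only: send a screenshot plus a text command to
// OpenAI's chat completions endpoint. Model name and payload are assumptions.
public class VisionRequestSketch : MonoBehaviour
{
    public string apiKey = "sk-...";

    public IEnumerator AskAboutScene(Texture2D screenshot, string command)
    {
        string imageBase64 = Convert.ToBase64String(screenshot.EncodeToJPG(75));

        // Minimal payload with one text part and one image part.
        // The command is assumed to contain no characters that need JSON escaping.
        string json =
            "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":[" +
            "{\"type\":\"text\",\"text\":\"" + command + "\"}," +
            "{\"type\":\"image_url\",\"image_url\":{\"url\":\"data:image/jpeg;base64," + imageBase64 + "\"}}]}]}";

        using var request = new UnityWebRequest("https://api.openai.com/v1/chat/completions", "POST");
        request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(json));
        request.downloadHandler = new DownloadHandlerBuffer();
        request.SetRequestHeader("Content-Type", "application/json");
        request.SetRequestHeader("Authorization", "Bearer " + apiKey);

        yield return request.SendWebRequest();

        if (request.result == UnityWebRequest.Result.Success)
            Debug.Log(request.downloadHandler.text); // parse and hand the text to the TTS step
        else
            Debug.LogError(request.error);
    }
}
```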
https://github.com/user-attachments/assets/a4cfbfc2-0306-40dc-a9a3-cdccffa7afea
- Purpose: Stream the passthrough camera feed over WebRTC to another client, using WebSockets for signaling.
- Description: This sample uses SimpleWebRTC, a Unity-based WebRTC wrapper that facilitates peer-to-peer audio, video, and data communication over WebRTC using Unity's WebRTC package. It leverages NativeWebSocket for signaling and supports both video and audio streaming. You will need to set up your own WebSocket signaling server beforehand, either online or on your LAN. You can find more information about the necessary steps here (a conceptual sketch follows below).
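Conceptually, what SimpleWebRTC does under the hood is wrap the passthrough texture in a video track and attach it to a peer connection. The sketch below shows that idea with Unity's WebRTC package directly and omits all signaling, so it is an illustration rather than the sample's actual code.

```csharp
using Unity.WebRTC;
using UnityEngine;

// Conceptual sketch: wrap the passthrough WebCamTexture in a video track and
// attach it to a peer connection. Offer/answer and ICE exchange over the
// WebSocket signaling server are omitted here.
public class PassthroughStreamSketch : MonoBehaviour
{
    public WebCamTexture cameraTexture;   // passthrough feed, assumed already playing

    private RTCPeerConnection _peer;
    private VideoStreamTrack _videoTrack;

    void Start()
    {
        _peer = new RTCPeerConnection();

        // Create a video track that continuously samples the passthrough texture.
        _videoTrack = new VideoStreamTrack(cameraTexture);
        _peer.AddTrack(_videoTrack);

        // Signaling (relayed through the WebSocket server) would happen here.
    }

    void OnDestroy()
    {
        _videoTrack?.Dispose();
        _peer?.Dispose();
    }
}
```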
| 6. 🎥 WebRTC video streaming |
| --- |
| (demo) |
| Information | Details |
| --- | --- |
| Device Requirements | Only for Meta Quest 3 and 3S; HorizonOS v74 or later |
| Unity WebCamTexture | Access through Unity's `WebCamTexture`; only one camera at a time (left or right), a Unity limitation |
| Android Camera2 API | Unobstructed forward-facing RGB cameras; provides camera intrinsics (camera ID, height, width, lens translation & rotation); Android Manifest permission: `horizonos.permission.HEADSET_CAMERA` |
| Public Experimental | Apps using PCA are not allowed to be submitted to the Meta Horizon Store yet. |
| Specifications | Frame rate: 30 fps; image latency: 40-60 ms; available resolutions per eye: 320x240, 640x480, 800x600, 1280x960 |
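Since camera access is gated behind the `horizonos.permission.HEADSET_CAMERA` permission listed above, a runtime permission check along these lines can be useful (the samples may already handle this for you):

```csharp
using UnityEngine;
using UnityEngine.Android;

// Sketch: request the headset camera permission at runtime before opening
// the passthrough camera. The permission string comes from the table above.
public class CameraPermissionSketch : MonoBehaviour
{
    private const string HeadsetCameraPermission = "horizonos.permission.HEADSET_CAMERA";

    void Start()
    {
        if (!Permission.HasUserAuthorizedPermission(HeadsetCameraPermission))
        {
            Permission.RequestUserPermission(HeadsetCameraPermission);
        }
    }
}
```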
- Meta Quest Device: Ensure you are running on a `Quest 3` or `Quest 3S` and your device is updated to `HorizonOS v74` or later.
- Unity: Recommended is `Unity 6`. Also runs on Unity `2022.3 LTS`.
- The Passthrough Camera API does not work in the Editor or XR Simulator.
- Get more information from the Meta Quest Developer Documentation.
[!CAUTION] Every feature involving camera access has a significant impact on your application's performance. Be aware of this and ask yourself whether the feature you are trying to implement could be achieved another way that does not use the cameras.
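For reference, opening the passthrough feed through `WebCamTexture` looks like regular Unity webcam code. Device naming and ordering on HorizonOS are assumptions in this sketch, and only one camera can be opened at a time.

```csharp
using UnityEngine;

// Sketch: open the passthrough feed through Unity's WebCamTexture.
// On Quest, each passthrough camera (left/right) shows up as a WebCamDevice.
public class PassthroughFeedSketch : MonoBehaviour
{
    public WebCamTexture CameraTexture { get; private set; }

    void Start()
    {
        WebCamDevice[] devices = WebCamTexture.devices;
        if (devices.Length == 0)
        {
            Debug.LogWarning("No camera devices found. Is the headset camera permission granted?");
            return;
        }

        // Pick the first reported device and request one of the supported resolutions.
        CameraTexture = new WebCamTexture(devices[0].name, 1280, 960, 30);
        CameraTexture.Play();
    }

    void OnDestroy()
    {
        if (CameraTexture != null) CameraTexture.Stop();
    }
}
```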
- Clone the Repository: `git clone https://github.com/xrdevrob/QuestCameraKit.git`
- Open the Project in Unity: Launch Unity and open the cloned project folder.
- Configure Dependencies: Follow the instructions in the section below to run one of the samples.
1. Color Picker
- Open the `ColorPicker` scene.
- Build the scene and run the APK on your headset.
- Aim the ray at a surface in your real space and press the A button or pinch your fingers to observe the cube changing its color to the color in your real environment.
2. Object Detection
- Open the `ObjectDetection` scene.
- You will need Unity Sentis for this project to run ([email protected]).
- Select the labels you would like to track. No label means all objects will be tracked (a filtering sketch follows the label list below).
Show all available labels
person bicycle car motorbike aeroplane bus train truck boat traffic light fire hydrant stop sign parking meter bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard sports ball kite baseball bat baseball glove skateboard surfboard tennis racket bottle wine glass cup fork knife spoon bowl banana apple sandwich orange broccoli carrot hot dog pizza donut cake chair sofa pottedplant bed diningtable toilet tvmonitor laptop mouse remote keyboard cell phone microwave oven toaster sink refrigerator book clock vase scissors teddy bear hair drier toothbrush
- Build the scene and run the APK on your headset. Look around your room and see how tracked objects receive a bounding box in accurate 3D space.
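A minimal sketch of the label filtering described above; the `Detection` struct and the label strings are illustrative, not the repository's actual types:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch: filter detections by label before they are projected into 3D.
public class LabelFilterSketch : MonoBehaviour
{
    // Empty list = track everything, matching the behaviour described above.
    public List<string> allowedLabels = new List<string> { "person", "laptop", "tvmonitor" };

    public struct Detection
    {
        public string Label;
        public Rect BoundingBox;   // in image pixels
        public float Confidence;
    }

    public List<Detection> Filter(IEnumerable<Detection> detections, float minConfidence = 0.5f)
    {
        var keep = new List<Detection>();
        var allowed = new HashSet<string>(allowedLabels);

        foreach (var d in detections)
        {
            if (d.Confidence < minConfidence) continue;                // drop weak detections
            if (allowed.Count > 0 && !allowed.Contains(d.Label)) continue; // drop unwanted labels
            keep.Add(d);
        }
        return keep;
    }
}
```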
3. QR Code Tracking
- Open the `QRCodeTracking` scene to test real-time QR code detection and tracking.
- Install NuGet for Unity.
- Click on the `NuGet` menu and then on `Manage NuGet Packages`. Search for the ZXing.Net package from Michael Jahn and install it.
- Make sure that in your `Player Settings` under `Scripting Define Symbols` you see `ZXING_ENABLED`. The ZXingDefineSymbolChecker class should automatically detect if `ZXing.Net` is installed and add the symbol (see the sketch below).
- In order to see the label of your QR code, you will also need to install TextMeshPro.
- Build the scene and run the APK on your headset. Look at a QR code to see the marker in 3D space and the URL of the QR code.
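If you want your own scripts to degrade gracefully before ZXing.Net is installed, you can guard them with the same define symbol:

```csharp
using UnityEngine;

// Sketch: guard ZXing-dependent code behind the ZXING_ENABLED define so the
// project still compiles before the NuGet package is installed. The define
// itself is added by the ZXingDefineSymbolChecker mentioned above.
public class QrAvailabilitySketch : MonoBehaviour
{
    void Start()
    {
#if ZXING_ENABLED
        Debug.Log("ZXing.Net detected - QR code tracking is available.");
#else
        Debug.LogWarning("ZXing.Net not installed - install it via NuGet for Unity to enable QR tracking.");
#endif
    }
}
```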
4. Frosted Glass
- Open the `FrostedGlass` scene.
- Make sure the `Opaque Texture` checkbox is checked on your render pipeline asset (a quick runtime check is sketched below).
- Build the scene and run the APK on your headset.
- Look at the panel from different angles and observe how objects behind it are blurred.
[!WARNING]
The Meta Project Setup Tool (PST) will show a warning and tell you to uncheck it, so do not fix this warning.
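A small runtime sanity check for the opaque-texture requirement, assuming the project uses URP:

```csharp
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal;

// Sketch: warn at runtime if the active URP asset does not provide the opaque
// texture that the frosted glass shader samples. This mirrors the manual
// checkbox step above.
public class OpaqueTextureCheckSketch : MonoBehaviour
{
    void Start()
    {
        var urpAsset = GraphicsSettings.currentRenderPipeline as UniversalRenderPipelineAsset;
        if (urpAsset == null)
        {
            Debug.LogWarning("No URP asset active - the frosted glass sample expects URP.");
        }
        else if (!urpAsset.supportsCameraOpaqueTexture)
        {
            Debug.LogWarning("Enable 'Opaque Texture' on the URP asset, or the blur will have nothing to sample.");
        }
    }
}
```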
5. OpenAI Vision Model
- Open the `ImageLLM` scene.
- Make sure to create an API key and enter it in the `OpenAI Manager` prefab.
- Select your desired model and optionally give the LLM some instructions.
- Make sure your headset is connected to the internet (the faster the better).
- Build the scene and run the APK on your headset.
[!NOTE]
File uploads are currently limited to `25 MB` and the following input file types are supported: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, and `webm`.
You can send commands and receive results in any of these languages:
Show all supported languages
Afrikaans | Arabic | Armenian | Azerbaijani | Belarusian | Bosnian | Bulgarian | Catalan | Chinese |
Croatian | Czech | Danish | Dutch | English | Estonian | Finnish | French | Galician |
German | Greek | Hebrew | Hindi | Hungarian | Icelandic | Indonesian | Italian | Japanese |
Kannada | Kazakh | Korean | Latvian | Lithuanian | Macedonian | Malay | Marathi | Maori |
Nepali | Norwegian | Persian | Polish | Portuguese | Romanian | Russian | Serbian | Slovak |
Slovenian | Spanish | Swahili | Swedish | Tagalog | Tamil | Thai | Turkish | Ukrainian |
Urdu | Vietnamese | Welsh |
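The note and language list above refer to the speech-to-text step that produces the command. For reference, that step can be a simple multipart upload to OpenAI's transcription endpoint; the WAV-encoding helper and error handling are left out, and this is a sketch rather than the sample's actual OpenAI Manager code.

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Networking;

// Sketch of the speech-to-text step: upload a recorded audio clip to OpenAI's
// transcription endpoint. The caller is assumed to provide WAV-encoded bytes.
public class TranscriptionSketch : MonoBehaviour
{
    public string apiKey = "sk-...";

    public IEnumerator Transcribe(byte[] wavBytes)
    {
        var form = new List<IMultipartFormSection>
        {
            new MultipartFormFileSection("file", wavBytes, "command.wav", "audio/wav"),
            new MultipartFormDataSection("model", "whisper-1")
        };

        using var request = UnityWebRequest.Post("https://api.openai.com/v1/audio/transcriptions", form);
        request.SetRequestHeader("Authorization", "Bearer " + apiKey);

        yield return request.SendWebRequest();

        if (request.result == UnityWebRequest.Result.Success)
            Debug.Log(request.downloadHandler.text); // JSON containing the transcribed command
        else
            Debug.LogError(request.error);
    }
}
```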
6. WebRTC video streaming
- Open the `Package Manager` and click on the + sign in the upper left/right corner.
- Select "Add package from git URL".
- Enter the URL https://github.com/endel/NativeWebSocket.git#upm and click Install.
- After the installation has finished, click on the + sign in the upper left/right corner again.
- Enter the URL https://github.com/FireDragonGameStudio/SimpleWebRTC.git?path=/Assets/SimpleWebRTC#upm and click Install.
- Open the `WebcamToWebRTC` scene.
- Link up your signaling server on the `Client-STUNConnection` component in the `Web Socket Server Address` field.
- Build and deploy the `WebRTC-Quest` scene to your Quest 3 device.
- Open the `WebRTC-SingleClient` scene in your Editor.
- Build and deploy the `WebRTC-SingleClient` scene to another device or start it from within the Unity Editor. More information can be found here.
- Start the WebRTC app on your Quest and on your other devices. The Quest and the client streaming devices should connect automatically to the WebSocket signaling server.
- Perform the Start gesture with your left hand, or press the menu button on your left controller, to start streaming from the Quest 3 to your WebRTC client app.
Troubleshooting:
- If there are compiler errors, make sure all packages were imported correctly.
  - Open the `Package Manager` and click on the + sign in the upper left/right corner.
  - Select "Add package from git URL".
  - Enter the URL https://github.com/endel/NativeWebSocket.git#upm and click Install.
  - After the installation has finished, click on the + sign in the upper left/right corner again.
  - Enter the URL https://github.com/FireDragonGameStudio/SimpleWebRTC.git?path=/Assets/SimpleWebRTC#upm and click Install.
  - Use the menu `Tools/Update WebRTC Define Symbol` to update the scripting define symbols if needed.
- Make sure your own WebSocket signaling server is up and running. You can find more information about the necessary steps here.
- If you're going to stream over LAN, make sure the `STUN Server Address` field on `[BuildingBlock] Camera Rig/TrackingSpace/CenterEyeAnchor/Client-STUNConnection` is empty; otherwise leave the default value.
- Make sure to enable the `Web Socket Connection active` flag on `[BuildingBlock] Camera Rig/TrackingSpace/CenterEyeAnchor/Client-STUNConnection` to connect to the WebSocket server automatically on start.
- WebRTC video streaming does NOT work when the Graphics API is set to Vulkan. Make sure to switch to OpenGLES3 under `Project Settings/Player`.
- Make sure to DISABLE the Low Overhead Mode (GLES) setting for Android in `Project Settings/XR Plug-In Management/Oculus`. Otherwise this optimization will prevent your Quest from sending the video stream to a receiving client.
[!WARNING] The Meta Project Setup Tool (PST) will show 2 warnings (opaque textures and low overhead mode GLES). Do NOT fix these warnings.
- Some users have reported that the app crashes the second and every following time the app is opened. A reported solution is to go to the Quest settings under `Privacy & Security`, toggle the camera permission, then start the app and accept the permission again. If you encounter this problem please open an issue and send me the crash logs. Thank you!
- If switching between Unity 6 and other versions such as 2023 or 2022, your Android Manifest may get modified and the app won't run anymore. Should this happen to you, go to `Meta > Tools > Update AndroidManifest.xml` or `Meta > Tools > Create store-compatible AndroidManifest.xml`. After that, make sure you add the `horizonos.permission.HEADSET_CAMERA` permission back into your manifest file manually (see the snippet below).
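For reference, the permission entry that needs to be present in the manifest is a standard `uses-permission` element:

```xml
<!-- Inside the <manifest> element of AndroidManifest.xml -->
<uses-permission android:name="horizonos.permission.HEADSET_CAMERA" />
```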
- Thanks to Meta for the Passthrough Camera API and Passthrough Camera API Samples.
- Thanks to shader wizard Daniel Ilett for helping me set up the `FrostedGlass` sample.
- Thanks to Michael Jahn for the ZXing.Net library used for the QR code tracking samples.
- Thanks to Julian Triveri for constantly pushing the boundaries of what is possible with Meta Quest hardware and software.
- Special thanks to Markus Altenhofer from FireDragonGameStudio for contributing the WebRTC sample scene.
- Tutorials
  - XR Dev Rob - XR AI Tutorials, Watch on YouTube
  - Dilmer Valecillos, Watch on YouTube
  - Skarredghost, Watch on YouTube
  - FireDragonGameStudio, Watch on YouTube
  - xr masiso, Watch on YouTube
  - Urals Technologies, Watch on YouTube
- Object Detection
- Shaders
- Environment Understanding & Mapping
- Environment Sampling
- Image to 3D
- Image to Image, Diffusion & Generation
- Video recording and replay
- OpenCV for Unity
- QR Code Tracking
- (Mar 21 2025) The Mysticle - One of Quests Most Exciting Updates is Now Here!
- (Mar 18 2025) Road to VR - Meta Releases Quest Camera Access for Developers, Promising Even More Immersive MR Games
- (Mar 17 2025) MIXED Reality News - Quest developers get new powerful API for mixed reality apps
- (Mar 14 2025) UploadVR - Quest's Passthrough Camera API Is Out Now, Though Store Apps Can't Yet Use It
This project is licensed under the MIT License. See the LICENSE file for details. Feel free to use the samples in your own projects, though I would appreciate it if you credited this repo in your work ❤️
For questions, suggestions, or feedback, please open an issue in the repository or contact me on X, LinkedIn, or at [email protected]. Find all my info here or join our growing XR developer community on Discord.
Happy coding and enjoy exploring the possibilities with QuestCameraKit!