QuestCameraKit

QuestVisionKit is a collection of template and reference projects demonstrating how to use Meta Quest’s new Passthrough Camera API for advanced AR/VR vision, tracking, and shader effects.

Stars: 222

Visit

QuestCameraKit is a collection of template and reference projects demonstrating how to use Meta Quest’s new Passthrough Camera API (PCA) for advanced AR/VR vision, tracking, and shader effects. It includes samples like Color Picker, Object Detection with Unity Sentis, QR Code Tracking with ZXing, Frosted Glass Shader, OpenAI vision model, and WebRTC video streaming. The repository provides detailed instructions on how to run each sample and troubleshoot known issues. Users can explore various functionalities such as converting 3D points to 2D image pixels, detecting objects, tracking QR codes, applying custom shader effects, interacting with OpenAI's vision model, and streaming camera feed over WebRTC.

README:

QuestCameraKit is a collection of template and reference projects demonstrating how to use Meta Quest’s new Passthrough Camera API (PCA) for advanced AR/VR vision, tracking, and shader effects.

Overview
Getting Started with PCA
Running the Samples
General Troubleshooting & Known Issues
Acknowledgements & Credits
Community Contributions
News
License
Contact

Overview

1. 🎨 Color Picker

Purpose: Convert a 3D point in space to its corresponding 2D image pixel.
Description: This sample shows the mapping between 3D space and 2D image coordinates using the Passthrough Camera API. We use MRUK's EnvironmentRaycastManager to determine a 3D point in our environment and map it to the location on our WebcamTexture. We then extract the pixel on that point, to determine the color of a real world object.

2. 🍎 Object Detection with Unity Sentis

Purpose: Convert 2D screen coordinates into their corresponding 3D points in space.
Description: Use the Unity Sentis framework to infer different ML models to detect and track objects. Learn how to convert detected image coordinates (e.g. bounding boxes) back into 3D points for dynamic interaction within your scenes. In this sample you will also see how to filter labels. This means e.g. you can only detect humans and pets, to create a more safe play-area for your VR game. The sample video below is filtered to monitor, person and laptop. The sample is running at around 60 fps.

1. 🎨 Color Picker	2. 🍎 Object Detection

3. 📱 QR Code Tracking with ZXing

Purpose: Detect and track QR codes in real time. Open webviews or log-in to 3rd party services with ease.
Description: Similarly to the object detection sample, get QR code coordinated and projects them into 3D space. Detect QR codes and call their URLs. You can select between a multiple or single QR code mode. The sample is running at around 70 fps for multiple QR codes and a stable 72 fps for a single code.

4. 🪟 Shader Samples

Purpose: Apply a custom shader effect to virtual surfaces.
Description: A shader which takes our camera feed as input to manipulate the content behind it. Right now the project contains a Pixelate, Refract, Water, Zoom, and Blur effect. Frosted Glass shader is work in progress!

3. 📱 QR Code Tracking	4. 🪟 Shader Samples

5. 🧠 OpenAI vision model

Purpose: Ask OpenAI's vision model (or any other multi-modal LLM) for context of your current scene.
Description: We use a the OpenAI Speech to text API to create a coommand. We then send this command together with a screenshot to the Vision model. Lastly, we get the response back and use the Text to speech API to turn the response text into an audio file in Unity to speak the response. The user can select different speakers, models, and speed. For the command we can add additional instructions for the model, as well as select an image, image & text, or just a text mode. The whole loop takes anywhere from 2-6 seconds, depending on the internet connection.

https://github.com/user-attachments/assets/a4cfbfc2-0306-40dc-a9a3-cdccffa7afea

6. 🎥 WebRTC video streaming

Purpose: Stream the Passthrough Camera stream over WebRTC to another client using WebSockets.
Description: This sample uses SimpleWebRTC, which is a Unity-based WebRTC wrapper that facilitates peer-to-peer audio, video, and data communication over WebRTC using Unitys WebRTC package. It leverages NativeWebSocket for signaling and supports both video and audio streaming. You will need to setup your own websocket signaling server beforehand, either online or in LAN. You can find more information about the necessary steps here

6. 🎥 WebRTC video streaming

Getting Started with PCA

Information	Details
Device Requirements	- Only for Meta `Quest 3` and `3s` - `HorizonOS v74` or later
Unity WebcamTexture	- Access through Unity’s WebcamTexture - Only one camera at a time (left or right), a Unity limitation
Android Camera2 API	- Unobstructed forward-facing RGB cameras - Provides camera intrinsics (`camera ID`, `height`, `width`, `lens translation & rotation`) - Android Manifest: `horizonos.permission.HEADSET_CAMERA`
Public Experimental	Apps using PCA are not allowed to be submitted to the Meta Horizon Store yet.
Specifications	- Frame Rate: `30fps` - Image latency: `40-60ms` - Available resolutions per eye: `320x240`, `640x480`, `800x600`, `1280x960`

Prerequisites

Meta Quest Device: Ensure you are runnning on a Quest 3 or Quest 3s and your device is updated to HorizonOS v74 or later.
Unity: Recommended is Unity 6. Also runs on Unity 2022.3. LTS.
Camera Passthrough API does not work in the Editor or XR Simulator.
Get more information from the Meta Quest Developer Documentation

[!CAUTION] Every feature involving accessing the camera has significant impact on your application's performance. Be aware of this and ask yourself if the feature you are trying to implement can be done any other way besides using cameras.

Installation

Clone the Repository:

git clone https://github.com/xrdevrob/QuestCameraKit.git

Open the Project in Unity: Launch Unity and open the cloned project folder.
Configure Dependencies: Follow the instructions in the section below to run one of the samples.

Running the Samples

1. Color Picker

Open the ColorPicker scene.
Build the scene and run the APK on your headset.
Aim the ray onto a surface in your real space and press the A button or pinch your fingers to observe the cube changing it's color to the color in your real environment.

2. Object Detection with Unity Sentis

Open the ObjectDetection scene.
You will need Unity Sentis for this project to run ([email protected]).

Select the labels you would like to track. No label means all objects will be tracked.

Show all available labels

person	bicycle	car	motorbike	aeroplane	bus	train	truck
boat	traffic light	fire hydrant	stop sign	parking meter	bench	bird	cat
dog	horse	sheep	cow	elephant	bear	zebra	giraffe
backpack	umbrella	handbag	tie	suitcase	frisbee	skis	snowboard
sports ball	kite	baseball bat	baseball glove	skateboard	surfboard	tennis racket	bottle
wine glass	cup	fork	knife	spoon	bowl	banana	apple
sandwich	orange	broccoli	carrot	hot dog	pizza	donut	cake
chair	sofa	pottedplant	bed	diningtable	toilet	tvmonitor	laptop
mouse	remote	keyboard	cell phone	microwave	oven	toaster	sink
refrigerator	book	clock	vase	scissors	teddy bear	hair drier	toothbrush

Build the scene and run the APK on your headset. Look around your room and see how tracked objects receive a bounding box in accurate 3D space.

3. QR Code Tracking

Open the QRCodeTracking scene to test real-time QR code detection and tracking.
Install NuGet for Unity
Click on the NuGet menu and then on Manage NuGet Packages. Search for the ZXing.Net package from Michael Jahn and install it.
Make sure in your Player Settings under Scripting Define Symbols you see ZXING_ENABLED. The ZXingDefineSymbolChecker class should automatically detect if ZXing.Net is installed and add the symbol.
In order to see the label of your QR code, you will also need to install TextMeshPro!
Build the scene and run the APK on your headset. Look at a QR code to see the marker in 3D space and URL of the QR code.

4. Shader Samples

Open the Shader Samples scene.
Build the scene and run the APK on your headset.
Look at the spheres from different angles and observe how objects behind it are changing.

[!WARNING]
The Meta Project Setup Tool (PST) will show a warning and tell you to uncheck it, so do not fix this warning.

5. OpenAI vision model & voice commands

Open the ImageLLM scene.
Make sure to create an API key and enter it in the OpenAI Manager prefab.
Select your desired model and optionally give the LLM some instructions.
Make sure your headset is connected to the internet (the faster the better).
Build the scene and run the APK on your headset.

[!NOTE]
File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.

You can send commands and receive results in any of these languages:

Show all suppported languages

Afrikaans	Arabic	Armenian	Azerbaijani	Belarusian	Bosnian	Bulgarian	Catalan	Chinese
Croatian	Czech	Danish	Dutch	English	Estonian	Finnish	French	Galician
German	Greek	Hebrew	Hindi	Hungarian	Icelandic	Indonesian	Italian	Japanese
Kannada	Kazakh	Korean	Latvian	Lithuanian	Macedonian	Malay	Marathi	Maori
Nepali	Norwegian	Persian	Polish	Portuguese	Romanian	Russian	Serbian	Slovak
Slovenian	Spanish	Swahili	Swedish	Tagalog	Tamil	Thai	Turkish	Ukrainian
Urdu	Vietnamese	Welsh

6. WebRTC video streaming

Open the Package Manager, click on the + sign in the upper left/right corner.
- Select "Add package from git URL".
- Enter URL: https://github.com/endel/NativeWebSocket.git#upm and click in Install.
- After the installation finished, click on the + sign in the upper left/right corner again.
- Enter URL https://github.com/FireDragonGameStudio/SimpleWebRTC.git?path=/Assets/SimpleWebRTC#upm and click on Install
Open the WebRTC-Quest scene.
Link up your signaling server on the Client-STUNConnection component in the Web Socket Server Address field.
Build and deploy the WebRTC-Quest scene to your Quest3 device.
Open the WebRTC-SingleClient scene on your Editor.
Build and deploy the WebRTC-SingleClient scene to another device or start it from within the Unity Editor. More information can be found here
Start the WebRTC app on your Quest and on your other devices. Quest and client streaming devices should connect automatically to the websocket signaling server.
Perform the Start gesture with your left hand, or press the menu button on your left controller to start streaming from Quest3 to your WebRTC client app.

Troubleshooting:

If there are compiler errors, make sure all packages were imported correctly.
- Open the Package Manager, click on the + sign in the upper left/right corner.
- Select "Add package from git URL".
- Enter URL: https://github.com/endel/NativeWebSocket.git#upm and click in Install.
- After the installation finished, click on the + sign in the upper left/right corner again.
- Enter URL https://github.com/FireDragonGameStudio/SimpleWebRTC.git?path=/Assets/SimpleWebRTC#upm and click on Install
- Use the menu Tools/Update WebRTC Define Symbol to update the scripting define symbols if needed.
Make sure your own websocket signaling server is up and running. You can find more information about the necessary steps here.
If you're going to stream over LAN, make sure the STUN Server Address field on [BuildingBlock] Camera Rig/TrackingSpace/CenterEyeAnchor/Client-STUNConnection is empty, otherwise leave the default value.
Make sure to enable the Web Socket Connection active flag on [BuildingBlock] Camera Rig/TrackingSpace/CenterEyeAnchor/Client-STUNConnection to connect to the websocket server automatically on start.
WebRTC video streaming does NOT work, when the Graphics API is set to Vulkan. Make sure to switch to OpenGLES3 under Project Settings/Player.
Make sure to DISABLE the Low Overhead Mode (GLES) setting for Android in Project Settings/XR Plug-In Management/Oculus. Otherwise this optimization will prevent your Quest from sending the video stream to a receiving client.

[!WARNING] The Meta Project Setup Tool (PST) will show 2 warnings (opaque textures and low overhead mode GLES). Do NOT fix this warnings.

General Troubleshooting & Known Issues

Some users have reported that the app crashes the second and every following time the app is opened. A solution described was to go to the Quest settings under Privacy & Security and toggle the camera permission and then start the app and accept the permission again. If you encounter this problem please open an issue and send me the crash logs. Thank you!
If switching betwenn Unity 6 and other versions such as 2023 or 2022 it can happen that your Android Manifest is getting modified and the app won't run anymore. Should this happen to you make sure to go to Meta > Tools > Update AndroidManifest.xml or Meta > Tools > Create store-compatible AndroidManifest.xml. After that make sure you add back the horizonos.permission.HEADSET_CAMERA manually into your manifest file.

Acknowledgements & Credits

Thanks to Meta for the Passthrough Camera API and Passthrough Camera API Samples.
Thanks to shader wizard Daniel Ilett for helping me in the shader samples.
Thanks to Michael Jahn for the XZing.Net library used for the QR code tracking samples.
Thanks to Julian Triveri for constantly pushing the boundaries with what is possible with Meta Quest hardware and software.
Special thanks to Markus Altenhofer from FireDragonGameStudio for contributing the WebRTC sample scene.
Special thanks to Thomas Ratliff for contributing his shader samples to the repo.

Community Contributions

Tutorials
- XR Dev Rob - XR AI Tutorials, Watch on YouTube
- Dilmer Valecillos, Watch on YouTube
- Skarredghost, Watch on YouTube
- FireDragonGameStudio, Watch on YouTube
- xr masiso, Watch on YouTube
- Urals Technologies, Watch on YouTube
Object Detection
- Udayshankar Ravikumar: Unity Sentis Digit Recognition
- Christoph Spinger: Tracking a real ball and playing some XR football
Shaders
Environment Understanding & Mapping
Light Estimation
- pjchardt on Reddit: Prototype combining real lights and virtual objects using light estimation to affect 3d environment.
Environment Sampling
- Christoph Spinger: Chameleon color picker
- Sid Naik: Copy and paste the lighting in his house
Image to 3D
Image to Image, Diffusion & Generation
Video recording and replay
- Lucas Martinic: Rewind what you saw
OpenCV for Unity
QR Code Tracking
- Christoph Spinger: QR code object tracking

News

(Mar 21 2025) The Mysticle - One of Quests Most Exciting Updates is Now Here!
(Mar 18 2025) Road to VR - Meta Releases Quest Camera Access for Developers, Promising Even More Immersive MR Games
(Mar 17 2025) MIXED Reality News - Quest developers get new powerful API for mixed reality apps
(Mar 14 2025) UploadVR - Quest's Passthrough Camera API Is Out Now, Though Store Apps Can't Yet Use It

License

This project is licensed under the MIT License. See the LICENSE file for details. Feel free to use the samples for your own projects, though I would appreciate if you would leave some credits to this repo in your work ❤️

Contact

For questions, suggestions, or feedback, please open an issue in the repository or contact me on X, LinkedIn, or at [email protected]. Find all my info here or join our growing XR developer community on Discord.

Happy coding and enjoy exploring the possibilities with QuestCameraKit!

For Tasks:

Click tags to check more tools for each tasks

detect objects track qr codes apply shader effects stream video convert 3d points

For Jobs:

augmented reality developer virtual reality developer computer vision engineer unity developer web developer

Alternative AI tools for QuestCameraKit

Similar Open Source Tools

QuestCameraKit

github

: 222

AIWritingCompanion

AIWritingCompanion is a lightweight and versatile browser extension designed to translate text within input fields. It offers universal compatibility, multiple activation methods, and support for various translation providers like Gemini, OpenAI, and WebAI to API. Users can install it via CRX file or Git, set API key, and use it for automatic translation or via shortcut. The tool is suitable for writers, translators, students, researchers, and bloggers. AI keywords include writing assistant, translation tool, browser extension, language translation, and text translator. Users can use it for tasks like translate text, assist in writing, simplify content, check language accuracy, and enhance communication.

github

: 92

llmcord

llmcord is a Discord bot that transforms Discord into a collaborative LLM frontend, allowing users to interact with various LLM models. It features a reply-based chat system that enables branching conversations, supports remote and local LLM models, allows image and text file attachments, offers customizable personality settings, and provides streamed responses. The bot is fully asynchronous, efficient in managing message data, and offers hot reloading config. With just one Python file and around 200 lines of code, llmcord provides a seamless experience for engaging with LLMs on Discord.

github

: 507

DevDocs

DevDocs is a platform designed to simplify the process of digesting technical documentation for software engineers and developers. It automates the extraction and conversion of web content into markdown format, making it easier for users to access and understand the information. By crawling through child pages of a given URL, DevDocs provides a streamlined approach to gathering relevant data and integrating it into various tools for software development. The tool aims to save time and effort by eliminating the need for manual research and content extraction, ultimately enhancing productivity and efficiency in the development process.

github

: 469

discourse-air

Discourse-air is a clean and modern theme for forums, featuring light and dark modes, clickable topics, loading slider, search banner, and category + group boxes. Users need to enable specific settings for the theme components to render properly. It offers customization options for color schemes, search banner placement, and category organization.

github

: 82

llmcord.py

llmcord.py is a tool that allows users to chat with Language Model Models (LLMs) directly in Discord. It supports various LLM providers, both remote and locally hosted, and offers features like reply-based chat system, choosing any LLM, support for image and text file attachments, customizable system prompt, private access via DM, user identity awareness, streamed responses, warning messages, efficient message data caching, and asynchronous operation. The tool is designed to facilitate seamless conversations with LLMs and enhance user experience on Discord.

github

: 335

AirConnect-Synology

AirConnect-Synology is a minimal Synology package that allows users to use AirPlay to stream to UPnP/Sonos & Chromecast devices that do not natively support AirPlay. It is compatible with DSM 7.0 and DSM 7.1, and provides detailed information on installation, configuration, supported devices, troubleshooting, and more. The package automates the installation and usage of AirConnect on Synology devices, ensuring compatibility with various architectures and firmware versions. Users can customize the configuration using the airconnect.conf file and adjust settings for specific speakers like Sonos, Bose SoundTouch, and Pioneer/Phorus/Play-Fi.

github

: 303

rag-chatbot

The RAG ChatBot project combines Lama.cpp, Chroma, and Streamlit to build a Conversation-aware Chatbot and a Retrieval-augmented generation (RAG) ChatBot. The RAG Chatbot works by taking a collection of Markdown files as input and provides answers based on the context provided by those files. It utilizes a Memory Builder component to load Markdown pages, divide them into sections, calculate embeddings, and save them in an embedding database. The chatbot retrieves relevant sections from the database, rewrites questions for optimal retrieval, and generates answers using a local language model. It also remembers previous interactions for more accurate responses. Various strategies are implemented to deal with context overflows, including creating and refining context, hierarchical summarization, and async hierarchical summarization.

github

: 194

TaxHacker

github

: 230

RainbowGPT

RainbowGPT is a versatile tool that offers a range of functionalities, including Stock Analysis for financial decision-making, MySQL Management for database navigation, and integration of AI technologies like GPT-4 and ChatGlm3. It provides a user-friendly interface suitable for all skill levels, ensuring seamless information flow and continuous expansion of emerging technologies. The tool enhances adaptability, creativity, and insight, making it a valuable asset for various projects and tasks.

github

: 86

DiffusionToolkit

Diffusion Toolkit is an image metadata-indexer and viewer for AI-generated images. It helps you organize, search, and sort your ever-growing collection. Key features include: - Scanning images and storing prompts and other metadata (PNGInfo) - Searching for images using simple queries or filters - Viewing images and metadata easily - Tagging images with favorites, ratings, and NSFW flags - Sorting images by date created, aesthetic score, or rating - Auto-tagging NSFW images by keywords - Blurring images tagged as NSFW - Creating and managing albums - Viewing and searching prompts - Drag-and-drop functionality Diffusion Toolkit supports various image formats, including JPG/JPEG, PNG, WebP, and TXT metadata. It also supports metadata formats from popular AI image generators like AUTOMATIC1111, InvokeAI, NovelAI, Stable Diffusion, and more. You can use Diffusion Toolkit even on images without metadata and still enjoy features like rating and album management.

github

: 799

FinAnGPT-Pro

FinAnGPT-Pro is a financial data downloader and AI query system that downloads quarterly and annual financial data for stocks from EOD Historical Data, storing it in MongoDB and Google BigQuery. It includes an AI-powered natural language interface for querying financial data. Users can set up the tool by following the prerequisites and setup instructions provided in the README. The tool allows users to download financial data for all stocks in a watchlist or for a single stock, query financial data using a natural language interface, and receive responses in a structured format. Important considerations include error handling, rate limiting, data validation, BigQuery costs, MongoDB connection, and security measures for API keys and credentials.

github

: 188

tts-generation-webui

TTS Generation WebUI is a comprehensive tool that provides a user-friendly interface for text-to-speech and voice cloning tasks. It integrates various AI models such as Bark, MusicGen, AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and MAGNeT. The tool offers one-click installers, Google Colab demo, videos for guidance, and extra voices for Bark. Users can generate audio outputs, manage models, caches, and system space for AI projects. The project is open-source and emphasizes ethical and responsible use of AI technology.

github

: 1.6k

ComputerGYM

github

: 68

playword

PlayWord is a tool designed to supercharge web test automation experience with AI. It provides core features such as enabling browser operations and validations using natural language inputs, as well as monitoring interface to record and dry-run test steps. PlayWord supports multiple AI services including Anthropic, Google, and OpenAI, allowing users to select the appropriate provider based on their requirements. The tool also offers features like assertion handling, frame handling, custom variables, test recordings, and an Observer module to track user interactions on web pages. With PlayWord, users can interact with web pages using natural language commands, reducing the need to worry about element locators and providing AI-powered adaptation to UI changes.

github

: 52

xFasterTransformer

xFasterTransformer is an optimized solution for Large Language Models (LLMs) on the X86 platform, providing high performance and scalability for inference on mainstream LLM models. It offers C++ and Python APIs for easy integration, along with example codes and benchmark scripts. Users can prepare models in a different format, convert them, and use the APIs for tasks like encoding input prompts, generating token ids, and serving inference requests. The tool supports various data types and models, and can run in single or multi-rank modes using MPI. A web demo based on Gradio is available for popular LLM models like ChatGLM and Llama2. Benchmark scripts help evaluate model inference performance quickly, and MLServer enables serving with REST and gRPC interfaces.

github

: 247

For similar tasks

QuestCameraKit

github

: 222

AiTreasureBox

AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.

github

: 368

react-native-vision-camera

VisionCamera is a powerful, high-performance Camera library for React Native. It features Photo and Video capture, QR/Barcode scanner, Customizable devices and multi-cameras ("fish-eye" zoom), Customizable resolutions and aspect-ratios (4k/8k images), Customizable FPS (30..240 FPS), Frame Processors (JS worklets to run facial recognition, AI object detection, realtime video chats, ...), Smooth zooming (Reanimated), Fast pause and resume, HDR & Night modes, Custom C++/GPU accelerated video pipeline (OpenGL).

github

: 8.2k

InternVL

InternVL scales up the ViT to _**6B parameters**_ and aligns it with LLM. It is a vision-language foundation model that can perform various tasks, including: **Visual Perception** - Linear-Probe Image Classification - Semantic Segmentation - Zero-Shot Image Classification - Multilingual Zero-Shot Image Classification - Zero-Shot Video Classification **Cross-Modal Retrieval** - English Zero-Shot Image-Text Retrieval - Chinese Zero-Shot Image-Text Retrieval - Multilingual Zero-Shot Image-Text Retrieval on XTD **Multimodal Dialogue** - Zero-Shot Image Captioning - Multimodal Benchmarks with Frozen LLM - Multimodal Benchmarks with Trainable LLM - Tiny LVLM InternVL has been shown to achieve state-of-the-art results on a variety of benchmarks. For example, on the MMMU image classification benchmark, InternVL achieves a top-1 accuracy of 51.6%, which is higher than GPT-4V and Gemini Pro. On the DocVQA question answering benchmark, InternVL achieves a score of 82.2%, which is also higher than GPT-4V and Gemini Pro. InternVL is open-sourced and available on Hugging Face. It can be used for a variety of applications, including image classification, object detection, semantic segmentation, image captioning, and question answering.

github

: 6.5k

clarifai-python

The Clarifai Python SDK offers a comprehensive set of tools to integrate Clarifai's AI platform to leverage computer vision capabilities like classification , detection ,segementation and natural language capabilities like classification , summarisation , generation , Q&A ,etc into your applications. With just a few lines of code, you can leverage cutting-edge artificial intelligence to unlock valuable insights from visual and textual content.

github

: 392

ailia-models

The collection of pre-trained, state-of-the-art AI models. ailia SDK is a self-contained, cross-platform, high-speed inference SDK for AI. The ailia SDK provides a consistent C++ API across Windows, Mac, Linux, iOS, Android, Jetson, and Raspberry Pi platforms. It also supports Unity (C#), Python, Rust, Flutter(Dart) and JNI for efficient AI implementation. The ailia SDK makes extensive use of the GPU through Vulkan and Metal to enable accelerated computing. # Supported models 323 models as of April 8th, 2024

github

: 2.2k

edenai-apis

Eden AI aims to simplify the use and deployment of AI technologies by providing a unique API that connects to all the best AI engines. With the rise of **AI as a Service** , a lot of companies provide off-the-shelf trained models that you can access directly through an API. These companies are either the tech giants (Google, Microsoft , Amazon) or other smaller, more specialized companies, and there are hundreds of them. Some of the most known are : DeepL (translation), OpenAI (text and image analysis), AssemblyAI (speech analysis). There are **hundreds of companies** doing that. We're regrouping the best ones **in one place** !

github

: 441

artificial-intelligence

This repository contains a collection of AI projects implemented in Python, primarily in Jupyter notebooks. The projects cover various aspects of artificial intelligence, including machine learning, deep learning, natural language processing, computer vision, and more. Each project is designed to showcase different AI techniques and algorithms, providing a hands-on learning experience for users interested in exploring the field of artificial intelligence.

github

: 176

For similar jobs

sdk-examples

Spectacular AI SDK fuses data from cameras and IMU sensors to output an accurate 6-degree-of-freedom pose of a device, enabling Visual-Inertial SLAM for tracking robots and vehicles, as well as Augmented, Mixed, and Virtual Reality. The SDK includes a Mapping API for real-time and offline 3D reconstruction use cases.

github

: 223

QuestCameraKit

github

: 222

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github

: 224

openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It provides a common API to deliver inference solutions on various platforms, including CPU, GPU, NPU, and heterogeneous devices. OpenVINO™ supports pre-trained models from Open Model Zoo and popular frameworks like TensorFlow, PyTorch, and ONNX. Key components of OpenVINO™ include the OpenVINO™ Runtime, plugins for different hardware devices, frontends for reading models from native framework formats, and the OpenVINO Model Converter (OVC) for adjusting models for optimal execution on target devices.

github

: 8.1k

peft

PEFT (Parameter-Efficient Fine-Tuning) is a collection of state-of-the-art methods that enable efficient adaptation of large pretrained models to various downstream applications. By only fine-tuning a small number of extra model parameters instead of all the model's parameters, PEFT significantly decreases the computational and storage costs while achieving performance comparable to fully fine-tuned models.

github

: 18.0k

jetson-generative-ai-playground

This repo hosts tutorial documentation for running generative AI models on NVIDIA Jetson devices. The documentation is auto-generated and hosted on GitHub Pages using their CI/CD feature to automatically generate/update the HTML documentation site upon new commits.

github

: 94

emgucv

Emgu CV is a cross-platform .Net wrapper for the OpenCV image-processing library. It allows OpenCV functions to be called from .NET compatible languages. The wrapper can be compiled by Visual Studio, Unity, and "dotnet" command, and it can run on Windows, Mac OS, Linux, iOS, and Android.

github

: 2.1k

MMStar

MMStar is an elite vision-indispensable multi-modal benchmark comprising 1,500 challenge samples meticulously selected by humans. It addresses two key issues in current LLM evaluation: the unnecessary use of visual content in many samples and the existence of unintentional data leakage in LLM and LVLM training. MMStar evaluates 6 core capabilities across 18 detailed axes, ensuring a balanced distribution of samples across all dimensions.

github

: 84

QuestCameraKit

README:

Table of Contents

Overview

1. 🎨 Color Picker

2. 🍎 Object Detection with Unity Sentis

3. 📱 QR Code Tracking with ZXing

4. 🪟 Shader Samples

5. 🧠 OpenAI vision model

6. 🎥 WebRTC video streaming

Getting Started with PCA

Prerequisites

Installation

Running the Samples

1. Color Picker

2. Object Detection with Unity Sentis

3. QR Code Tracking

4. Shader Samples

5. OpenAI vision model & voice commands

6. WebRTC video streaming

General Troubleshooting & Known Issues

Acknowledgements & Credits

Community Contributions

News

License

Contact

For Tasks:

For Jobs:

Alternative AI tools for QuestCameraKit

Similar Open Source Tools

QuestCameraKit

AIWritingCompanion

llmcord

DevDocs

discourse-air

llmcord.py

AirConnect-Synology

rag-chatbot

TaxHacker

RainbowGPT

DiffusionToolkit

FinAnGPT-Pro

tts-generation-webui

ComputerGYM

playword

xFasterTransformer

For similar tasks

QuestCameraKit

AiTreasureBox

react-native-vision-camera

InternVL

clarifai-python

ailia-models

edenai-apis

artificial-intelligence

For similar jobs

sdk-examples

QuestCameraKit

spear

openvino

peft

jetson-generative-ai-playground

emgucv

MMStar