
VisioFirm
VisioFirm: Cross-Platform AI-assisted Annotation Tool for Computer Vision
Stars: 298

VisioFirm is an open-source, AI-powered image annotation tool designed to accelerate labeling for computer vision tasks like classification, object detection, oriented bounding boxes (OBB), segmentation and video annotation. Built for speed and simplicity, it leverages state-of-the-art models for semi-automated pre-annotations, allowing you to focus on refining rather than starting from scratch. Whether you're preparing datasets for YOLO, SAM, or custom models, VisioFirm streamlines your workflow with an intuitive web interface and powerful backend. Perfect for researchers, data scientists, and ML engineers handling large image datasets—get high-quality annotations in minutes, not hours!
README:
[!IMPORTANT] VisioFirm v1 is now available. VisioFirm now offers much broader support for computer vision annotation, pushing the boundaries of efficient, fast, and accurate annotation further. Here's what's new in v1.0.1 ✨
- Classification and Pre-annotation: Predict and pre-suggest image classes using the pretrained OpenAI CLIP model, enabling near-automatic labeling.
- Video Support & Label Propagation: New VFTracker auto-labeling with frame-to-frame propagation. Choose between: (1) SmartPropagator – leverages SAM2 plus pre/post-processing for accurate, cumulative tracking; annotate the first frame, then propagate across the sequence. (2) OpenCV Trackers – full support (CSRT, KCF, Boosting, MIL, TLD, MedianFlow, MOSSE, GOTURN). (3) Interpolation – classic propagation between a labeled start frame and a labeled end frame.
- Ultralytics Model Support: Works with YOLOv12 down to YOLOv5, including YOLOv8-World for open-vocabulary pre-annotation.
- Cross-domain annotation: use detection models to pre-generate segmentation masks, or segmentation models to pre-label bounding boxes.
- Memory Management Improvements: Optimized GPU usage with better model load/unload behavior for large-scale pre-annotation and tracking.
- Backend Migration to FastAPI: Faster performance, async support, and smoother UI interactions.
- Python API: Integrate VisioFirm seamlessly into pipelines with the new visiofirm Python API.
VisioFirm focuses primarily on easy AI-model integration for fast CV task annotation.
- AI-Driven Pre-Annotation: Automatically detect and segment objects using YOLO, SAM2, and Grounding DINO—saving up to 80% of manual effort.
- Multi-Task Support: Handles classification, bounding boxes, oriented bounding boxes, polygon segmentation, and now even videos in one tool.
- Browser-Based Editing: Interactive canvas for precise adjustments, with real-time SAM-powered segmentation in the browser.
- Offline-Friendly: Models download automatically (or pre-fetch for offline use), with SQLite backend for local projects.
- Extensible & Open-Source: Customize with your own Ultralytics models or integrate into pipelines—contributions welcome!
- SAM2-based WebGPU: Instant drawing of annotations via SAM2, with worker offloading and auto-annotation for faster computing!
- Semi-Automated Labeling: Kickstart annotations with AI models like YOLO (v5–v12) for detection, SAM2 for segmentation, Grounding DINO for zero-shot object grounding, and CLIP for automated classification.
- Flexible Annotation Types:
- Axis-aligned bounding boxes for standard detection.
- Oriented bounding boxes for rotated objects (e.g., aerial imagery).
- Polygon segmentation for precise boundaries.
- Image classification with automatic label suggestions.
- Video Annotation & Label Propagation: Annotate videos with frame-to-frame consistency:
- SmartPropagator (SAM2-powered accurate propagation).
- OpenCV trackers (CSRT, KCF, Boosting, MIL, TLD, MedianFlow, MOSSE, GOTURN).
- Interpolation between annotated start/end frames (see the sketch after this feature list).
- Cross-Domain Annotation (example after this feature list):
- Use detection models to auto-generate segmentation masks.
- Use segmentation models to pre-label bounding boxes.
- Ultralytics Model Support: Full support for YOLOv12, v11, v10, v9, v8, and v5, plus YOLOv8-World for open-vocab pre-annotations (no GPU required).
- Interactive Frontend: Draw, edit, and refine labels on a responsive canvas.
- Click-to-segment with browser-based SAM2.
- Hotkeys, undo/redo, and zoom for efficient annotation.
- Project Management: Organize datasets with SQLite-backed projects.
- Multi-class support.
- Import/export with minimal setup.
- Export Formats: Export annotations to YOLO, COCO, or custom formats for seamless training (sample YOLO labels after this list).
- Performance Optimizations (clustering sketch after this list):
- GPU memory management for efficient model loading/unloading.
- Cluster overlapping detections, simplify contours, and filter by confidence.
- Multi-threaded uploading and optimized image import.
- Cloud/SSH Integration: Download images from cloud storage or SSH servers, save annotations remotely, and manage large-scale projects.
- Backend Migration to FastAPI: Faster response times, async support, and smoother UI performance.
- VisioFirm Python API: Integrate annotation workflows into custom scripts and ML pipelines (usage sketch in the quickstart below).
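The interpolation mode listed above boils down to linearly blending box coordinates between two labeled keyframes. Here is a minimal sketch of that idea in Python (an illustration of the technique, not VisioFirm's internal code):

def interpolate_boxes(start_box, end_box, start_frame, end_frame):
    # Linearly interpolate an axis-aligned box (x, y, w, h) between
    # a labeled start frame and a labeled end frame.
    span = end_frame - start_frame
    boxes = {start_frame: start_box}
    for frame in range(start_frame + 1, end_frame + 1):
        t = (frame - start_frame) / span
        boxes[frame] = tuple((1 - t) * s + t * e
                             for s, e in zip(start_box, end_box))
    return boxes

# Example: a 40x40 box drifting right and slightly down over 10 frames.
frames = interpolate_boxes((100, 50, 40, 40), (200, 60, 40, 40), 0, 10)
print(frames[5])  # halfway: (150.0, 55.0, 40.0, 40.0)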
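Cross-domain pre-labeling (detection boxes promoted to segmentation masks) can be approximated with the Ultralytics API by prompting SAM2 with detector boxes. This is a sketch of the general pattern, not necessarily VisioFirm's exact pipeline, and it assumes the ultralytics package with SAM2 weights available:

from ultralytics import YOLO, SAM

detector = YOLO("yolov8n.pt")    # any Ultralytics detection model
segmenter = SAM("sam2_b.pt")     # a SAM2 checkpoint

det = detector("image.jpg")[0]       # run detection on one image
boxes = det.boxes.xyxy.tolist()      # [x1, y1, x2, y2] per detected object
if boxes:
    masks = segmenter("image.jpg", bboxes=boxes)  # boxes prompt SAM2 masks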
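For reference, the YOLO export format stores one plain-text label file per image, with one object per line: a class index followed by the box center x, center y, width, and height, all normalized to [0, 1]. A two-object label file might look like this (values are illustrative):

0 0.716797 0.395833 0.216406 0.147222
1 0.287109 0.566406 0.255469 0.484375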
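Clustering overlapping detections is essentially greedy, non-maximum-suppression-style filtering: keep the highest-confidence box and suppress any box that overlaps it too much. A minimal sketch of the idea (illustrative only, not VisioFirm's exact implementation):

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def cluster_detections(dets, conf_thresh=0.5, iou_thresh=0.5):
    # dets: list of (box, score); keep the best-scoring box per cluster.
    kept = []
    for box, score in sorted(dets, key=lambda d: -d[1]):
        if score >= conf_thresh and all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept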
Demo: detection based on pre-trained/zero-shot models.
Demo: video segmentation using the SmartPropagator.
VisioFirm was tested with Python 3.10+.
[!NOTE] VisioFirm v1 introduces new database-management logic.
To avoid conflicts with older versions, rename or remove the old cache folder before running the new release:
- Linux:
~/.cache/visiofirm_cache
- macOS:
~/Library/Caches/visiofirm_cache
- Windows:
%LOCALAPPDATA%\visiofirm_cache
After deleting the folder, restart VisioFirm — it will automatically recreate the cache directory with the new structure.
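For example, the old cache folder can be removed from a terminal (paths as listed above):
- Linux: rm -rf ~/.cache/visiofirm_cache
- macOS: rm -rf ~/Library/Caches/visiofirm_cache
- Windows (cmd): rmdir /s /q %LOCALAPPDATA%\visiofirm_cache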
Install or upgrade from PyPI:
pip install -U visiofirm
For development or editable install (from a cloned repo):
git clone https://github.com/OschAI/VisioFirm.git
cd VisioFirm
pip install -e .
Launch VisioFirm with a single command—it auto-starts a local web server and opens in your browser.
visiofirm
- Create a new project and upload images.
- Define classes (e.g., "car", "person").
- For easy-to-detect objects, run AI pre-annotation (select a model: YOLO or Grounding DINO).
- Refine labels in the interactive editor.
- Export your annotated dataset.
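The same workflow can be scripted through the new visiofirm Python API. The README does not document the API surface here, so the names below are hypothetical placeholders that only show the shape of a pipeline integration; check the project docs for the real interface:

import visiofirm

# Hypothetical names -- not the documented visiofirm API.
project = visiofirm.Project("my_dataset")   # hypothetical constructor
project.add_images("path/to/images/")       # hypothetical method
project.preannotate(model="yolov8n.pt")     # hypothetical method
project.export(format="yolo")               # hypothetical method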
The VisioFirm app uses cache directories to store settings locally.
VisioFirm uses advanced models for initial labels:
- YOLO: All Ultralytics-based YOLO models are now compatible and can be used.
- SAM2: Precise segmentation used in image annotation and video propagation.
- Grounding DINO: Zero-shot detection via text prompts.
- Issues: Report bugs or request features on the GitHub issue tracker.
- Discord: Coming soon—star the repo for updates!
- Roadmap: Multi-user support, custom model integration.
Apache 2.0 - See LICENSE for details.
This project uses third-party software and models:
- Ultralytics YOLO: https://github.com/ultralytics/ultralytics (License: AGPL-3.0)
- SAM2 (Segment Anything Model v2): https://github.com/facebookresearch/sam2 (Licenses: Apache 2.0 and BSD 3-Clause)
- GroundingDINO: https://github.com/IDEA-Research/GroundingDINO (License: Apache 2.0)
Built by Safouane El Ghazouali for the research community. Star the repo if it helps your workflow! 🚀
If you use VisioFirm in your research, please cite:
@misc{ghazouali2025visiofirm,
title={VisioFirm: Cross-Platform AI-assisted Annotation Tool for Computer Vision},
author={Safouane El Ghazouali and Umberto Michelucci},
year={2025},
eprint={2509.04180},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
SOON:
- Documentation website
- Discord community
Alternative AI tools for VisioFirm
Similar Open Source Tools


transformerlab-app
Transformer Lab is an app that allows users to experiment with Large Language Models by providing features such as one-click download of popular models, finetuning across different hardware, RLHF and Preference Optimization, working with LLMs across different operating systems, chatting with models, using different inference engines, evaluating models, building datasets for training, calculating embeddings, providing a full REST API, running in the cloud, converting models across platforms, supporting plugins, embedded Monaco code editor, prompt editing, inference logs, all through a simple cross-platform GUI.

dspy.rb
DSPy.rb is a Ruby framework for building reliable LLM applications using composable, type-safe modules. It enables developers to define typed signatures and compose them into pipelines, offering a more structured approach compared to traditional prompting. The framework embraces Ruby conventions and adds innovations like CodeAct agents and enhanced production instrumentation, resulting in scalable LLM applications that are robust and efficient. DSPy.rb is actively developed, with a focus on stability and real-world feedback through the 0.x series before reaching a stable v1.0 API.

hugo-blox-builder
Hugo Blox Builder is an open-source toolkit designed for building world-class technical and academic websites quickly and efficiently. Users can create blazing-fast, SEO-optimized sites in minutes by customizing templates with drag-and-drop blocks. The tool is built for a technical workflow, allowing users to own their content and brand without any vendor lock-in. With a modern stack featuring Hugo and Tailwind CSS, users can write in Markdown, Jupyter, or BibTeX and auto-sync publications. Hugo Blox is open and extendable, offering a generous MIT-licensed core that can be upgraded with premium templates and blocks or extended with React 'islands' for custom interactivity.

ComfyUI-Copilot
ComfyUI-Copilot is an intelligent assistant built on the Comfy-UI framework that simplifies and enhances the AI algorithm debugging and deployment process through natural language interactions. It offers intuitive node recommendations, workflow building aids, and model querying services to streamline development processes. With features like interactive Q&A bot, natural language node suggestions, smart workflow assistance, and model querying, ComfyUI-Copilot aims to lower the barriers to entry for beginners, boost development efficiency with AI-driven suggestions, and provide real-time assistance for developers.

PageTalk
PageTalk is a browser extension that enhances web browsing by integrating Google's Gemini API. It allows users to select text on any webpage for AI analysis, translation, contextual chat, and customization. The tool supports multi-agent system, image input, rich content rendering, PDF parsing, URL context extraction, personalized settings, chat export, text selection helper, and proxy support. Users can interact with web pages, chat contextually, manage AI agents, and perform various tasks seamlessly.

Lidar_AI_Solution
Lidar AI Solution is a highly optimized repository for self-driving 3D lidar, providing solutions for sparse convolution, BEVFusion, CenterPoint, OSD, and Conversion. It includes CUDA and TensorRT implementations for various tasks such as 3D sparse convolution, BEVFusion, CenterPoint, PointPillars, V2XFusion, cuOSD, cuPCL, and YUV to RGB conversion. The repository offers easy-to-use solutions, high accuracy, low memory usage, and quantization options for different tasks related to self-driving technology.

LynxHub
LynxHub is a platform that allows users to seamlessly install, configure, launch, and manage all their AI interfaces from a single, intuitive dashboard. It offers features like AI interface management, arguments manager, custom run commands, pre-launch actions, extension management, in-app tools like terminal and web browser, AI information dashboard, Discord integration, and additional features like theme options and favorite interface pinning. The platform supports modular design for custom AI modules and upcoming extensions system for complete customization. LynxHub aims to streamline AI workflow and enhance user experience with a user-friendly interface and comprehensive functionalities.

FastFlowLM
FastFlowLM is a Python library for efficient and scalable language model inference. It provides a high-performance implementation of language model scoring using n-gram language models. The library is designed to handle large-scale text data and can be easily integrated into natural language processing pipelines for tasks such as text generation, speech recognition, and machine translation. FastFlowLM is optimized for speed and memory efficiency, making it suitable for both research and production environments.

sokuji
Sokuji is a desktop application that provides live speech translation using advanced AI models from OpenAI, Google Gemini, CometAPI, Palabra.ai, and Kizuna AI. It aims to bridge language barriers in live conversations by capturing audio input, processing it through AI models, and delivering real-time translated output. The tool goes beyond basic translation by offering audio routing solutions with virtual device management (Linux only) for seamless integration with other applications. It features a modern interface with real-time audio visualization, comprehensive logging, and support for multiple AI providers and models.

chunkhound
ChunkHound is a tool that transforms your codebase into a searchable knowledge base for AI assistants using semantic and regex search. It integrates with AI assistants via the Model Context Protocol (MCP) and offers features such as cAST algorithm for semantic code chunking, multi-hop semantic search, natural language queries, regex search without API keys, support for 22 languages, and local-first architecture. It provides intelligent code discovery by following semantic relationships and discovering related implementations. ChunkHound is built on the cAST algorithm from Carnegie Mellon University, ensuring structure-aware chunking that preserves code meaning. It supports universal language parsing and offers efficient updates for large codebases.


llmchat
LLMChat is an all-in-one AI chat interface that supports multiple language models, offers a plugin library for enhanced functionality, enables web search capabilities, allows customization of AI assistants, provides text-to-speech conversion, ensures secure local data storage, and facilitates data import/export. It also includes features like knowledge spaces, prompt library, personalization, and can be installed as a Progressive Web App (PWA). The tech stack includes Next.js, TypeScript, Pglite, LangChain, Zustand, React Query, Supabase, Tailwind CSS, Framer Motion, Shadcn, and Tiptap. The roadmap includes upcoming features like speech-to-text and knowledge spaces.

chatbox
Chatbox is a desktop client for ChatGPT, Claude, and other LLMs, providing features like local data storage, multiple LLM provider support, image generation, enhanced prompting, keyboard shortcuts, and more. It offers a user-friendly interface with dark theme, team collaboration, cross-platform availability, web version access, iOS & Android apps, multilingual support, and ongoing feature enhancements. Developed for prompt and API debugging, it has gained popularity for daily chatting and professional role-playing with AI assistance.

NotelyVoice
Notely Voice is a free, modern, cross-platform AI voice transcription and note-taking application. It offers powerful Whisper AI Voice to Text capabilities, making it ideal for students, professionals, doctors, researchers, and anyone in need of hands-free note-taking. The app features rich text editing, simple search, smart filtering, organization with folders and tags, advanced speech-to-text, offline capability, seamless integration, audio recording, theming, cross-platform support, and sharing functionality. It includes memory-efficient audio processing, chunking configuration, and utilizes OpenAI Whisper for speech recognition technology. Built with Kotlin, Compose Multiplatform, Coroutines, Android Architecture, ViewModel, Koin, Material 3, Whisper AI, and Native Compose Navigation, Notely follows Android Architecture principles with distinct layers for UI, presentation, domain, and data.

MM-RLHF
MM-RLHF is a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. It includes a high-quality MLLM alignment dataset, a Critique-Based MLLM reward model, a novel alignment algorithm MM-DPO, and benchmarks for reward models and multimodal safety. The dataset covers image understanding, video understanding, and safety-related tasks with model-generated responses and human-annotated scores. The reward model generates critiques of candidate texts before assigning scores for enhanced interpretability. MM-DPO is an alignment algorithm that achieves performance gains with simple adjustments to the DPO framework. The project enables consistent performance improvements across 10 dimensions and 27 benchmarks for open-source MLLMs.
For similar tasks

X-AnyLabeling
X-AnyLabeling is a robust annotation tool that seamlessly incorporates an AI inference engine alongside an array of sophisticated features. Tailored for practical applications, it is committed to delivering comprehensive, industrial-grade solutions for image data engineers. This tool excels in swiftly and automatically executing annotations across diverse and intricate tasks.

file-organizer-2000
AI File Organizer 2000 is an Obsidian Plugin that uses AI to transcribe audio, annotate images, and automatically organize files by moving them to the most likely folders. It supports text, audio, and images, with upcoming local-first LLM support. Users can simply place unorganized files into the 'Inbox' folder for automatic organization. The tool renames and moves files quickly, providing a seamless file organization experience. Self-hosting is also possible by running the server and enabling the 'Self-hosted' option in the plugin settings. Join the community Discord server for more information and use the provided iOS shortcut for easy access on mobile devices.

LabelLLM
LabelLLM is an open-source data annotation platform designed to optimize the data annotation process for LLM development. It offers flexible configuration, multimodal data support, comprehensive task management, and AI-assisted annotation. Users can access a suite of annotation tools, enjoy a user-friendly experience, and enhance efficiency. The platform allows real-time monitoring of annotation progress and quality control, ensuring data integrity and timeliness.

awesome-open-data-annotation
At ZenML, we believe in the importance of annotation and labeling workflows in the machine learning lifecycle. This repository showcases a curated list of open-source data annotation and labeling tools that are actively maintained and fit for purpose. The tools cover various domains such as multi-modal, text, images, audio, video, time series, and other data types. Users can contribute to the list and discover tools for tasks like named entity recognition, data annotation for machine learning, image and video annotation, text classification, sequence labeling, object detection, and more. The repository aims to help users enhance their data-centric workflows by leveraging these tools.

anylabeling
AnyLabeling is a tool for effortless data labeling with AI support from YOLO and Segment Anything. It combines features from LabelImg and Labelme with an improved UI and auto-labeling capabilities. Users can annotate images with polygons, rectangles, circles, lines, and points, as well as perform auto-labeling using YOLOv5 and Segment Anything. The tool also supports text detection, recognition, and Key Information Extraction (KIE) labeling, with multiple language options available such as English, Vietnamese, and Chinese.

awesome-object-detection-datasets
This repository is a curated list of awesome public object detection and recognition datasets. It includes a wide range of datasets related to object detection and recognition tasks, such as general detection and recognition datasets, autonomous driving datasets, adverse weather datasets, person detection datasets, anti-UAV datasets, optical aerial imagery datasets, low-light image datasets, infrared image datasets, SAR image datasets, multispectral image datasets, 3D object detection datasets, vehicle-to-everything field datasets, super-resolution field datasets, and face detection and recognition datasets. The repository also provides information on tools for data annotation, data augmentation, and data management related to object detection tasks.

LabelQuick
LabelQuick_V2.0 is a fast image annotation tool designed and developed by the AI Horizon team. This version has been optimized and improved based on the previous version. It provides an intuitive interface and powerful annotation and segmentation functions to efficiently complete dataset annotation work. The tool supports video object tracking annotation, quick annotation by clicking, and various video operations. It introduces the SAM2 model for accurate and efficient object detection in video frames, reducing manual intervention and improving annotation quality. The tool is designed for Windows systems and requires a minimum of 6GB of memory.

For similar jobs

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It provides a common API to deliver inference solutions on various platforms, including CPU, GPU, NPU, and heterogeneous devices. OpenVINO™ supports pre-trained models from Open Model Zoo and popular frameworks like TensorFlow, PyTorch, and ONNX. Key components of OpenVINO™ include the OpenVINO™ Runtime, plugins for different hardware devices, frontends for reading models from native framework formats, and the OpenVINO Model Converter (OVC) for adjusting models for optimal execution on target devices.

peft
PEFT (Parameter-Efficient Fine-Tuning) is a collection of state-of-the-art methods that enable efficient adaptation of large pretrained models to various downstream applications. By only fine-tuning a small number of extra model parameters instead of all the model's parameters, PEFT significantly decreases the computational and storage costs while achieving performance comparable to fully fine-tuned models.

jetson-generative-ai-playground
This repo hosts tutorial documentation for running generative AI models on NVIDIA Jetson devices. The documentation is auto-generated and hosted on GitHub Pages using their CI/CD feature to automatically generate/update the HTML documentation site upon new commits.

emgucv
Emgu CV is a cross-platform .Net wrapper for the OpenCV image-processing library. It allows OpenCV functions to be called from .NET compatible languages. The wrapper can be compiled by Visual Studio, Unity, and "dotnet" command, and it can run on Windows, Mac OS, Linux, iOS, and Android.

MMStar
MMStar is an elite vision-indispensable multi-modal benchmark comprising 1,500 challenge samples meticulously selected by humans. It addresses two key issues in current LLM evaluation: the unnecessary use of visual content in many samples and the existence of unintentional data leakage in LLM and LVLM training. MMStar evaluates 6 core capabilities across 18 detailed axes, ensuring a balanced distribution of samples across all dimensions.

VLMEvalKit
VLMEvalKit is an open-source evaluation toolkit of large vision-language models (LVLMs). It enables one-command evaluation of LVLMs on various benchmarks, without the heavy workload of data preparation under multiple repositories. In VLMEvalKit, we adopt generation-based evaluation for all LVLMs, and provide the evaluation results obtained with both exact matching and LLM-based answer extraction.

llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.