recognize

👁 👂 Smart media tagging for Nextcloud: recognizes faces, objects, landscapes, music genres

Stars: 584

Recognize is a smart media tagging tool for Nextcloud that automatically categorizes photos and music by recognizing faces, animals, landscapes, food, vehicles, buildings, landmarks, monuments, music genres, and human actions in videos. It uses pre-trained models for object detection, landmark recognition, face comparison, music genre classification, and video classification. The tool ensures privacy by processing images locally without sending data to cloud providers. However, it cannot process end-to-end encrypted files. Recognize is rated positively for ethical AI practices in terms of open-source software, freely available models, and training data transparency, except for music genre recognition due to limited access to training data.

README:

Recognize: Smart media tagging for Nextcloud

This app goes through your media collection and adds fitting tags, automatically categorizing your photos and music.

📷 👪 Recognizes faces from contact photos
📷 🏔 Recognizes animals, landscapes, food, vehicles, buildings and other objects
📷 🗼 Recognizes landmarks and monuments
👂 🎵 Recognizes music genres
🎥 🤸 Recognizes human actions on video

⚡ Tagging works via Nextcloud's Collaborative Tags

👂 Listen to your tagged music with the audioplayer app
📷 View your tagged photos and videos with the photos app

Model sizes:

Object recognition: 1GB
Landmark recognition: 300MB
Video action recognition: 50MB
Music genre recognition: 50MB

Ethical AI Rating

Rating for Photo object detection: 🟢

Positive:

the software for training and inference of this model is open source
the trained model is freely available, and thus can be run on-premises
the training data is freely available, making it possible to check or correct for bias or optimise the performance and CO2 usage.

Rating for Photo face recognition: 🟢

Positive:

the software for training and inference of this model is open source
the trained model is freely available, and thus can be run on-premises
the training data is freely available, making it possible to check or correct for bias or optimise the performance and CO2 usage.

Rating for Video action recognition: 🟢

Positive:

the software for training and inference of this model is open source
the trained model is freely available, and thus can be run on-premises
the training data is freely available, making it possible to check or correct for bias or optimise the performance and CO2 usage.

Rating for Music genre recognition: 🟡

Positive:

the software for training and inference of this model is open source
the trained model is freely available, and thus can be run on-premises

Negative:

the training data is not freely available, limiting the ability of external parties to check and correct for bias or optimise the model’s performance and CO2 usage.

Learn more about the Nextcloud Ethical AI Rating in our blog.

Examples

(Screenshot by _DigitalWriter_)

Privacy

This app does not send any sensitive data to cloud providers or similar services. All image processing is done on your Nextcloud machine, using TensorFlow.js running in Node.js, which comes bundled with this app.

Encryption

Note that Recognize cannot process end-to-end encrypted files, because the server by design cannot read them.

Behind the scenes

Recognize uses

a pre-trained EfficientNet v2 model for ImageNet object detection.
a pre-trained model trained on the Landmarks v1 dataset for landmark recognition.
face-api.js to extract and compare face features.
a Musicnn neural network architecture to classify audio files into music genres. Also see the original musicnn repository.
a pre-trained MoViNet model for video classification.

Learn more about what's going on behind the scenes in this wiki article and this forum post.

Install

Requirements

PHP 8.0 or above
App "collaborative tags" enabled
For native speed:
- Processor: x86 64-bit (with support for AVX instructions)
- System with glibc (the norm on most Linux distributions; FreeBSD, Alpine Linux and thus also Nextcloud AIO are not glibc systems)
For sub-native speed (using WASM mode):
- Processor: x86 64-bit, arm64, armv7l (no AVX needed)
- System with glibc or musl (incl. Alpine Linux and thus also Nextcloud AIO)
~4GB of free RAM (if you're cutting it close, make sure you have some swap available)
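
To get a rough idea of which mode a Linux machine qualifies for, the two native-speed requirements above (AVX and glibc) can be checked from a shell. The following is only a sketch: the helper names `has_avx` and `libc_flavor` are made up for illustration and are not part of Recognize, and the check assumes a Linux system that exposes `/proc/cpuinfo` and ships an `ldd` command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Recognize): succeeds if the given cpuinfo
# file lists the "avx" CPU flag. Defaults to /proc/cpuinfo, which is Linux-only.
has_avx() {
    grep -qw avx "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Hypothetical helper: guess the C library from the output of `ldd --version`.
# musl's ldd prints its banner to stderr and exits non-zero, hence "|| true".
libc_flavor() {
    out=$(ldd --version 2>&1) || true
    case "$out" in
        *musl*) echo musl ;;
        *GLIBC*|*"GNU libc"*) echo glibc ;;
        *) echo unknown ;;
    esac
}

if has_avx && [ "$(libc_flavor)" = glibc ]; then
    echo "AVX and glibc found: native mode looks possible"
else
    echo "AVX or glibc not found: WASM mode is the likely fallback"
fi
```

The result is a hint rather than a guarantee. In a container setup, run the check inside the container that hosts Nextcloud, since its C library can differ from the host's.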

Tmp

This app temporarily stores files to be recognized in /tmp. If you're using Docker, you might find that adding an additional in-memory (tmpfs) volume for /tmp speeds things up and eases the burden on your disk.

⚠️⚠️⚠️ Make sure that your RAM is big enough to store big files. Otherwise public uploads will fail.

docker run: Add --mount type=tmpfs,destination=/tmp:exec to the command line.

docker compose: Add the following to the volumes section of your docker-compose.yml:

  app:
    image: nextcloud:26
    ...
    volumes:
      - type: tmpfs
        target: /tmp:exec
      ...
    ...

One click

Go to "Apps" in your Nextcloud, search for "recognize" and click install.

Help: If one-click install fails

Configuration

Any configuration is done in Settings/Recognize of your Nextcloud instance.

Ignoring directories

If you want path/to/your/folder/* to be excluded from image recognition, add a file path/to/your/folder/.noimage. If you want to exclude it from music genre recognition, add a file path/to/your/folder/.nomusic. If you want to exclude it from video recognition, add a file path/to/your/folder/.novideo. If you want to exclude it from all recognition, add a file path/to/your/folder/.nomedia.
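
As an illustration of the marker files described above, here is a small shell sketch. The function name `exclude_from_recognize` is hypothetical and not part of Recognize, and the demo runs against a throwaway directory rather than a real Nextcloud folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of Recognize): create the marker file that
# excludes a folder from one kind of recognition.
# kind: image, music, video, or media (media excludes it from all recognition).
exclude_from_recognize() {
    dir=$1
    kind=${2:-media}
    case "$kind" in
        image|music|video|media) ;;
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
    touch "$dir/.no$kind"
}

# Demo on a temporary directory standing in for path/to/your/folder
demo_dir=$(mktemp -d)
exclude_from_recognize "$demo_dir" music   # creates .nomusic
exclude_from_recognize "$demo_dir"         # creates .nomedia
ls -A "$demo_dir"
```

If a marker file is created directly on the server's disk instead of through a Nextcloud client or the web interface, Nextcloud may not notice it until the files are rescanned (for example with `occ files:scan`).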

Manual install

Dependencies

Setup

cd /path/to/nextcloud/apps/
git clone https://github.com/marcelklehr/recognize.git
cd recognize
make

Maintainers

Marcel Klehr

🛠️ State of maintenance

While there are some things that could be done to further improve this app, the app is currently maintained with limited effort. This means:

The main functionality works for the majority of the use cases
We will ensure that the app will continue to work like this for future releases and we will fix bugs that we classify as 'critical'
We will not invest further development resources ourselves in advancing the app with new features
We do review and enthusiastically welcome community PRs

We would be more than excited if you would like to collaborate with us. We will merge pull requests for new features and fixes. We also would love to welcome co-maintainers.

If you are a customer of Nextcloud and you have a strong business case for any development of this app, we will consider your wishes for our roadmap. Please contact your account manager to talk about the possibilities.

Contribute

We always welcome contributions. Have an issue or an idea for a feature? Let us know. Additionally, we happily accept pull requests.

In order to make the process run more smoothly, you can make sure of the following things:

Announce that you're working on a feature/bugfix in the relevant issue
Make sure the tests are passing
If you have any questions, you can let the maintainers above know privately via email, or simply open an issue on GitHub

Please read the Code of Conduct. This document offers some guidance to ensure Nextcloud participants can cooperate effectively in a positive and inspiring atmosphere, and to explain how together we can strengthen and support each other.

More information on how to contribute: https://nextcloud.com/contribute/

Happy hacking ❤️

License

This software is licensed under the terms of the AGPL written by the Free Software Foundation and available at COPYING.

The recognize logo Smart tag by Xinh Studio from the Noun Project is licensed under a Creative Commons Attribution license.

For Tasks:

tag photos categorize music recognize faces classify videos identify landmarks

For Jobs:

photographer content creator social media manager data analyst digital marketer

Alternative AI tools for recognize

Similar Open Source Tools

WilmerAI

WilmerAI is a middleware system designed to process prompts before sending them to Large Language Models (LLMs). It categorizes prompts, routes them to appropriate workflows, and generates manageable prompts for local models. It acts as an intermediary between the user interface and LLM APIs, supporting multiple backend LLMs simultaneously. WilmerAI provides API endpoints compatible with OpenAI API, supports prompt templates, and offers flexible connections to various LLM APIs. The project is under heavy development and may contain bugs or incomplete code.

github

: 803

lollms-webui

LoLLMs WebUI (Lord of Large Language Multimodal Systems: One tool to rule them all) is a user-friendly interface to access and utilize various LLM (Large Language Models) and other AI models for a wide range of tasks. With over 500 AI expert conditionings across diverse domains and more than 2500 fine tuned models over multiple domains, LoLLMs WebUI provides an immediate resource for any problem, from car repair to coding assistance, legal matters, medical diagnosis, entertainment, and more. The easy-to-use UI with light and dark mode options, integration with GitHub repository, support for different personalities, and features like thumb up/down rating, copy, edit, and remove messages, local database storage, search, export, and delete multiple discussions, make LoLLMs WebUI a powerful and versatile tool.

github

: 4.8k

sdk

Vikit.ai SDK is a software development kit that enables easy development of video generators using generative AI and other AI models. It serves as a langchain to orchestrate AI models and video editing tools. The SDK allows users to create videos from text prompts with background music and voice-over narration. It also supports generating composite videos from multiple text prompts. The tool requires Python 3.8+, specific dependencies, and tools like FFMPEG and ImageMagick for certain functionalities. Users can contribute to the project by following the contribution guidelines and standards provided.

github

: 55

digma

Digma is a Continuous Feedback platform that provides code-level insights related to performance, errors, and usage during development. It empowers developers to own their code all the way to production, improving code quality and preventing critical issues. Digma integrates with OpenTelemetry traces and metrics to generate insights in the IDE, helping developers analyze code scalability, bottlenecks, errors, and usage patterns.

github

: 396

wave-apps

Wave Apps is a directory of sample applications built on H2O Wave, allowing users to build AI apps faster. The apps cover various use cases such as explainable hotel ratings, human-in-the-loop credit risk assessment, mitigating churn risk, online shopping recommendations, and sales forecasting EDA. Users can download, modify, and integrate these sample apps into their own projects to learn about app development and AI model deployment.

github

: 145

examor

Examor is a website application that allows you to take exams based on your knowledge notes. It helps you to remember what you have learned and written. The application generates a set of questions from the documents you upload, and you can answer them to test your knowledge. Examor also uses GPT to score and validate your answers, and provides you with feedback. The application is still in its early stages of development, but it has the potential to be a valuable tool for learners.

github

: 1.0k

AI-Horde

The AI Horde is an enterprise-level ML-Ops crowdsourced distributed inference cluster for AI Models. This middleware can support both Image and Text generation. It is infinitely scalable and supports seamless drop-in/drop-out of compute resources. The Public version allows people without a powerful GPU to use Stable Diffusion or Large Language Models like Pygmalion/Llama by relying on spare/idle resources provided by the community and also allows non-python clients, such as games and apps, to use AI-provided generations.

github

: 1.1k

morphik-core

Morphik is an AI-native toolset designed to help developers integrate context into their AI applications by providing tools to store, represent, and search unstructured data. It offers features such as multimodal search, fast metadata extraction, and integrations with existing tools. Morphik aims to address the challenges of traditional AI approaches that struggle with visually rich documents and provide a more comprehensive solution for understanding and processing complex data.

github

: 3.5k

promptbuddy

Prompt Buddy is a Microsoft Teams app that provides a central location for teams to share and discover their favorite AI prompts. It comes preloaded with Microsoft Copilot and other categories, but users can also add their own custom prompts. The app is easy to use and allows users to upvote their favorite prompts, which raises them to the top of the leaderboard. Prompt Buddy also supports dark mode and offers a mobile layout for use on phones. It is built on the Power Platform and can be customized and extended by the installer.

github

: 161

LLocalSearch

LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a chain of LLMs to find the answer. The user can see the progress of the agents and the final answer. No OpenAI or Google API keys are needed.

github

: 5.3k

Aimmy

Aimmy is a universal AI-Based Aim Alignment Mechanism developed by BabyHamsta, MarsQQ & Taylor to make gaming more accessible for users who have difficulty aiming. It utilizes DirectML, ONNX, and YOLOV8 for player detection, offering high accuracy and fast performance. Aimmy features an easy-to-use UI, extensive customizability, and is free of ads and paywalls. It is designed for gamers facing challenges like physical or mental disabilities, poor hand-eye coordination, or aiming difficulties due to environmental factors. Aimmy provides various features like AI detection, customizability, anti-recoil system, mouse movement methods, hotswappability, and a model/configuration store with repository support.

github

: 839

OpenBB

The OpenBB Platform is the first financial platform that is free and fully open source, offering access to equity, options, crypto, forex, macro economy, fixed income, and more. It provides a broad range of extensions to enhance the user experience according to their needs. Users can sign up to the OpenBB Hub to maximize the benefits of the OpenBB ecosystem. Additionally, the platform includes an AI-powered Research and Analytics Workspace for free. There is also an open source AI financial analyst agent available that can access all the data within OpenBB.

github

: 51.9k

gpdb

Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse, based on PostgreSQL. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer delivering high analytical query performance on large data volumes.

github

: 6.2k

n8n-docs

n8n is an extendable workflow automation tool that enables you to connect anything to everything. It is open-source and can be self-hosted or used as a service. n8n provides a visual interface for creating workflows, which can be used to automate tasks such as data integration, data transformation, and data analysis. n8n also includes a library of pre-built nodes that can be used to connect to a variety of applications and services. This makes it easy to create complex workflows without having to write any code.

github

: 1.1k

nlp-zero-to-hero

This repository provides a comprehensive guide to Natural Language Processing (NLP), covering topics from Tokenization to Transformer Architecture. It aims to equip users with a solid understanding of NLP concepts, evolution, and core intuition. The repository includes practical examples and hands-on experience to facilitate learning and exploration in the field of NLP.

github

: 148

For similar tasks

react-native-vision-camera

VisionCamera is a powerful, high-performance Camera library for React Native. It features Photo and Video capture, QR/Barcode scanner, Customizable devices and multi-cameras ("fish-eye" zoom), Customizable resolutions and aspect-ratios (4k/8k images), Customizable FPS (30..240 FPS), Frame Processors (JS worklets to run facial recognition, AI object detection, realtime video chats, ...), Smooth zooming (Reanimated), Fast pause and resume, HDR & Night modes, Custom C++/GPU accelerated video pipeline (OpenGL).

github

: 8.2k

MiniAI-Face-Recognition-LivenessDetection-WindowsSDK

This repository contains a C++ application that demonstrates face recognition capabilities using computer vision techniques. The demo utilizes OpenCV and dlib libraries for efficient face detection and recognition with 3D passive face liveness detection (face anti-spoofing). Key Features: Face detection: The SDK utilizes advanced computer vision techniques to detect faces in images or video frames, enabling a wide range of applications. Face recognition: It can recognize known faces by comparing them with a pre-defined database of individuals. Age estimation: It can estimate the age of detected faces. Gender detection: It can determine the gender of detected faces. Liveness detection: It can detect whether a face is from a live person or a static image.

github

: 102

viseron

Viseron is a self-hosted, local-only NVR and AI computer vision software that provides features such as object detection, motion detection, and face recognition. It allows users to monitor their home, office, or any other place they want to keep an eye on. Getting started with Viseron is easy by spinning up a Docker container and editing the configuration file using the built-in web interface. The software's functionality is enabled by components, which can be explored using the Component Explorer. Contributors are welcome to help with implementing open feature requests, improving documentation, and answering questions in issues or discussions. Users can also sponsor Viseron or make a one-time donation.

github

: 1.8k

Awesome-AI-Data-Guided-Projects

A curated list of data science & AI guided projects to start building your portfolio. The repository contains guided projects covering various topics such as large language models, time series analysis, computer vision, natural language processing (NLP), and data science. Each project provides detailed instructions on how to implement specific tasks using different tools and technologies.

github

: 83

CodeProject.AI-Server

CodeProject.AI Server is a standalone, self-hosted, fast, free, and open-source Artificial Intelligence microserver designed for any platform and language. It can be installed locally without the need for off-device or out-of-network data transfer, providing an easy-to-use solution for developers interested in AI programming. The server includes a HTTP REST API server, backend analysis services, and the source code, enabling users to perform various AI tasks locally without relying on external services or cloud computing. Current capabilities include object detection, face detection, scene recognition, sentiment analysis, and more, with ongoing feature expansions planned. The project aims to promote AI development, simplify AI implementation, focus on core use-cases, and leverage the expertise of the developer community.

github

: 645

autonomous-intelligence

Tau is an autonomous robot project inspired by Pi.AI, designed for continual conversation with a single context. It features speech-based interaction, memory management, and integration with vision services. The project aims to create a local AI companion with personality, suitable for experimentation and development. Key components include long and immediate memory, speech-to-text and text-to-speech capabilities, and integration with Nvidia Jetson and Hailo vision services. Tau is open-source and encourages community contributions and experimentation.

github

: 207

For similar jobs

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

daily-poetry-image

Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

github

: 492

exif-photo-blog

EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

github

: 1.4k

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

github

: 23.0k

Twitter-Insight-LLM

This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).

github

: 401

AISuperDomain

Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.

github

: 1.2k

ChatGPT-On-CS

This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.

github

: 768

obs-localvocal

LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.

github

: 248
