Qmedia

An open-source AI content search engine designed specifically for content creators. Supports extraction of text, images, and short videos. Allows full local deployment (web app, RAG server, LLM server). Supports multi-modal RAG content Q&A.

Stars: 421

QMedia is an open-source multimedia AI content search engine designed specifically for content creators. It provides rich information extraction methods for text, image, and short video content, and integrates this unstructured information to build a multimodal RAG content Q&A system. Users can efficiently search for image/text and short video materials and analyze their content; QMedia shows the content sources and generates customized search results based on user interests and needs. It supports local deployment, enabling offline content search and Q&A over private data. Features include content card display, multimodal content RAG search, and fully local deployment of multimodal models: users can deploy and manage language models, feature embedding models, image models, and video models locally. QMedia aims to spark new ideas for content creation and to share AI content creation concepts in an open-source manner.

README:

QMedia

AI content search engine designed specifically for content creators.

Key Features

  • Search for image/text and short video materials.
  • Efficiently analyze image/text and short video content, integrating scattered information.
  • Provide content sources and decompose image/text and short video information, presenting information through content cards.
  • Generate customized search results based on user interests and needs from image/text and short video content.
  • Local deployment, enabling offline content search and Q&A for private data.

👋🏻 Introduction

QMedia is an open-source multimedia AI content search engine that provides rich information extraction methods for text/image and short video content. It integrates unstructured text/image and short video information to build a multimodal RAG content Q&A system. The aim is to share and exchange ideas on AI content creation in an open-source manner.

Spark new ideas for content creation


💫 Feature Overview

  • Content Cards

    • Display image/text and video content in the form of cards
    • Web Service inspired by the XHS web version, implemented with TypeScript, Next.js, Tailwind CSS, and shadcn/ui
    • RAG Search/Q&A Service and Image/Text/Video Model Service implemented in Python with LlamaIndex
    • Web Service, RAG Search/Q&A Service, and Image/Text/Video Model Service can be deployed separately, allowing flexible deployment based on user resources, and can be embedded into other systems for image/text and video content extraction.
  • Multimodal Content RAG

    • Search for image/text and short video materials.
    • Extract useful information from image/text and short video content based on user queries to generate high-quality answers.
    • Present content sources and the breakdown of image/text and short video information through content cards.
    • Retrieval and Q&A rely on the breakdown of image/text and short video content, including image style, text layout, short video transcription, video summaries, etc.
    • Support Google content search.
  • Pure Local Multimodal Models

    • Deployment of various types of models locally
    • Separation from the RAG application layer, making it easy to replace different models
    • Local model lifecycle management, configurable for manual or automatic release to reduce server load

    Language Models:

    Feature Embedding Models:

    • Image Embedding: CLIP Encoder. Converts images into feature encodings that can be compared with text encodings.
    • Text Embedding: BGE Encoder. A multilingual embedding model that converts text into feature encodings, with local models aligned to GPT Encoder.

    Image Models:

    • Image Text OCR Recognition: the OCR model from QAnything, a local knowledge base Q&A system
    • Visual Understanding Models:

      • [ ] llava-llama3: a GPT-4V-level visual understanding model deployed locally through Ollama.

    Video Models:

    • Video Transcription:
      • Faster Whisper: Quickly extracts video transcriptions and can run on a local CPU.
    • LLM-based Short Video Content Summarization
    • [ ] Identification of highlights in short videos
    • [ ] Recognition of short video style types
    • [ ] Analysis and breakdown of short video content
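
The "manual or automatic release" idea from the local model lifecycle bullet above can be illustrated with a small wrapper. This is only a sketch of the general pattern, not QMedia's actual implementation: the class name `ManagedModel`, the `loader` callback, and the `idle_seconds` setting are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


class ManagedModel:
    """Hypothetical wrapper: loads a model lazily, releases it manually or when idle."""

    def __init__(
        self,
        loader: Callable[[], Any],
        idle_seconds: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader              # assumed callback that returns a loaded model
        self._idle_seconds = idle_seconds  # None means manual release only
        self._clock = clock
        self._model: Any = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it on first use, and record the access time."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def release(self) -> None:
        """Manual release: drop the reference so the memory can be reclaimed."""
        self._model = None

    def release_if_idle(self) -> bool:
        """Automatic release: call periodically; frees the model after the idle timeout."""
        if self._model is None or self._idle_seconds is None:
            return False
        if self._clock() - self._last_used >= self._idle_seconds:
            self.release()
            return True
        return False
```

A server could call `release_if_idle()` from a background timer for each registered model, so rarely used models (for example a video model) do not hold memory between requests.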

Future Plans

  • [ ] Image/Text Short Video Content Analysis and Viral Content Breakdown
  • [ ] Search for Similar Image/Text/Video
  • [ ] Card Image/Text Content Generation
  • [ ] Short Video Content Editing


🤖 Installation

File Structure Introduction

QMedia services: Depending on available resources, all services can be deployed locally, or the model services can be deployed in the cloud.

  • Multimodal Model Service mm_server:

    • Multimodal model deployment and API calls

    • Ollama LLM models

    • Image models

    • Video models

    • Feature embedding models


  • Content Search and Q&A Service mmrag_server:

    • Content Card Display and Query

    • Image/Text/Short Video Content Extraction, Embedding, and Storage Service

    • Multimodal Data RAG Retrieval Service

    • Content Q&A Service


  • Web Service qmedia_web:

    • Language: TypeScript

    • Framework: Next.js

    • Styling: Tailwind CSS

    • Components: shadcn/ui


⭐️ Usage

Combined Usage

mm_server + qmedia_web + mmrag_server: web page content display, content RAG search and Q&A, and model service.

  1. Service Startup Process (run each service in its own terminal, starting from the repository root):
# Start mm_server service
cd mm_server
source activate qllm
python main.py

# Start mmrag_server service
cd mmrag_server
source activate qmedia
python main.py

# Start qmedia_web service
cd qmedia_web
pnpm dev
  2. Using Functions via the Web Page: During the startup phase, mmrag_server reads pseudo data from assets/medias and assets/mm_pseudo_data.json and calls mm_server to extract and structure the information from text/image and short videos into node information, which is then stored in the db. Retrieval and Q&A are based on the data in the db.
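
Because mmrag_server calls mm_server during its own startup, it can help to confirm that mm_server is accepting connections before launching the next service. The helper below is a minimal sketch, not part of QMedia: the function name `wait_for_port` is made up for illustration, and port 50110 is an assumption taken from the commented `uvicorn` line in this README.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (assumed port; adjust to your configuration):
#   if wait_for_port("localhost", 50110):
#       print("mm_server is reachable, start mmrag_server next")
```

This only checks that the port is open. It does not verify that the models behind the API have finished loading.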

Custom Data

# assets file structure
assets
├── mm_pseudo_data.json # Content card data
└── medias # Image/Video files

To use your own data, replace the contents of assets and delete the previously stored db file. assets/medias holds the image/video files, and assets/mm_pseudo_data.json holds the content card data; both can be replaced with your own. After the service is started, the models automatically extract the information and store it in the db.
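
Before restarting the services with replaced data, a quick sanity check of the assets folder can catch a malformed JSON file early. The sketch below assumes only the layout shown above (assets/mm_pseudo_data.json and assets/medias). It does not validate the card schema, which this README does not describe, and the function name `summarize_assets` is hypothetical.

```python
import json
from pathlib import Path


def summarize_assets(assets_dir: str = "assets") -> dict:
    """Check that the card data parses as JSON and list the media files next to it."""
    root = Path(assets_dir)
    cards_path = root / "mm_pseudo_data.json"
    media_dir = root / "medias"

    # json.load raises json.JSONDecodeError if the replacement file is not valid JSON.
    with cards_path.open(encoding="utf-8") as f:
        cards = json.load(f)

    media_files = sorted(p.name for p in media_dir.iterdir() if p.is_file())
    return {
        # Count top-level entries only; the real schema is not assumed here.
        "card_entries": len(cards) if isinstance(cards, (list, dict)) else 1,
        "media_files": media_files,
    }
```

Calling `summarize_assets()` from the directory that contains assets returns the entry count and the media file names, which can be compared by eye against the files the card data is expected to reference.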


Independent Model Service

The mm_server local image/text/video information extraction service can be used on its own. It works as a standalone image encoding, text encoding, video transcription extraction, and image OCR service, accessible via API in any scenario.

# Start mm_server service independently
cd mm_server
python main.py

# uvicorn main:app --reload --host localhost --port 50110

API Content: [image: mm_server API listing, not reproduced here]
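
Since the API listing is not reproduced in this text, the client below is only a sketch of how a JSON endpoint on mm_server might be called from Python. The path `/text_embedding` and the `{"text": ...}` payload are hypothetical placeholders, and the port comes from the commented `uvicorn` line above. Check the service's real API listing before using it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:50110"   # port assumed from the commented uvicorn line
TEXT_EMBED_PATH = "/text_embedding"   # HYPOTHETICAL path, not confirmed by the README


def build_json_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode a JSON response body."""
    req = build_json_request(path, payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (hypothetical endpoint and payload):
#   result = post_json(TEXT_EMBED_PATH, {"text": "spring outfit ideas"})
```

Using only the standard library keeps the sketch free of extra dependencies. Any HTTP client would work equally well.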


Pure Python RAG Service/Model Service

mm_server and mmrag_server can be used together to perform content extraction and RAG retrieval in a pure Python environment via APIs.

# Start mmrag_server service independently
cd mmrag_server
python main.py

# uvicorn main:app --reload --host localhost --port 50110

API Content: [image: mmrag_server API listing, not reproduced here]


License

QMedia is licensed under the MIT License.

Acknowledgments

Thanks to QAnything for strong OCR models.

Thanks to llava-llama3 for strong LLM vision models.
