Qmedia

An open-source AI content search engine designed specifically for content creators. Supports extraction of text, images, and short videos. Allows full local deployment (web app, RAG server, LLM server). Supports multi-modal RAG content Q&A.

Stars: 537

Visit

QMedia is an open-source multimedia AI content search engine designed specifically for content creators. It provides rich information extraction methods for text, image, and short video content. The tool integrates unstructured text, image, and short video information to build a multimodal RAG content Q&A system. Users can efficiently search for image/text and short video materials, analyze content, provide content sources, and generate customized search results based on user interests and needs. QMedia supports local deployment for offline content search and Q&A for private data. The tool offers features like content cards display, multimodal content RAG search, and pure local multimodal models deployment. Users can deploy different types of models locally, manage language models, feature embedding models, image models, and video models. QMedia aims to spark new ideas for content creation and share AI content creation concepts in an open-source manner.

README:

QMedia

AI content search engine designed specifically for content creators.

English | 简体中文

Changelog - Report Issues - Request Feature

Key Features

Search for image/text and short video materials.
Efficiently analyze image/text and short video content, integrating scattered information.
Provide content sources and decompose image/text and short video information, presenting information through content cards.
Generate customized search results based on user interests and needs from image/text and short video content.
Local deployment, enabling offline content search and Q&A for private data.

👋🏻 Introduction

QMedia is an open-source multimedia AI content search engine , provides rich information extraction methods for text/image and short video content. It integrates unstructured text/image and short video information to build a multimodal RAG content Q&A system. The aim is to share and exchange ideas on AI content creation in an open-source manner. issues

Share QMedia with your friends.

Spark new ideas for content creation

	Join our Discord community！
	Join our WeChat group !

💫 Feature Overview

Content Cards
- Display image/text and video content in the form of cards
- Web Service inspired by XHS web version, implemented using the technology stack of Typescript, Next.js, TailwindCSS, and Shadcn/UI
- RAG Search/Q&A Service and Image/Text/Video Model Service implemented using the Python framework and LlamaIndex applications
- Web Service, RAG Search/Q&A Service, and Image/Text/Video Model Service can be deployed separately for flexible deployment based on user resources, and can be embedded into other systems for image/text and video content extraction.
Multimodal Content RAG
- Search for image/text and short video materials.
- Extract useful information from image/text and short video content based on user queries to generate high-quality answers.
- Present content sources and the breakdown of image/text and short video information through content cards.
- Retrieval and Q&A rely on the breakdown of image/text and short video content, including image style, text layout, short video transcription, video summaries, etc.
- Support Google content search.
Pure Local Multimodal Models

Deployment of various types of models locally Separation from the RAG application layer, making it easy to replace different models Local model lifecycle management, configurable for manual or automatic release to reduce server load

Language Models:
- Support local Ollama model switching.
  - llama3:8b-instruct Lightweight local deployment of LLM models.
  - llama3:70b-instruct Eighth place in open-source LLM models.
Feature Embedding Models:
- Image Embedding: CLIP Encoder Convert images to text feature encoding.
- Text Embedding: BGE Encoder Multilingual embedded model, converting text to feature encoding, with local models aligned to GPT Encoder.
Image Models:
- Image Text OCR Recognition: Qanything Local Knowledge Base Q&A System OCR
- Visual Understanding Models:
  - [ ] llava-llama3: Ollama's locally deployed GPT-4V level visual understanding model.
Video Models
- Video Transcription:
  - Faster Whisper: Quickly extract video transcription content, can run on local CPU.
- LLM-based Short Video Content Summarization
- [ ] Identification of highlights in short videos
- [ ] Recognition of short video style types
- [ ] Analysis and breakdown of short video content

Future Plans

[ ] Image/Text Short Video Content Analysis and Viral Content Breakdown
[ ] Search for Similar Image/Text/Video
[ ] Card Image/Text Content Generation
[ ] Short Video Content Editing

🤖 Installation

File Structure Introduction

QMedia services: Depending on resource availability, they can be deployed locally or the model services can be deployed in the cloud

mm_server Installation

Multimodal Model Service mm_server:
- Multimodal model deployment and API calls
- Ollama LLM models
- Image models
- Video models
- Feature embedding models

mmrag_server Installation

Content Search and Q&A Service mmrag_server:
- Content Card Display and Query
- Image/Text/Short Video Content Extraction, Embedding, and Storage Service
- Multimodal Data RAG Retrieval Service
- Content Q&A Service

qmedia_web Installation

Web Service qmedia_web: Language: TypeScript Framework: Next.js Styling: Tailwind CSS Components: shadcn/ui

⭐️ Usage

Combined Usage

mm_server + qmedia_web + mmrag_server Web Page Content Display, Content RAG Search and Q&A, Model Service

Service Startup Process:

# Start mm_server service
cd mm_server
source activate qllm
python main.py

# Start mmrag_server service
cd mmrag_server
source activate qmedia
python main.py

# Start qmedia_web service
cd qmedia_web
pnpm dev

Using Functions via the Web Page During the startup phase, mmrag_server will read pseudo data from assets/medias and assets/mm_pseudo_data.json, and call mm_server to extract and structure the information from text/image and short videos into node information, which is then stored in the db. The retrieval and Q&A will be based on the data in the db.

Custom Data

# assets file structure
assets
├── mm_pseudo_data.json # Content card data
└── medias # Image/Video files

Replace the contents in assets and delete the historically stored db file. assets/medias contains image/video files, which can be replaced with your own image/video files. assets/mm_pseudo_data.json contains content card data, which can be replaced with your own content card data. After running the service, the model will automatically extract the information and store it in the db.

Independent Model Service

Can use the mm_server local image/text/video information extraction service independently. It can be used as a standalone image encoding, text encoding, video transcription extraction, and image OCR service, accessible via API in any scenario.

# Start mm_server service independently
cd mm_server
python main.py

# uvicorn main:app --reload --host localhost --port 50110

API Content:

http://localhost:50110/docs

Pure Python RAG Service/Model Service

Can use mm_server + qmedia_web together to perform content extraction and RAG retrieval in a pure Python environment via APIs.

# Start mmrag_server service independently
cd mmrag_server
python main.py

# uvicorn main:app --reload --host localhost --port 50110

API Content:

Star History

License

QMedia is licensed under MIT License

Acknowledgments

Thanks to QAnything for strong OCR models.

Thanks to llava-llama3 for strong llm vision models.

Thanks to Ghibli Image Generator for api support.

For Tasks:

Click tags to check more tools for each tasks

search image/text analyze content generate search results deploy local models extract information

For Jobs:

content creator ai content creator multimedia content analyst ai content strategist multimodal content researcher

Alternative AI tools for Qmedia

Similar Open Source Tools

Qmedia

github

: 537

llama-assistant

Llama Assistant is an AI-powered assistant that helps with daily tasks, such as voice recognition, natural language processing, summarizing text, rephrasing sentences, answering questions, and more. It runs offline on your local machine, ensuring privacy by not sending data to external servers. The project is a work in progress with regular feature additions.

github

: 300

NextChat

NextChat is a well-designed cross-platform ChatGPT web UI tool that supports Claude, GPT4, and Gemini Pro. It offers a compact client for Linux, Windows, and MacOS, with features like self-deployed LLMs compatibility, privacy-first data storage, markdown support, responsive design, and fast loading speed. Users can create, share, and debug chat tools with prompt templates, access various prompts, compress chat history, and use multiple languages. The tool also supports enterprise-level privatization and customization deployment, with features like brand customization, resource integration, permission control, knowledge integration, security auditing, private deployment, and continuous updates.

github

: 78.7k

llama-assistant

Llama Assistant is a local AI assistant that respects your privacy. It is an AI-powered assistant that can recognize your voice, process natural language, and perform various actions based on your commands. It can help with tasks like summarizing text, rephrasing sentences, answering questions, writing emails, and more. The assistant runs offline on your local machine, ensuring privacy by not sending data to external servers. It supports voice recognition, natural language processing, and customizable UI with adjustable transparency. The project is a work in progress with new features being added regularly.

github

: 485

educhain

Educhain is a powerful Python package that leverages Generative AI to create engaging and personalized educational content. It enables users to generate multiple-choice questions, create lesson plans, and support various LLM models. Users can export questions to JSON, PDF, and CSV formats, customize prompt templates, and generate questions from text, PDF, URL files, youtube videos, and images. Educhain outperforms traditional methods in content generation speed and quality. It offers advanced configuration options and has a roadmap for future enhancements, including integration with popular Learning Management Systems and a mobile app for content generation on-the-go.

github

: 157

amica

Amica is an application that allows you to easily converse with 3D characters in your browser. You can import VRM files, adjust the voice to fit the character, and generate response text that includes emotional expressions.

github

: 879

ChatGPT-Next-Web

ChatGPT Next Web is a well-designed cross-platform ChatGPT web UI tool that supports Claude, GPT4, and Gemini Pro models. It allows users to deploy their private ChatGPT applications with ease. The tool offers features like one-click deployment, compact client for Linux/Windows/MacOS, compatibility with self-deployed LLMs, privacy-first approach with local data storage, markdown support, responsive design, fast loading speed, prompt templates, awesome prompts, chat history compression, multilingual support, and more.

github

: 78.5k

LongLLaVA

LongLLaVA is a tool for scaling multi-modal LLMs to 1000 images efficiently via hybrid architecture. It includes stages for single-image alignment, instruction-tuning, and multi-image instruction-tuning, with evaluation through a command line interface and model inference. The tool aims to achieve GPT-4V level capabilities and beyond, providing reproducibility of results and benchmarks for efficiency and performance.

github

: 158

fiftyone

FiftyOne is an open-source tool designed for building high-quality datasets and computer vision models. It supercharges machine learning workflows by enabling users to visualize datasets, interpret models faster, and improve efficiency. With FiftyOne, users can explore scenarios, identify failure modes, visualize complex labels, evaluate models, find annotation mistakes, and much more. The tool aims to streamline the process of improving machine learning models by providing a comprehensive set of features for data analysis and model interpretation.

github

: 9.3k

obsei

Obsei is an open-source, low-code, AI powered automation tool that consists of an Observer to collect unstructured data from various sources, an Analyzer to analyze the collected data with various AI tasks, and an Informer to send analyzed data to various destinations. The tool is suitable for scheduled jobs or serverless applications as all Observers can store their state in databases. Obsei is still in alpha stage, so caution is advised when using it in production. The tool can be used for social listening, alerting/notification, automatic customer issue creation, extraction of deeper insights from feedbacks, market research, dataset creation for various AI tasks, and more based on creativity.

github

: 1.2k

DB-GPT

DB-GPT is a personal database administrator that can solve database problems by reading documents, using various tools, and writing analysis reports. It is currently undergoing an upgrade. **Features:** * **Online Demo:** * Import documents into the knowledge base * Utilize the knowledge base for well-founded Q&A and diagnosis analysis of abnormal alarms * Send feedbacks to refine the intermediate diagnosis results * Edit the diagnosis result * Browse all historical diagnosis results, used metrics, and detailed diagnosis processes * **Language Support:** * English (default) * Chinese (add "language: zh" in config.yaml) * **New Frontend:** * Knowledgebase + Chat Q&A + Diagnosis + Report Replay * **Extreme Speed Version for localized llms:** * 4-bit quantized LLM (reducing inference time by 1/3) * vllm for fast inference (qwen) * Tiny LLM * **Multi-path extraction of document knowledge:** * Vector database (ChromaDB) * RESTful Search Engine (Elasticsearch) * **Expert prompt generation using document knowledge** * **Upgrade the LLM-based diagnosis mechanism:** * Task Dispatching -> Concurrent Diagnosis -> Cross Review -> Report Generation * Synchronous Concurrency Mechanism during LLM inference * **Support monitoring and optimization tools in multiple levels:** * Monitoring metrics (Prometheus) * Flame graph in code level * Diagnosis knowledge retrieval (dbmind) * Logical query transformations (Calcite) * Index optimization algorithms (for PostgreSQL) * Physical operator hints (for PostgreSQL) * Backup and Point-in-time Recovery (Pigsty) * **Continuously updated papers and experimental reports** This project is constantly evolving with new features. Don't forget to star ⭐ and watch 👀 to stay up to date.

github

: 480

PromptClip

PromptClip is a tool that allows developers to create video clips using LLM prompts. Users can upload videos from various sources, prompt the video in natural language, use different LLM models, instantly watch the generated clips, finetune the clips, and add music or image overlays. The tool provides a seamless way to extract specific moments from videos based on user queries, making video editing and content creation more efficient and intuitive.

github

: 100

dom-to-semantic-markdown

DOM to Semantic Markdown is a tool that converts HTML DOM to Semantic Markdown for use in Large Language Models (LLMs). It maximizes semantic information, token efficiency, and preserves metadata to enhance LLMs' processing capabilities. The tool captures rich web content structure, including semantic tags, image metadata, table structures, and link destinations. It offers customizable conversion options and supports both browser and Node.js environments.

github

: 708

rig

Rig is a Rust library designed for building scalable, modular, and user-friendly applications powered by large language models (LLMs). It provides full support for LLM completion and embedding workflows, offers simple yet powerful abstractions for LLM providers like OpenAI and Cohere, as well as vector stores such as MongoDB and in-memory storage. With Rig, users can easily integrate LLMs into their applications with minimal boilerplate code.

github

: 3.4k

TempCompass

TempCompass is a benchmark designed to evaluate the temporal perception ability of Video LLMs. It encompasses a diverse set of temporal aspects and task formats to comprehensively assess the capability of Video LLMs in understanding videos. The benchmark includes conflicting videos to prevent models from relying on single-frame bias and language priors. Users can clone the repository, install required packages, prepare data, run inference using examples like Video-LLaVA and Gemini, and evaluate the performance of their models across different tasks such as Multi-Choice QA, Yes/No QA, Caption Matching, and Caption Generation.

github

: 71

Kord-Ai

Kord-Ai is a WhatsApp bot designed to automate interactions on WhatsApp by executing predefined commands or responding to user inputs. It can handle tasks like sending messages, sharing media, and managing group activities, providing convenience and efficiency for users and businesses. The bot offers features for deployment on various platforms, including Heroku, Replit, Koyeb, Glitch, Codespace, Render, Railway, VPS, and PC. Users can deploy the bot by obtaining a session ID, forking the repository, setting configurations in the Config.js file, and starting/stopping the bot using npm commands. It is important to note that Kord-Ai is a bot created by M3264, not affiliated with WhatsApp, and users should be cautious in its usage.

github

: 70

For similar tasks

Open-DocLLM

Open-DocLLM is an open-source project that addresses data extraction and processing challenges using OCR and LLM technologies. It consists of two main layers: OCR for reading document content and LLM for extracting specific content in a structured manner. The project offers a larger context window size compared to JP Morgan's DocLLM and integrates tools like Tesseract OCR and Mistral for efficient data analysis. Users can run the models on-premises using LLM studio or Ollama, and the project includes a FastAPI app for testing purposes.

github

: 124

Awesome-AI

Awesome AI is a repository that collects and shares resources in the fields of large language models (LLM), AI-assisted programming, AI drawing, and more. It explores the application and development of generative artificial intelligence. The repository provides information on various AI tools, models, and platforms, along with tutorials and web products related to AI technologies.

github

: 157

Qmedia

github

: 537

aws-ai-intelligent-document-processing

This repository is part of Intelligent Document Processing with AWS AI Services workshop. It aims to automate the extraction of information from complex content in various document formats such as insurance claims, mortgages, healthcare claims, contracts, and legal contracts using AWS Machine Learning services like Amazon Textract and Amazon Comprehend. The repository provides hands-on labs to familiarize users with these AI services and build solutions to automate business processes that rely on manual inputs and intervention across different file types and formats.

github

: 124

Scrapegraph-LabLabAI-Hackathon

ScrapeGraphAI is a web scraping Python library that utilizes LangChain, LLM, and direct graph logic to create scraping pipelines. Users can specify the information they want to extract, and the library will handle the extraction process. The tool is designed to simplify web scraping tasks by providing a streamlined and efficient approach to data extraction.

github

: 75

parsera

Parsera is a lightweight Python library designed for scraping websites using LLMs. It offers simplicity and efficiency by minimizing token usage, enhancing speed, and reducing costs. Users can easily set up and run the tool to extract specific elements from web pages, generating JSON output with relevant data. Additionally, Parsera supports integration with various chat models, such as Azure, expanding its functionality and customization options for web scraping tasks.

github

: 1.1k

Scrapegraph-demo

ScrapeGraphAI is a web scraping Python library that utilizes LangChain, LLM, and direct graph logic to create scraping pipelines. Users can specify the information they want to extract, and the library will handle the extraction process. This repository contains an official demo/trial for the ScrapeGraphAI library, showcasing its capabilities in web scraping tasks. The tool is designed to simplify the process of extracting data from websites by providing a user-friendly interface and powerful scraping functionalities.

github

: 76

you2txt

You2Txt is a tool developed for the Vercel + Nvidia 2-hour hackathon that converts any YouTube video into a transcribed .txt file. The project won first place in the hackathon and is hosted at you2txt.com. Due to rate limiting issues with YouTube requests, it is recommended to run the tool locally. The project was created using Next.js, Tailwind, v0, and Claude, and can be built and accessed locally for development purposes.

github

: 71

For similar jobs

luna-ai

Luna AI is a virtual streamer driven by a 'brain' composed of ChatterBot, GPT, Claude, langchain, chatglm, text-generation-webui, 讯飞星火, 智谱AI. It can interact with viewers in real-time during live streams on platforms like Bilibili, Douyin, Kuaishou, Douyu, or chat with you locally. Luna AI uses natural language processing and text-to-speech technologies like Edge-TTS, VITS-Fast, elevenlabs, bark-gui, VALL-E-X to generate responses to viewer questions and can change voice using so-vits-svc, DDSP-SVC. It can also collaborate with Stable Diffusion for drawing displays and loop custom texts. This project is completely free, and any identical copycat selling programs are pirated, please stop them promptly.

github

: 1.2k

Dough

Dough is a tool for crafting videos with AI, allowing users to guide video generations with precision using images and example videos. Users can create guidance frames, assemble shots, and animate them by defining parameters and selecting guidance videos. The tool aims to help users make beautiful and unique video creations, providing control over the generation process. Setup instructions are available for Linux and Windows platforms, with detailed steps for installation and running the app.

github

: 395

Qmedia

github

: 537

ai-shifu

AI-Shifu is an AI-led chat flow tool powered by LLM that provides an interactive and immersive experience for users. It allows users to follow a preset chat flow while being able to ask questions and affect the conversation. The tool can make personalized outputs based on user identity, interests, and preferences, making users feel like they are receiving one-on-one service. It is suitable for education, storytelling, product guides, surveys, and game NPC scenarios.

github

: 133

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

daily-poetry-image

Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

github

: 492

exif-photo-blog

EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

github

: 992

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

github

: 13.2k

Qmedia

README:

QMedia

AI content search engine designed specifically for content creators.

Key Features

👋🏻 Introduction

💫 Feature Overview

Content Cards

Multimodal Content RAG

Pure Local Multimodal Models

Future Plans

🤖 Installation

File Structure Introduction

mm_server Installation

mmrag_server Installation

qmedia_web Installation

⭐️ Usage

Combined Usage

Custom Data

Independent Model Service

Pure Python RAG Service/Model Service

Star History

License

Acknowledgments

For Tasks:

For Jobs:

Alternative AI tools for Qmedia

Similar Open Source Tools

Qmedia

llama-assistant

NextChat

llama-assistant

educhain

amica

ChatGPT-Next-Web

LongLLaVA

fiftyone

obsei

DB-GPT

PromptClip

dom-to-semantic-markdown

rig

TempCompass

Kord-Ai

For similar tasks

Open-DocLLM

Awesome-AI

Qmedia

aws-ai-intelligent-document-processing

Scrapegraph-LabLabAI-Hackathon

parsera

Scrapegraph-demo

you2txt

For similar jobs

luna-ai

Dough

Qmedia

ai-shifu

LLMStack

daily-poetry-image

exif-photo-blog

SillyTavern