markdrop
A Python package for converting PDFs to markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, among many other features. Markdrop is available on PyPI.
Stars: 52
Markdrop is a Python package that converts PDFs to markdown while extracting images and tables. It also uses various LLM clients to generate text descriptions of the extracted tables and images. Additional features include PDF URL support, interactive HTML output with downloadable Excel tables, customizable image resolution and UI elements, and a comprehensive logging system. Markdrop aims to simplify working with PDF documents and to enrich their content with AI-generated descriptions.
README:
- [x] PDF to Markdown conversion with formatting preservation using Docling
- [x] Automatic image extraction with quality preservation using XRef IDs
- [x] Table detection using Microsoft's Table Transformer
- [x] PDF URL support for core functionalities
- [x] AI-powered image and table descriptions using multiple LLM providers
- [x] Interactive HTML output with downloadable Excel tables
- [x] Customizable image resolution and UI elements
- [x] Comprehensive logging system
- [ ] Support for other file types
- [ ] Streamlit/web interface
pip install markdrop
Python Package Index (PyPI) Page: https://pypi.org/project/markdrop
from markdrop import extract_images, make_markdown, extract_tables_from_pdf
source_pdf = 'url/or/path/to/pdf/file' # Replace with your local PDF file path or a URL
output_dir = 'data/output' # Replace with desired output directory's path
make_markdown(source_pdf, output_dir)
extract_images(source_pdf, output_dir)
extract_tables_from_pdf(source_pdf, output_dir=output_dir)
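Because `source_pdf` may be either a local path or a URL, a caller might want to check which one it has before processing. The helpers below are a hypothetical sketch, not part of markdrop. `classify_source` uses `urllib.parse` and assumes that only `http`/`https` sources count as remote. `prepare_output_dir` creates the output directory ahead of time.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'url' for http(s) sources and 'path' for anything else.

    Hypothetical helper; markdrop's own URL detection may differ.
    """
    scheme = urlparse(source).scheme.lower()
    return "url" if scheme in ("http", "https") else "path"


def prepare_output_dir(output_dir: str) -> Path:
    """Create the output directory (including parents) if it does not exist."""
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path


print(classify_source("https://example.com/paper.pdf"))  # url
print(classify_source("data/input/paper.pdf"))  # path
```

A Windows path such as `C:\docs\a.pdf` parses with the one-letter scheme `c`, so it is still classified as a path.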
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging
# Configure processing options
config = MarkDropConfig(
image_resolution_scale=2.0, # Scale factor for image resolution
download_button_color='#444444', # Color for download buttons in HTML
log_level=logging.INFO, # Logging detail level
log_dir='logs', # Directory for log files
excel_dir='markdropped-excel-tables' # Directory for Excel table exports
)
# Process PDF document
input_doc_path = "path/to/input.pdf"
output_dir = Path('output_directory')
# Convert PDF and generate HTML with images and tables
html_path = markdrop(input_doc_path, output_dir, config)
# Add interactive table download functionality
downloadable_html = add_downloadable_tables(html_path, config)
from markdrop import setup_keys, process_markdown, ProcessorConfig, AIProvider, logger
from pathlib import Path
# Set up API keys for AI providers
setup_keys(key='gemini') # or setup_keys(key='openai')
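# Example custom prompts (DEFAULT_IMAGE_PROMPT and DEFAULT_TABLE_PROMPT are not imported above, so define your own here)
DEFAULT_IMAGE_PROMPT = "Describe this image in detail."
DEFAULT_TABLE_PROMPT = "Describe the contents of this table in detail."
# Configure AI processing options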
config = ProcessorConfig(
input_path="path/to/markdown/file.md", # Input markdown file path
output_dir=Path("output_directory"), # Output directory
ai_provider=AIProvider.GEMINI, # AI provider (GEMINI or OPENAI)
remove_images=False, # Keep or remove original images
remove_tables=False, # Keep or remove original tables
table_descriptions=True, # Generate table descriptions
image_descriptions=True, # Generate image descriptions
max_retries=3, # Number of API call retries
retry_delay=2, # Delay between retries in seconds
gemini_model_name="gemini-1.5-flash", # Gemini model for images
gemini_text_model_name="gemini-pro", # Gemini model for text
image_prompt=DEFAULT_IMAGE_PROMPT, # Custom prompt for image analysis
table_prompt=DEFAULT_TABLE_PROMPT # Custom prompt for table analysis
)
# Process markdown with AI descriptions
output_path = process_markdown(config)
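The `max_retries` and `retry_delay` options suggest a retry loop around each API call. The sketch below assumes a fixed delay between attempts and treats every exception as retryable. It also assumes that `max_retries` counts retries after the first attempt. markdrop's actual retry policy is not documented here, and `call_with_retries` is a hypothetical name.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_retries: int = 3, retry_delay: float = 2.0) -> T:
    """Call fn(); on failure, wait retry_delay seconds and try again.

    Assumption: max_retries counts retries after the first attempt, so fn is
    called at most max_retries + 1 times, with a fixed (not exponential) delay.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:  # this sketch treats every error as retryable
            last_error = exc
            if attempt < max_retries:
                time.sleep(retry_delay)
    raise RuntimeError(f"all {max_retries + 1} attempts failed") from last_error


# Usage: a stand-in for an API call that fails twice before succeeding.
attempts = {"n": 0}


def flaky_api_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky_api_call, max_retries=3, retry_delay=0))  # ok
```

A fixed delay keeps the sketch simple. Exponential backoff is a common alternative when an API enforces rate limits.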
from markdrop import generate_descriptions
prompt = "Give textual highly detailed descriptions from this image ONLY, nothing else."
input_path = 'path/to/img_file/or/dir'
output_dir = 'data/output'
llm_clients = ['gemini', 'llama-vision'] # Available: ['qwen', 'gemini', 'openai', 'llama-vision', 'molmo', 'pixtral']
generate_descriptions(
input_path=input_path,
output_dir=output_dir,
prompt=prompt,
llm_client=llm_clients
)
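`llm_client` takes a list of client names, and the comment above lists the documented options. Catching typos before a long run starts can save time. The helper below is a hypothetical pre-check, not a markdrop function. It validates the requested names against that documented list.

```python
from __future__ import annotations

# Client names copied from the README comment; an installed version may differ.
AVAILABLE_CLIENTS = ("qwen", "gemini", "openai", "llama-vision", "molmo", "pixtral")


def validate_clients(requested: list[str]) -> list[str]:
    """Normalize client names to lowercase and reject any not in AVAILABLE_CLIENTS."""
    normalized = [name.strip().lower() for name in requested]
    unknown = [name for name in normalized if name not in AVAILABLE_CLIENTS]
    if unknown:
        raise ValueError(f"unknown LLM client(s) {unknown}; expected any of {AVAILABLE_CLIENTS}")
    return normalized


print(validate_clients(["Gemini", "llama-vision"]))  # ['gemini', 'llama-vision']
```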
markdrop()
Converts a PDF to markdown and HTML with enhanced features.
Parameters:
- input_doc_path (str): Path to the input PDF file
- output_dir (str): Output directory path
- config (MarkDropConfig, optional): Configuration options for processing
add_downloadable_tables()
Adds interactive table download functionality to the HTML output.
Parameters:
- html_path (Path): Path to the HTML file
- config (MarkDropConfig, optional): Configuration options
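The HTML output attaches a download button to each table. The README does not show the exact markup markdrop generates. As an illustration of the general technique, the sketch below embeds a table as a base64 `data:` URI behind an `<a download>` link, styled with a configurable button color. It writes CSV rather than Excel so that it needs only the standard library. `table_download_link` is a hypothetical name.

```python
from __future__ import annotations

import base64
import csv
import html
import io


def table_download_link(rows: list[list[str]], filename: str, color: str = "#444444") -> str:
    """Return an HTML <a> element that downloads `rows` as a CSV file."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)  # csv uses "\r\n" line endings by default
    payload = base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:text/csv;base64,{payload}" '
        f'download="{html.escape(filename)}" '
        f'style="background:{html.escape(color)};color:#fff;padding:4px 8px;'
        f'border-radius:4px;text-decoration:none">Download table</a>'
    )


link = table_download_link([["Model", "Score"], ["A", "0.91"]], "table_1.csv")
print(link)
```

Embedding the file as a data URI keeps the HTML self-contained, at the cost of a larger page for big tables.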
MarkDropConfig
Configuration for PDF processing:
- image_resolution_scale (float): Scale factor for image resolution (default: 2.0)
- download_button_color (str): HTML color code for download buttons (default: '#444444')
- log_level (int): Logging level (default: logging.INFO)
- log_dir (str): Directory for log files (default: 'logs')
- excel_dir (str): Directory for Excel table exports (default: 'markdropped-excel-tables')
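The documented defaults map naturally onto a dataclass. The sketch below is a hypothetical stand-in, not markdrop's actual class definition. It mirrors the documented fields and defaults, and it shows how to override one option while keeping the rest.

```python
import logging
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MarkDropConfigSketch:
    """Illustrative mirror of the documented MarkDropConfig defaults."""
    image_resolution_scale: float = 2.0
    download_button_color: str = "#444444"
    log_level: int = logging.INFO
    log_dir: str = "logs"
    excel_dir: str = "markdropped-excel-tables"


default_config = MarkDropConfigSketch()
# dataclasses.replace returns a copy with only the named field changed.
debug_config = replace(default_config, log_level=logging.DEBUG)
print(debug_config.log_level == logging.DEBUG, debug_config.excel_dir)  # True markdropped-excel-tables
```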
ProcessorConfig
Configuration for AI processing:
- input_path (str): Path to the markdown file
- output_dir (str): Output directory path
- ai_provider (AIProvider): AI provider selection (GEMINI or OPENAI)
- remove_images (bool): Whether to remove original images
- remove_tables (bool): Whether to remove original tables
- table_descriptions (bool): Whether to generate table descriptions
- image_descriptions (bool): Whether to generate image descriptions
- max_retries (int): Maximum number of API call retries
- retry_delay (int): Delay between retries in seconds
- gemini_model_name (str): Gemini model for image processing
- gemini_text_model_name (str): Gemini model for text processing
- image_prompt (str): Custom prompt for image analysis
- table_prompt (str): Custom prompt for table analysis
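In markdown, an image is a `![alt](path)` reference and a pipe table is a run of lines starting with `|`. Given that, `remove_images` and `remove_tables` can be pictured as filters that strip those elements. The sketch below is a hypothetical, simplified filter and not markdrop's implementation. It ignores edge cases such as images inside code spans or tables written without leading pipes.

```python
import re

# Matches inline image references of the form ![alt text](path/or/url).
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def strip_markdown_elements(text: str, remove_images: bool, remove_tables: bool) -> str:
    """Remove inline image references and/or pipe-table rows from markdown text."""
    if remove_images:
        text = IMAGE_PATTERN.sub("", text)
    if remove_tables:
        # Drop every line that looks like a pipe-table row (starts with '|').
        kept = [line for line in text.splitlines() if not line.lstrip().startswith("|")]
        text = "\n".join(kept)
    return text


sample = "Intro\n![chart](img/fig1.png)\n| a | b |\n|---|---|\n| 1 | 2 |\nOutro"
print(strip_markdown_elements(sample, remove_images=True, remove_tables=True))
```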
make_markdown()
Legacy function for basic PDF-to-markdown conversion.
Parameters:
- source (str): Path to the input PDF or a URL
- output_dir (str): Output directory path
- verbose (bool): Enable detailed logging
extract_images()
Legacy function for basic image extraction.
Parameters:
- source (str): Path to the input PDF or a URL
- output_dir (str): Output directory path
- verbose (bool): Enable detailed logging
extract_tables_from_pdf()
Legacy function for basic table extraction.
Parameters:
- pdf_path (str): Path to the input PDF or a URL
- start_page (int, optional): Starting page number
- end_page (int, optional): Ending page number
- threshold (float, optional): Detection confidence threshold
- output_dir (str): Output directory path
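Together, `start_page`, `end_page`, and `threshold` limit which table detections are kept. The README does not describe the detection format markdrop uses internally. The sketch below therefore assumes each detection is a (page, score, bounding box) record and shows the filtering step in isolation. It also assumes that pages are 1-indexed, that the range is inclusive, and that the default threshold is 0.5. `Detection` and `filter_detections` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Detection:
    page: int  # 1-indexed page number (assumption)
    score: float  # detector confidence in [0, 1]
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1)


def filter_detections(
    detections: Iterable[Detection],
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    threshold: float = 0.5,  # assumed default, not documented in the README
) -> list[Detection]:
    """Keep detections on pages in [start_page, end_page] with score >= threshold."""
    kept = []
    for det in detections:
        if start_page is not None and det.page < start_page:
            continue
        if end_page is not None and det.page > end_page:
            continue
        if det.score >= threshold:
            kept.append(det)
    return kept


dets = [
    Detection(1, 0.95, (0.0, 0.0, 1.0, 1.0)),
    Detection(2, 0.40, (0.0, 0.0, 1.0, 1.0)),
    Detection(3, 0.80, (0.0, 0.0, 1.0, 1.0)),
]
print([d.page for d in filter_detections(dets, start_page=2, end_page=3, threshold=0.6)])  # [3]
```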
See run.py for a complete example.
We welcome contributions! Please see our Contributing Guidelines for details.
- Clone the repository:
git clone https://github.com/shoryasethia/markdrop.git
cd markdrop
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install development dependencies:
pip install -r requirements.txt
markdrop/
├── LICENSE
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
├── requirements.txt
├── setup.py
└── markdrop/
    ├── __init__.py
    ├── src
    │   └── markdrop-logo.png
    ├── main.py
    ├── process.py
    ├── api_setup.py
    ├── parse.py
    ├── utils.py
    ├── helper.py
    ├── ignore_warnings.py
    ├── run.py
    └── models/
        ├── __init__.py
        ├── .env
        ├── img_descriptions.py
        ├── logger.py
        ├── model_loader.py
        ├── responder.py
        └── setup_keys.py
This project is licensed under the MIT License - see the LICENSE file for details.
See CHANGELOG.md for version history.
Please note that this project follows our Code of Conduct.
Alternative AI tools for markdrop
Similar Open Source Tools
llama_ros
This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. By using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs.
educhain
Educhain is a powerful Python package that leverages generative AI to create engaging and personalized educational content. It enables users to generate multiple-choice questions and lesson plans, and it supports various LLMs. Users can export questions to JSON, PDF, and CSV formats, customize prompt templates, and generate questions from text, PDFs, URLs, YouTube videos, and images. According to its authors, Educhain outperforms traditional methods in content-generation speed and quality. It offers advanced configuration options and has a roadmap for future enhancements, including integration with popular Learning Management Systems and a mobile app for generating content on the go.
langcheck
LangCheck is a Python library that provides a suite of metrics and tools for evaluating the quality of text generated by large language models (LLMs). It includes metrics for evaluating text fluency, sentiment, toxicity, factual consistency, and more. LangCheck also provides tools for visualizing metrics, augmenting data, and writing unit tests for LLM applications. With LangCheck, you can quickly and easily assess the quality of LLM-generated text and identify areas for improvement.
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is in open beta. Key features include:
- Session replays in 3 lines of code: initialize the AgentOps client and automatically get analytics on every LLM call.
- Time-travel debugging (coming soon).
- Agent Arena (coming soon).
- Callback handlers: AgentOps works with applications built using LangChain and LlamaIndex.
obsei
Obsei is an open-source, low-code, AI-powered automation tool. It consists of an Observer that collects unstructured data from various sources, an Analyzer that analyzes the collected data with various AI tasks, and an Informer that sends the analyzed data to various destinations. Because all Observers can store their state in databases, the tool suits scheduled jobs and serverless applications. Obsei is still in alpha, so caution is advised when using it in production. Use cases include social listening, alerting and notification, automatic customer issue creation, extracting deeper insights from feedback, market research, dataset creation for various AI tasks, and more.
LongLLaVA
LongLLaVA is a tool for efficiently scaling multi-modal LLMs to 1000 images via a hybrid architecture. Its training pipeline includes single-image alignment, instruction-tuning, and multi-image instruction-tuning stages, and it supports evaluation through a command-line interface and model inference. The tool aims to achieve GPT-4V-level capabilities and beyond, and it provides reproducible results and benchmarks for efficiency and performance.
ChatGPT-Next-Web
ChatGPT Next Web is a well-designed cross-platform ChatGPT web UI tool that supports Claude, GPT4, and Gemini Pro models. It allows users to deploy their private ChatGPT applications with ease. The tool offers features like one-click deployment, compact client for Linux/Windows/MacOS, compatibility with self-deployed LLMs, privacy-first approach with local data storage, markdown support, responsive design, fast loading speed, prompt templates, awesome prompts, chat history compression, multilingual support, and more.
evalplus
EvalPlus is a rigorous evaluation framework for LLM4Code, providing the HumanEval+ and MBPP+ tests to evaluate large language models on code generation tasks. It offers precise evaluation and ranking, coding rigorousness analysis, and pre-generated code samples. Users can use EvalPlus to generate code solutions, post-process code, and evaluate code quality. It also supports code generation and test-input generation using various backends.
e2m
E2M is a Python library that can parse and convert various file types into Markdown format. It supports the conversion of multiple file formats, including doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, and m4a. The ultimate goal of the E2M project is to provide high-quality data for Retrieval-Augmented Generation (RAG) and model training or fine-tuning. The core architecture consists of a Parser responsible for parsing various file types into text or image data, and a Converter responsible for converting text or image data into Markdown format.
ollama4j
Ollama4j is a Java library that serves as a wrapper or binding for the Ollama server. It facilitates communication with the Ollama server and provides models for deployment. The tool requires Java 11 or higher and can be installed locally or via Docker. Users can integrate Ollama4j into Maven projects by adding the specified dependency. The tool offers API specifications and supports various development tasks such as building, running unit tests, and integration tests. Releases are automated through GitHub Actions CI workflow. Areas of improvement include adhering to Java naming conventions, updating deprecated code, implementing logging, using lombok, and enhancing request body creation. Contributions to the project are encouraged, whether reporting bugs, suggesting enhancements, or contributing code.
openlrc
Open-Lyrics is a Python library that transcribes voice files using faster-whisper and uses LLMs (e.g. OpenAI GPT, Anthropic Claude) to translate and polish the resulting text into `.lrc` files in the desired language. It preprocesses audio to reduce hallucination and uses context-aware translation to improve translation quality. Users can install the library from PyPI or GitHub and follow the installation steps to set up the environment. The tool supports GUI usage and provides Python code examples for transcription and translation tasks. It also supports context and glossaries for better translations, and it documents pricing for different models and a to-do list of future improvements.
mediapipe-rs
MediaPipe-rs is a Rust library designed for MediaPipe tasks on WasmEdge WASI-NN. It offers easy-to-use low-code APIs similar to mediapipe-python, with low overhead and flexibility for custom media input. The library supports various tasks like object detection, image classification, gesture recognition, and more, including TfLite models, TF Hub models, and custom models. Users can create task instances, run sessions for pre-processing, inference, and post-processing, and speed up processing by reusing sessions. The library also provides support for audio tasks using audio data from symphonia, ffmpeg, or raw audio. Users can choose between CPU, GPU, or TPU devices for processing.
yomitoku
YomiToku is a Japanese-focused AI document image analysis engine that provides full-text OCR and layout analysis capabilities for images. It recognizes, extracts, and converts text information and figures in images. It includes 4 AI models trained on Japanese datasets for tasks such as detecting text positions, recognizing text strings, analyzing layouts, and recognizing table structures. The models are specialized for Japanese document images, supporting recognition of over 7000 Japanese characters and analyzing layout structures specific to Japanese documents. It offers features like layout analysis, table structure analysis, and reading order estimation to extract information from document images without disrupting their semantic structure. YomiToku supports various output formats such as HTML, markdown, JSON, and CSV, and can also extract figures, tables, and images from documents. It operates efficiently in GPU environments, enabling fast and effective analysis of document transcriptions without requiring high-end GPUs.
acte
Acte is a framework for building GUI-like tools for AI agents. It aims to address the cognitive load and excessive degrees of freedom agents face when interacting with multiple APIs in complex scenarios. By giving agents a graphical user interface (GUI), Acte reduces cognitive load and constrains interaction, much as GUIs do for humans using computers. The tool offers APIs for starting new sessions, executing actions, and displaying screens, accessible via HTTP requests or the SessionManager class.
airdrop-tools
Airdrop-tools is a repository containing tools for all Telegram bots. Users can join the Telegram group for support and access various bot apps like Moonbix, Blum, Major, Memefi, and more. The setup requires Node.js and Python, with instructions on creating data directories and installing extensions. Users can run different tools like Blum, Major, Moonbix, Yescoin, Matchain, Fintopio, Agent301, IAMDOG, Banana, Cats, Wonton, and Xkucoin by following specific commands. The repository also provides contact information and options for supporting the creator.
For similar tasks
Open-DocLLM
Open-DocLLM is an open-source project that addresses data extraction and processing challenges using OCR and LLM technologies. It consists of two main layers: OCR for reading document content and LLM for extracting specific content in a structured manner. The project offers a larger context window size compared to JP Morgan's DocLLM and integrates tools like Tesseract OCR and Mistral for efficient data analysis. Users can run the models on-premises using LLM studio or Ollama, and the project includes a FastAPI app for testing purposes.
Awesome-AI
Awesome AI is a repository that collects and shares resources in the fields of large language models (LLM), AI-assisted programming, AI drawing, and more. It explores the application and development of generative artificial intelligence. The repository provides information on various AI tools, models, and platforms, along with tutorials and web products related to AI technologies.
Qmedia
QMedia is an open-source multimedia AI content search engine designed specifically for content creators. It provides rich information extraction methods for text, image, and short video content. The tool integrates unstructured text, image, and short video information to build a multimodal RAG content Q&A system. Users can efficiently search for image/text and short video materials, analyze content, provide content sources, and generate customized search results based on user interests and needs. QMedia supports local deployment for offline content search and Q&A for private data. The tool offers features like content cards display, multimodal content RAG search, and pure local multimodal models deployment. Users can deploy different types of models locally, manage language models, feature embedding models, image models, and video models. QMedia aims to spark new ideas for content creation and share AI content creation concepts in an open-source manner.
aws-ai-intelligent-document-processing
This repository is part of Intelligent Document Processing with AWS AI Services workshop. It aims to automate the extraction of information from complex content in various document formats such as insurance claims, mortgages, healthcare claims, contracts, and legal contracts using AWS Machine Learning services like Amazon Textract and Amazon Comprehend. The repository provides hands-on labs to familiarize users with these AI services and build solutions to automate business processes that rely on manual inputs and intervention across different file types and formats.
Scrapegraph-LabLabAI-Hackathon
ScrapeGraphAI is a web scraping Python library that utilizes LangChain, LLM, and direct graph logic to create scraping pipelines. Users can specify the information they want to extract, and the library will handle the extraction process. The tool is designed to simplify web scraping tasks by providing a streamlined and efficient approach to data extraction.
parsera
Parsera is a lightweight Python library designed for scraping websites using LLMs. It offers simplicity and efficiency by minimizing token usage, enhancing speed, and reducing costs. Users can easily set up and run the tool to extract specific elements from web pages, generating JSON output with relevant data. Additionally, Parsera supports integration with various chat models, such as Azure, expanding its functionality and customization options for web scraping tasks.
Scrapegraph-demo
ScrapeGraphAI is a web scraping Python library that utilizes LangChain, LLM, and direct graph logic to create scraping pipelines. Users can specify the information they want to extract, and the library will handle the extraction process. This repository contains an official demo/trial for the ScrapeGraphAI library, showcasing its capabilities in web scraping tasks. The tool is designed to simplify the process of extracting data from websites by providing a user-friendly interface and powerful scraping functionalities.
you2txt
You2Txt is a tool developed for the Vercel + Nvidia 2-hour hackathon that converts any YouTube video into a transcribed .txt file. The project won first place in the hackathon and is hosted at you2txt.com. Due to rate limiting issues with YouTube requests, it is recommended to run the tool locally. The project was created using Next.js, Tailwind, v0, and Claude, and can be built and accessed locally for development purposes.
For similar jobs
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project fetches your liked tweets from Twitter (using Selenium), saves them to JSON and Excel files, and performs initial data analysis and image captioning. It is an early step in a larger personal project involving Large Language Models (LLMs).
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
ChatGPT-On-CS
This project is an intelligent customer-service dialogue tool based on large models. It supports platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, and Zhihu. You can choose GPT-3.5, GPT-4.0, or Lazy Treasure Box as the model, with more platforms planned. It can process text, voice, and pictures, and it can access external resources such as operating systems and the Internet through plug-ins. It also supports enterprise AI applications customized with an organization's own knowledge base.
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.