
ddddocr
ddddocr in Rust, ocr_api_server in Rust, prebuilt binaries, captcha recognition, no OpenCV dependency, cross-platform; a simple OCR API server, very easy to deploy.

ddddocr is a Rust port of ddddocr and of ocr_api_server, a simple OCR API server for captcha recognition that is easy to deploy and does not depend on OpenCV. It provides an easy-to-use, general-purpose captcha recognition library for Rust. Supported tasks include single-line text recognition (including some PNG images with a transparent black background), object detection, and slider matching. Custom OCR models trained with dddd_trainer can be imported, and OCR results can be restricted to a chosen character range. The tool is cross-platform and easy to deploy.
README:
ddddocr in Rust.
ocr_api_server in Rust.
Prebuilt binaries, captcha recognition, no OpenCV dependency, cross-platform.
A simple OCR API server, very easy to deploy.
An easy-to-use, general-purpose captcha recognition library for Rust.
OS | CPU | GPU | Notes |
---|---|---|---|
Windows 64-bit | √ | ? | Some Windows versions require the VC++ runtime |
Windows 32-bit | √ | ? | Static linking not supported; some Windows versions require the VC++ runtime |
Linux 64 / ARM64 | √ | ? | May require upgrading glibc |
Linux 32 | × | ? | |
macOS x64 | √ | ? | For M1/M2/M3 ... chips, see #67 |
`lib.rs` implements ddddocr.
`main.rs` implements ocr_api_server.
The `model` directory contains the models and character sets.
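To depend on this library:

```toml
ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master" }
```

To enable the cuda feature:

```toml
ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master", features = ["cuda"] }
```

Both static and dynamic linking are supported. Static linking is the default, and the link libraries are downloaded automatically at build time, so make sure your proxy is configured. The cuda feature does not support static linking (it downloads the dynamic libraries itself).
For other problems, see the troubleshooting section.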
If you don't want to build from source, prebuilt binaries are available, as are older versions.
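Text recognition is mainly intended for single-line text, i.e. images in which the text occupies most of the picture, such as common alphanumeric captchas. It recognizes Chinese characters, English letters (in random case, or with the case restricted by setting a result range), digits, and some special characters.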
```rust
let image = std::fs::read("target.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification().unwrap();
let res = ocr.classification(image, false).unwrap();
println!("{:?}", res);
```

With the old model:

```rust
let image = std::fs::read("target.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification_old().unwrap();
let res = ocr.classification(image, false).unwrap();
println!("{:?}", res);
```
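For some PNG images with a transparent black background, pass `true` as the second argument:

```rust
let res = ocr.classification(image, true).unwrap();
```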
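Dropping the alpha channel of such an image turns every transparent pixel black, so black text disappears into the background. The Python ddddocr's equivalent option pastes the RGBA image onto a white background through its alpha channel. Assuming the Rust flag does something comparable (its internals are not described here), the idea can be sketched on a raw RGBA buffer; `composite_on_white` is a hypothetical helper, not part of the crate:

```rust
/// Hypothetical helper: alpha-composite RGBA pixels onto an opaque white
/// background and return RGB pixels. Illustrates the idea behind the
/// transparent-PNG fix; it is not the crate's implementation.
fn composite_on_white(rgba: &[u8]) -> Vec<u8> {
    assert!(rgba.len() % 4 == 0, "buffer must hold whole RGBA pixels");
    let mut rgb = Vec::with_capacity(rgba.len() / 4 * 3);
    for px in rgba.chunks_exact(4) {
        let alpha = px[3] as u32;
        for &channel in &px[..3] {
            // out = c * a + 255 * (1 - a), in integer arithmetic with rounding
            let blended = (channel as u32 * alpha + 255 * (255 - alpha) + 127) / 255;
            rgb.push(blended as u8);
        }
    }
    rgb
}
```

A fully transparent black pixel becomes white, while opaque pixels keep their colour.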
Object detection:

```rust
let image = std::fs::read("target.png").unwrap();
let mut det = ddddocr::ddddocr_detection().unwrap();
let res = det.detection(image).unwrap();
println!("{:?}", res);
```
Detection was only tested briefly, on the few click-captcha images I could find at the time.
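For click captchas, the detected boxes usually need to become click coordinates. The Python ddddocr returns each box as `[x1, y1, x2, y2]` in pixels; assuming the Rust `res` uses the same layout (check its actual type), the click point is the box centre. `BBox` and `click_points` are hypothetical names used only for illustration:

```rust
/// Hypothetical bounding box: top-left (x1, y1) and bottom-right (x2, y2), in pixels.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    x1: u32,
    y1: u32,
    x2: u32,
    y2: u32,
}

/// Convert detected boxes into click coordinates (box centres), keeping detection order.
fn click_points(boxes: &[BBox]) -> Vec<(u32, u32)> {
    boxes
        .iter()
        .map(|b| ((b.x1 + b.x2) / 2, (b.y1 + b.y2) / 2))
        .collect()
}
```

The order in which to click is dictated by the captcha itself, so no sorting is done here.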
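Slider matching is not implemented with a deep neural network.
The slider piece is a separate PNG image with a transparent background.
The background image contains the slot (gap) for the slider piece.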
```rust
let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::slide_match(target_bytes, background_bytes).unwrap();
println!("{:?}", res);
```
If the slider image has little background around the piece (typically a jpg or bmp image), use simple_slide_match instead:

```rust
let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::simple_slide_match(target_bytes, background_bytes).unwrap();
println!("{:?}", res);
```
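One common non-neural way to locate a slider piece is template matching: slide the piece across the background and keep the offset with the smallest pixel difference. The sketch below uses the sum of squared differences on grayscale buffers. It illustrates the technique only and is not the crate's exact algorithm; `Gray` and `best_match` are hypothetical:

```rust
/// Hypothetical grayscale image: row-major pixels, `width * height` long.
struct Gray {
    width: usize,
    height: usize,
    pixels: Vec<u8>,
}

impl Gray {
    fn at(&self, x: usize, y: usize) -> i64 {
        self.pixels[y * self.width + x] as i64
    }
}

/// Return the top-left (x, y) where `piece` best matches `background`,
/// scored by the sum of squared differences (lower is better).
fn best_match(piece: &Gray, background: &Gray) -> Option<(usize, usize)> {
    if piece.width > background.width || piece.height > background.height {
        return None;
    }
    let mut best: Option<((usize, usize), i64)> = None;
    for oy in 0..=background.height - piece.height {
        for ox in 0..=background.width - piece.width {
            let mut ssd = 0i64;
            for y in 0..piece.height {
                for x in 0..piece.width {
                    let d = background.at(ox + x, oy + y) - piece.at(x, y);
                    ssd += d * d;
                }
            }
            if best.map_or(true, |(_, score)| ssd < score) {
                best = Some(((ox, oy), ssd));
            }
        }
    }
    best.map(|(pos, _)| pos)
}
```

Real captchas add noise and lighting changes, which is why practical matchers often compare edge maps or use normalized correlation instead of raw pixel values.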
For slot comparison, one image is the original picture with the slot cut into it and the other is the original picture:

```rust
let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::slide_comparison(target_bytes, background_bytes).unwrap();
println!("{:?}", res);
```
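Because both images show the same scene, the slot can be found by differencing them. A minimal sketch, assuming two grayscale buffers of equal size: threshold the absolute difference and return the first column that contains enough changed pixels. The threshold (80) and minimum count (5) used below are illustrative assumptions, not values taken from the crate, and `find_slot_x` is hypothetical:

```rust
/// Return the x coordinate of the first column where `with_slot` and `original`
/// differ by more than `threshold` in at least `min_count` pixels.
/// Both buffers are row-major grayscale images of size `width * height`.
fn find_slot_x(
    with_slot: &[u8],
    original: &[u8],
    width: usize,
    height: usize,
    threshold: u8,
    min_count: usize,
) -> Option<usize> {
    assert_eq!(with_slot.len(), width * height);
    assert_eq!(original.len(), width * height);
    (0..width).find(|&x| {
        // Count the pixels in column x whose difference exceeds the threshold.
        let changed = (0..height)
            .filter(|&y| {
                let i = y * width + x;
                with_slot[i].abs_diff(original[i]) > threshold
            })
            .count();
        changed >= min_count
    })
}
```

Requiring several changed pixels per column keeps isolated compression noise from being mistaken for the slot edge.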
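For more flexible control over OCR results, the project supports restricting the range of characters they may contain.
Calling classification_probability returns the probabilities over the full character set.
Alternatively, set_ranges restricts the output characters and thereby the returned result.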
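Value | Meaning |
---|---|
0 | Digits only, 0-9 |
1 | Lowercase letters only, a-z |
2 | Uppercase letters only, A-Z |
3 | Lowercase a-z + uppercase A-Z |
4 | Lowercase a-z + digits 0-9 |
5 | Uppercase A-Z + digits 0-9 |
6 | Lowercase a-z + uppercase A-Z + digits 0-9 |
7 | Default charset minus lowercase a-z, uppercase A-Z and digits 0-9 |
If the value is a string, pass text without spaces in which every character is one candidate, e.g. "0123456789+-x/=".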
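Both the numeric values in the table and the string form can be read as a filter on candidate characters. The sketch below models that reading; `Range` and `is_allowed` are hypothetical names, and how the crate applies the filter internally is an assumption. Value 7 is modelled as "not an ASCII letter or digit", which matches the table only for characters that already come from the default charset:

```rust
/// Hypothetical model of the `set_ranges` argument: a table value or a custom string.
enum Range<'a> {
    Preset(u8),
    Custom(&'a str),
}

/// Whether `ch` is a permitted output character under `range`.
fn is_allowed(range: &Range<'_>, ch: char) -> bool {
    match range {
        Range::Preset(0) => ch.is_ascii_digit(),
        Range::Preset(1) => ch.is_ascii_lowercase(),
        Range::Preset(2) => ch.is_ascii_uppercase(),
        Range::Preset(3) => ch.is_ascii_alphabetic(),
        Range::Preset(4) => ch.is_ascii_lowercase() || ch.is_ascii_digit(),
        Range::Preset(5) => ch.is_ascii_uppercase() || ch.is_ascii_digit(),
        Range::Preset(6) => ch.is_ascii_alphanumeric(),
        // 7: default charset minus a-z, A-Z and 0-9 (assumes `ch` is from the default charset)
        Range::Preset(7) => !ch.is_ascii_alphanumeric(),
        // Values outside 0..=7 are not in the table
        Range::Preset(_) => false,
        // Custom string: each character is one candidate; spaces never match
        Range::Custom(chars) => ch != ' ' && chars.contains(ch),
    }
}
```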
```rust
let image = std::fs::read("image.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification().unwrap();
// The number 3 corresponds to the enum CharsetRange::LowercaseUppercase; no need to write the enum
// ocr.set_ranges(3);
// Custom character set
ocr.set_ranges("0123456789+-x/=");
let result = ocr.classification_probability(image, false).unwrap();
// Careful: this prints a lot of data and may hang the terminal!
println!("probability: {}", result.json());
println!("result: {}", result.get_text());
```
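`classification_probability` yields a probability for every character at every output step, and `get_text` turns that into a string. Assuming the model is decoded CTC-style (common for this kind of OCR model, but not stated in this README) with index 0 of the charset as the blank symbol, greedy decoding takes the most likely allowed character per step, merges consecutive repeats, and drops blanks. The `decode` function and its parameters are hypothetical, not the crate's API:

```rust
/// Greedy CTC decoding restricted to allowed characters.
/// `probs[t][i]` is the probability of `charset[i]` at step `t`; `charset[0]` is the blank.
fn decode(probs: &[Vec<f32>], charset: &[&str], allowed: impl Fn(&str) -> bool) -> String {
    let mut text = String::new();
    let mut last: Option<usize> = None;
    for step in probs {
        // Argmax over the blank plus the allowed characters only.
        let best = step
            .iter()
            .enumerate()
            .filter(|&(i, _)| i == 0 || allowed(charset[i]))
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i);
        if let Some(i) = best {
            // Emit only when the index changes and it is not the blank.
            if Some(i) != last && i != 0 {
                text.push_str(charset[i]);
            }
            last = Some(i);
        }
    }
    text
}
```

Masking before the argmax, rather than filtering the decoded string, lets a restricted range pick the best remaining candidate at each step.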
Custom models trained with dddd_trainer can be imported:

```rust
use ddddocr::*;

let mut ocr = Ddddocr::with_model_charset(
    "myproject_0.984375_139_13000_2022-02-26-15-34-13.onnx",
    "charsets.json",
)
.unwrap();
let image_bytes = std::fs::read("888e28774f815b01e871d474e5c84ff2.jpg").unwrap();
// Same call shape as the other examples: image bytes plus the transparent-PNG flag
let res = ocr.classification(image_bytes, false).unwrap();
println!("{:?}", res);
```
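```
Usage: ddddocr.exe [OPTIONS]

Options:
  -a, --address <ADDRESS>
          Listen address [default: 127.0.0.1]
  -p, --port <PORT>
          Listen port [default: 9898]
  -f, --full
          Enable all options
      --jsonp
          Enable cross-origin requests (JSONP). Requires a query parameter naming the callback function; parameters cannot be passed as file (multipart), e.g. http://127.0.0.1:9898/ocr/b64/text?callback=handle&image=xxx
      --ocr
          Enable content recognition; new and old models can coexist
      --old
          Enable content recognition with the old model; new and old models can coexist
      --det
          Enable object detection
      --ocr-probability <OCR_PROBABILITY>
          Enable content probability recognition; new and old models can coexist; official models only. 0 to 7 selects a built-in charset, an empty string selects the default charset, and any other value is a custom charset, e.g. "0123456789+-x/="
      --old-probability <OLD_PROBABILITY>
          Enable content probability recognition with the old model; new and old models can coexist; official models only. 0 to 7 selects a built-in charset, an empty string selects the default charset, and any other value is a custom charset, e.g. "0123456789+-x/="
      --ocr-path <OCR_PATH>
          Path of the content recognition model and charset. A hash determines whether the model is custom; using a custom model disables the old option. The path model/common refers to the model model/common.onnx and the charset model/common.json [default: model/common]
      --det-path <DET_PATH>
          Object detection model path [default: model/common_det.onnx]
      --slide-match
          Enable slider matching
      --simple-slide-match
          Enable simple slider matching
      --slide-compare
          Enable slot comparison
  -h, --help
          Print help
```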
To check that the server started successfully, send a GET or POST request to http://{host}:{port}/ping; if it returns pong, the server is running.
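The same `/ping` check can be run from Rust without extra crates by sending a raw HTTP/1.1 request over `std::net::TcpStream`. This is a sketch: it assumes the server answers `/ping` with a plain-text body of `pong`, as described above, and `ping` is a hypothetical helper name:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Send `GET /ping` to `addr` (e.g. "127.0.0.1:9898") and report whether the reply body says `pong`.
fn ping(addr: &str) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    // `Connection: close` lets us read until EOF instead of parsing Content-Length.
    let request = format!("GET /ping HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    // The body starts after the first blank line; checking each line also tolerates chunked encoding.
    let body = response.split_once("\r\n\r\n").map_or("", |(_, b)| b);
    Ok(body.lines().any(|line| line.trim() == "pong"))
}
```

For example, `ping("127.0.0.1:9898")` returns `Ok(true)` once a server started with the default address and port is up.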
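API endpoints have the following form:

```
http://{host}:{port}/{opt}/{img_type}/{ret_type}

opt:
  ocr              content recognition
  old              content recognition with the old model
  det              object detection
  ocr_probability  content probability recognition
  old_probability  content probability recognition with the old model
  match            slider matching
  simple_match     simple slider matching
  compare          slot comparison

img_type:
  file  file, i.e. multipart/form-data
  b64   base64, i.e. {"a": encode(bytes), "b": encode(bytes)}

ret_type:
  json  JSON; on success {"status": 200, "result": object}, on failure {"status": 404, "msg": "reason for failure"}
  text  plain text; an empty body on failure
```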
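The route is assembled from the three segments listed above. The sketch below builds the URL and a `b64` JSON body using only the standard library; the base64 encoder (standard alphabet, `=` padding) is hand-written to avoid dependencies. The field name `image` is borrowed from the Python example that follows and is assumed to apply to JSON bodies as well; field names for the two-image routes (`match`, `compare`) are not documented here, so only single-image bodies are covered. `api_url`, `b64_encode` and `image_body` are hypothetical helpers:

```rust
/// Build `http://{host}:{port}/{opt}/{img_type}/{ret_type}` after checking each segment.
fn api_url(host: &str, port: u16, opt: &str, img_type: &str, ret_type: &str) -> Option<String> {
    const OPTS: [&str; 8] = [
        "ocr", "old", "det", "ocr_probability", "old_probability",
        "match", "simple_match", "compare",
    ];
    let valid = OPTS.contains(&opt)
        && matches!(img_type, "file" | "b64")
        && matches!(ret_type, "json" | "text");
    valid.then(|| format!("http://{host}:{port}/{opt}/{img_type}/{ret_type}"))
}

/// Standard base64 (RFC 4648 alphabet) with `=` padding.
fn b64_encode(data: &[u8]) -> String {
    const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the missing ones.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | (b[2] as u32);
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(TABLE[((n >> (18 - 6 * i)) & 63) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// JSON body for a single-image `b64` request (field name `image` assumed).
fn image_body(image: &[u8]) -> String {
    format!("{{\"image\":\"{}\"}}", b64_encode(image))
}
```

The body can then be POSTed to, e.g., `api_url("127.0.0.1", 9898, "ocr", "b64", "json")` with any HTTP client.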
```python
import base64

import requests

host = "http://127.0.0.1:9898"

with open("./image/3.png", "rb") as f:
    file = f.read()

# Test jsonp: only b64 works here, not file
api_url = f"{host}/ocr/b64/text"
resp = requests.get(api_url, params={
    "callback": "handle",
    "image": base64.b64encode(file).decode(),
})
print(f"jsonp, api_url={api_url}, resp.text={resp.text}")

# Test ocr
api_url = f"{host}/ocr/file/text"
resp = requests.post(api_url, files={"image": file})
print(f"api_url={api_url}, resp.text={resp.text}")
```
Both CUDA and cuDNN must be installed.
CUDA 12 builds require cuDNN 9.x.
CUDA 11 builds require cuDNN 8.x.
It is unclear whether CUDA 10 works.
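Static linking is the default, and the link libraries are downloaded automatically at build time, so make sure your proxy is configured. The cuda feature does not support static linking (it downloads the dynamic libraries itself).
To use static libraries from a specific location, set the environment variable ORT_LIB_LOCATION; once it is set, the libraries are no longer downloaded automatically.
For example, if the library is at `onnxruntime\build\Windows\Release\Release\onnxruntime.lib`, set ORT_LIB_LOCATION to `onnxruntime\build\Windows\Release`.
The download-binaries feature is enabled by default and downloads the link libraries automatically.
Automatically downloaded libraries are stored in `C:\Users\<username>\AppData\ort.pyke.io`.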
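To enable dynamic linking:

```toml
ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master", features = ["load-dynamic"] }
```

With the load-dynamic feature enabled, Ddddocr::set_onnxruntime_path can be used to specify the path of the onnxruntime dynamic library.
With load-dynamic enabled, the onnxruntime library is not downloaded automatically at build time.
Download the onnxruntime library manually and place it in the directory the program runs from (or a system library directory); then there is no need to call Ddddocr::set_onnxruntime_path.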
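Before relying on the library being found automatically, it can help to check that it is actually present. The sketch below assumes the conventional onnxruntime file names per platform (`onnxruntime.dll`, `libonnxruntime.so`, `libonnxruntime.dylib`) and does not handle versioned file names; `onnxruntime_lib_name` and `find_onnxruntime` are hypothetical helpers:

```rust
use std::path::PathBuf;

/// Conventional onnxruntime dynamic library name on each platform (an assumption).
fn onnxruntime_lib_name() -> &'static str {
    if cfg!(target_os = "windows") {
        "onnxruntime.dll"
    } else if cfg!(target_os = "macos") {
        "libonnxruntime.dylib"
    } else {
        "libonnxruntime.so"
    }
}

/// Look for the library next to the executable, then in the current working directory.
fn find_onnxruntime() -> Option<PathBuf> {
    let name = onnxruntime_lib_name();
    let exe_dir = std::env::current_exe()
        .ok()
        .and_then(|exe| exe.parent().map(|dir| dir.to_path_buf()));
    let cwd = std::env::current_dir().ok();
    [exe_dir, cwd]
        .into_iter()
        .flatten()
        .map(|dir| dir.join(name))
        .find(|path| path.is_file())
}
```

If nothing is found, the path of a library stored elsewhere can be passed to `Ddddocr::set_onnxruntime_path` instead; its exact parameter type is not shown in this README.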
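If static linking fails on Windows, install Visual Studio 2022.
If static linking fails on Linux x86-64, install gcc 11 and g++ 11 (Ubuntu ≥ 20.04).
If static linking fails on Linux arm64, glibc ≥ 2.35 is required (Ubuntu ≥ 22.04).
If static linking fails on macOS, macOS ≥ 10.15 is required.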
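With cuda, `cargo test` may panic (exit code: 0xc000007b). This happens because the automatically generated dynamic libraries are placed in target/debug and must be copied manually to target/debug/deps (cuda does not currently support static linking).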
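The manual copy can be scripted. A minimal sketch, assuming the libraries in question are files directly inside `target/debug` whose extension is exactly `dll`, `so` or `dylib` (versioned names such as `*.so.1` are not matched); `copy_dylibs` is a hypothetical helper, e.g. run from a small utility before `cargo test`:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy every `.dll`, `.so` or `.dylib` file from `src` into `dst`; returns the number copied.
fn copy_dylibs(src: &Path, dst: &Path) -> io::Result<usize> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let path = entry?.path();
        let is_dylib = path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| matches!(e, "dll" | "so" | "dylib"));
        if path.is_file() && is_dylib {
            // file_name() is always Some for entries returned by read_dir
            fs::copy(&path, dst.join(path.file_name().unwrap()))?;
            copied += 1;
        }
    }
    Ok(copied)
}
```

Calling `copy_dylibs(Path::new("target/debug"), Path::new("target/debug/deps"))` mirrors the manual step described above.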
Dynamic linking requires onnxruntime 1.18.x.
For more troubleshooting, see ort.pyke.io.