ddddocr

A Rust version of ddddocr and of ocr_api_server, with prebuilt binaries. Captcha recognition without depending on the OpenCV library, cross-platform; a simple OCR API server, very easy to deploy.

ddddocr is a Rust port of ddddocr and ocr_api_server: a general-purpose captcha recognition library and a simple OCR API server that is easy to deploy and does not depend on the OpenCV library. It recognizes single-line text (including transparent black PNG images), performs object detection, and provides slider matching algorithms. Users can also import custom OCR training models and restrict OCR results to a chosen character range. The tool is cross-platform.

README:

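Introduction

A Rust version of ddddocr.

A Rust version of ocr_api_server.

Prebuilt binaries, captcha recognition, no dependency on the OpenCV library, cross-platform.

A simple OCR API server, very easy to deploy.

An easy-to-use, general-purpose captcha recognition library for Rust.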

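Supported environments

System	CPU	GPU	Notes
Windows 64-bit	√	?	Some Windows versions require the VC runtime
Windows 32-bit	√	?	Static linking is not supported; some Windows versions require the VC runtime
Linux 64 / ARM64	√	?	May require upgrading glibc
Linux 32	×	?
macOS x64	√	?	For M1/M2/M3 ... chips, see #67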

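Installation

We are pleased to announce that, starting with this version, the heavy DLL dependency is no longer needed!

lib.rs implements ddddocr.

main.rs implements ocr_api_server.

The model directory contains the models and character sets.

To depend on this library: ddddocr = {git = "https://github.com/86maid/ddddocr.git", branch = "master"}

To enable the cuda feature: ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master", features = ["cuda"] }

Both static and dynamic linking are supported, and static linking is the default. The link library is downloaded automatically at build time, so set up a proxy if you need one. The cuda feature does not support static linking (it downloads the dynamic library itself).

For further problems, see the Troubleshooting section.

If you do not want to build from source, prebuilt binaries are available.

Old versions.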

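Usage

OCR recognition

Text recognition

This mode is mainly for single-line text, where the text occupies most of the image, such as common alphanumeric captchas. The project can recognize Chinese, English (in random case, or with the case constrained by setting a result range), digits, and some special characters.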

let image = std::fs::read("target.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification().unwrap();
let res = ocr.classification(image, false).unwrap();
println!("{:?}", res);

Old model

let image = std::fs::read("target.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification_old().unwrap();
let res = ocr.classification(image, false).unwrap();
println!("{:?}", res);

Transparent black PNG images are supported through the png_fix parameter:

classification(image, true);

Reference images

Object detection

let image = std::fs::read("target.png").unwrap();
let mut det = ddddocr::ddddocr_detection().unwrap();
let res = det.detection(image).unwrap();
println!("{:?}", res);

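Reference images

The above are simply the click-captcha images I could find so far, used for a quick test.

Slider matching

The algorithms are not implemented with deep neural networks.

Algorithm 1

The slider piece is a separate PNG image with a transparent background.

The background image contains the slot for the slider piece.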

let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::slide_match(target_bytes, background_bytes).unwrap();
println!("{:?}", res);

If the small image does not contain much extra background, simple_slide_match can be used; such images are usually in jpg or bmp format.

let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::simple_slide_match(target_bytes, background_bytes).unwrap();
println!("{:?}", res);
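The README says only that the slider algorithms are not neural networks; it does not describe them. As an illustration of the general idea, and not of what slide_match or simple_slide_match do internally, here is a minimal sketch of naive template matching: the slider piece is moved over the background and the offset with the smallest sum of absolute differences wins. The `Gray` type and `best_match` function are hypothetical names used only for this sketch.

```rust
/// A grayscale image stored row by row (hypothetical helper type for this sketch).
struct Gray {
    width: usize,
    height: usize,
    pixels: Vec<u8>,
}

impl Gray {
    fn at(&self, x: usize, y: usize) -> u8 {
        self.pixels[y * self.width + x]
    }
}

/// Returns the top-left (x, y) offset where `template` differs least from
/// `background`, scored by the sum of absolute differences (SAD).
fn best_match(background: &Gray, template: &Gray) -> Option<(usize, usize)> {
    if template.width == 0
        || template.height == 0
        || template.width > background.width
        || template.height > background.height
    {
        return None;
    }
    let mut best: Option<(u64, (usize, usize))> = None;
    for oy in 0..=(background.height - template.height) {
        for ox in 0..=(background.width - template.width) {
            let mut sad = 0u64;
            for ty in 0..template.height {
                for tx in 0..template.width {
                    let b = background.at(ox + tx, oy + ty) as i32;
                    let t = template.at(tx, ty) as i32;
                    sad += (b - t).unsigned_abs() as u64;
                }
            }
            // Keep the first offset with the strictly smallest score.
            if best.map_or(true, |(score, _)| sad < score) {
                best = Some((sad, (ox, oy)));
            }
        }
    }
    best.map(|(_, pos)| pos)
}

fn main() {
    // 6x4 background that is all zeros except a bright 2x2 patch at (3, 1).
    let mut pixels = vec![0u8; 6 * 4];
    for (x, y) in [(3, 1), (4, 1), (3, 2), (4, 2)] {
        pixels[y * 6 + x] = 200;
    }
    let background = Gray { width: 6, height: 4, pixels };
    let template = Gray { width: 2, height: 2, pixels: vec![200; 4] };
    println!("{:?}", best_match(&background, &template));
}
```

Real captcha images would first need decoding and usually some edge extraction; the sketch skips both to keep the matching step visible.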

Algorithm 2

One image is the original with the slot.

The other image is the intact original.

let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::slide_comparison(target_bytes, background_bytes).unwrap();
println!("{:?}", res);
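When both the intact image and the image with the slot are available, one simple way to locate the slot is pixel differencing. The sketch below is an assumption about the general technique, not the implementation behind slide_comparison: it takes two grayscale buffers of equal size and returns the bounding box of the pixels that differ by more than a threshold. The function name `diff_bounding_box` and the threshold value are hypothetical.

```rust
/// Bounding box (left, top, right, bottom), inclusive, of all pixels whose
/// grayscale values differ by more than `threshold` between the two images.
/// Both buffers are row-major with `width` pixels per row.
fn diff_bounding_box(
    a: &[u8],
    b: &[u8],
    width: usize,
    threshold: u8,
) -> Option<(usize, usize, usize, usize)> {
    if width == 0 || a.len() != b.len() || a.len() % width != 0 {
        return None;
    }
    let mut bbox: Option<(usize, usize, usize, usize)> = None;
    for (i, (&pa, &pb)) in a.iter().zip(b.iter()).enumerate() {
        if pa.abs_diff(pb) > threshold {
            let (x, y) = (i % width, i / width);
            bbox = Some(match bbox {
                None => (x, y, x, y),
                Some((l, t, r, btm)) => (l.min(x), t.min(y), r.max(x), btm.max(y)),
            });
        }
    }
    bbox
}

fn main() {
    // 5x3 image; the "slot" darkens pixels (2, 1) and (3, 1).
    let original = vec![100u8; 5 * 3];
    let mut with_slot = original.clone();
    with_slot[1 * 5 + 2] = 20;
    with_slot[1 * 5 + 3] = 20;
    println!("{:?}", diff_bounding_box(&with_slot, &original, 5, 30));
}
```

The left edge of the box is the horizontal offset a slider would typically need to travel.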

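OCR probability output

To give more flexible control over OCR results, the project supports restricting results to a character range.

Calling classification_probability returns the probabilities over the full character table.

You can also restrict the returned results by setting the output character range with set_ranges.

Value	Meaning
0	Digits 0-9 only
1	Lowercase letters a-z only
2	Uppercase letters A-Z only
3	Lowercase a-z + uppercase A-Z
4	Lowercase a-z + digits 0-9
5	Uppercase A-Z + digits 0-9
6	Lowercase a-z + uppercase A-Z + digits 0-9
7	Default character set minus lowercase a-z, uppercase A-Z, and digits 0-9

If the value is a string, pass text that contains no spaces; each character in it is one candidate, for example: "0123456789+-x/="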

let image = std::fs::read("image.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification().unwrap();

// The number 3 corresponds to the enum CharsetRange::LowercaseUppercase; the enum does not have to be written out
// ocr.set_ranges(3);

// Custom character set
ocr.set_ranges("0123456789+-x/=");

let result = ocr.classification_probability(image, false).unwrap();

// Careful: this can print a very large amount of data and may freeze the terminal!
println!("Probabilities: {}", result.json());

println!("Recognition result: {}", result.get_text());
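To make the effect of a range restriction concrete, here is a minimal sketch that picks, for each character position, the most probable character among those that are allowed. It is not the library's decoder: the layout of the probability table (one row per position, one column per character) is an assumption, and details such as blank symbols or merged repeats are ignored. `decode_restricted` is a hypothetical name.

```rust
/// Picks, for each position, the most probable character that is also in `allowed`.
/// `charset[i]` is the character for column `i` of every probability row.
fn decode_restricted(charset: &[char], probabilities: &[Vec<f32>], allowed: &str) -> String {
    let mut text = String::new();
    for row in probabilities {
        let mut best: Option<(char, f32)> = None;
        for (&ch, &p) in charset.iter().zip(row.iter()) {
            if allowed.contains(ch) && best.map_or(true, |(_, bp)| p > bp) {
                best = Some((ch, p));
            }
        }
        // Positions with no allowed character contribute nothing.
        if let Some((ch, _)) = best {
            text.push(ch);
        }
    }
    text
}

fn main() {
    let charset = ['0', 'o', '1', 'l'];
    let probabilities: Vec<Vec<f32>> = vec![
        vec![0.40, 0.55, 0.03, 0.02], // unrestricted winner: 'o'
        vec![0.05, 0.05, 0.30, 0.60], // unrestricted winner: 'l'
    ];
    // Restricting to digits turns the lookalikes into "01".
    println!("{}", decode_restricted(&charset, &probabilities, "0123456789"));
    // Allowing the whole table gives "ol".
    println!("{}", decode_restricted(&charset, &probabilities, "0o1l"));
}
```

This is why a restricted range helps with captchas that are known to contain only digits: visually similar letters can no longer win.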

Importing a custom OCR training model

Custom models trained with dddd_trainer can be imported.

use ddddocr::*;

let mut ocr = Ddddocr::with_model_charset(
    "myproject_0.984375_139_13000_2022-02-26-15-34-13.onnx",
    "charsets.json",
)
.unwrap();
let image_bytes = std::fs::read("888e28774f815b01e871d474e5c84ff2.jpg").unwrap();
let res = ocr.classification(&image_bytes).unwrap();
println!("{:?}", res);

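ocr_api_server example

How to run

Usage: ddddocr.exe [OPTIONS]

Options:
  -a, --address <ADDRESS>
          Listen address [default: 127.0.0.1]
  -p, --port <PORT>
          Listen port [default: 9898]
  -f, --full
          Enable all options
      --jsonp
          Enable cross-origin (JSONP) support; requires a query parameter naming the callback function and cannot be combined with file (multipart) parameters, for example http://127.0.0.1:9898/ocr/b64/text?callback=handle&image=xxx
      --ocr
          Enable text recognition; the new and old models can coexist
      --old
          Enable text recognition with the old model; the new and old models can coexist
      --det
          Enable object detection
      --ocr-probability <OCR_PROBABILITY>
          Enable probability-based text recognition; the new and old models can coexist; only the official model can be used. A value from 0 to 7 selects a built-in character set, an empty string selects the default character set, and any other value is a custom character set, for example "0123456789+-x/="
      --old-probability <OLD_PROBABILITY>
          Enable probability-based text recognition with the old model; the new and old models can coexist; only the official model can be used. A value from 0 to 7 selects a built-in character set, an empty string selects the default character set, and any other value is a custom character set, for example "0123456789+-x/="
      --ocr-path <OCR_PATH>
          Path to the text recognition model and character set. A hash is used to decide whether the model is custom; using a custom model disables the old option. The path model/common corresponds to the model model/common.onnx and the character set model/common.json [default: model/common]
      --det-path <DET_PATH>
          Path to the object detection model [default: model/common_det.onnx]
      --slide-match
          Enable slider matching
      --simple-slide-match
          Enable simple slider matching
      --slide-compare
          Enable slot matching
  -h, --help
          Print help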
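The rule for the --ocr-probability and --old-probability values (0 to 7 is a built-in set, an empty string is the default set, anything else is a custom set) can be sketched as a small parser. This is an illustration of the rule as stated in the help text, not the server's actual code; `RangeSpec`, `parse_range`, and `builtin_charset` are hypothetical names, and value 7 is left unresolved because it depends on the model's default character set.

```rust
/// How a probability option value is interpreted (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum RangeSpec {
    Default,
    Builtin(u8),
    Custom(String),
}

/// Empty string: default set. A single digit 0-7: built-in set. Otherwise: custom set.
fn parse_range(arg: &str) -> RangeSpec {
    match arg.as_bytes() {
        [] => RangeSpec::Default,
        [d @ b'0'..=b'7'] => RangeSpec::Builtin(d - b'0'),
        _ => RangeSpec::Custom(arg.to_string()),
    }
}

/// Characters for built-in sets 0 through 6, following the table in the README.
/// Set 7 needs the model's default character set, so it returns None here.
fn builtin_charset(n: u8) -> Option<String> {
    let (lower, upper, digits) = match n {
        0 => (false, false, true),
        1 => (true, false, false),
        2 => (false, true, false),
        3 => (true, true, false),
        4 => (true, false, true),
        5 => (false, true, true),
        6 => (true, true, true),
        _ => return None,
    };
    let mut set = String::new();
    if lower {
        set.extend('a'..='z');
    }
    if upper {
        set.extend('A'..='Z');
    }
    if digits {
        set.extend('0'..='9');
    }
    Some(set)
}

fn main() {
    for arg in ["", "5", "0123456789+-x/="] {
        println!("{:?} -> {:?}", arg, parse_range(arg));
    }
    println!("{:?}", builtin_charset(5));
}
```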

API

To check whether the server started successfully, send a GET or POST request to http://{host}:{port}/ping; a response of pong means it is running.
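A minimal sketch of that health check using only the Rust standard library is shown below. It assumes the server answers over plain HTTP/1.1, closes the connection when asked to, and puts pong in the response body; the function name `ping` and the five-second timeout are choices made for this sketch.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Sends `GET /ping` to `addr` (for example "127.0.0.1:9898") and reports
/// whether the server answered 200 with "pong" in the body.
fn ping(addr: &str) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    let request = format!(
        "GET /ping HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        addr
    );
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    // The body is everything after the first blank line of the response.
    let body = response.splitn(2, "\r\n\r\n").nth(1).unwrap_or("");
    Ok(response.starts_with("HTTP/1.1 200") && body.contains("pong"))
}

fn main() {
    match ping("127.0.0.1:9898") {
        Ok(true) => println!("server is up"),
        Ok(false) => println!("server answered, but not with pong"),
        Err(e) => println!("could not reach server: {}", e),
    }
}
```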

http://{host}:{port}/{opt}/{img_type}/{ret_type}

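opt:
  ocr               text recognition
  old               text recognition with the old model
  det               object detection
  ocr_probability   probability-based text recognition
  old_probability   probability-based text recognition with the old model
  match             slider matching
  simple_match      simple slider matching
  compare           slot matching

img_type:
  file          file upload, i.e. multipart/form-data
  b64           base64, i.e. {"a": encode(bytes), "b": encode(bytes)}

ret_type:
  json          JSON; success {"status": 200, "result": object}, failure {"status": 404, "msg": "reason for failure"}
  text          plain text; empty text on failure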
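The URL scheme above can be assembled and validated with a small helper. The sketch below only builds the string and checks each segment against the values listed; the function name `endpoint` and its error messages are hypothetical and not part of the server.

```rust
const OPTS: [&str; 8] = [
    "ocr",
    "old",
    "det",
    "ocr_probability",
    "old_probability",
    "match",
    "simple_match",
    "compare",
];
const IMG_TYPES: [&str; 2] = ["file", "b64"];
const RET_TYPES: [&str; 2] = ["json", "text"];

/// Builds http://{host}:{port}/{opt}/{img_type}/{ret_type}, rejecting unknown segments.
fn endpoint(
    host: &str,
    port: u16,
    opt: &str,
    img_type: &str,
    ret_type: &str,
) -> Result<String, String> {
    if !OPTS.contains(&opt) {
        return Err(format!("unknown opt: {}", opt));
    }
    if !IMG_TYPES.contains(&img_type) {
        return Err(format!("unknown img_type: {}", img_type));
    }
    if !RET_TYPES.contains(&ret_type) {
        return Err(format!("unknown ret_type: {}", ret_type));
    }
    Ok(format!(
        "http://{}:{}/{}/{}/{}",
        host, port, opt, img_type, ret_type
    ))
}

fn main() {
    println!("{:?}", endpoint("127.0.0.1", 9898, "ocr", "b64", "text"));
    println!("{:?}", endpoint("127.0.0.1", 9898, "ocr", "b64", "xml"));
}
```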

API test example; see the `test_api.py` file for the complete tests

import requests
import base64

host = "http://127.0.0.1:9898"
file = open('./image/3.png', 'rb').read()

# Test jsonp; only b64 can be used, not file
api_url = f"{host}/ocr/b64/text"
resp = requests.get(api_url, params = {
  "callback": "handle",
  "image": base64.b64encode(file).decode(),
})
print(f"jsonp, api_url={api_url}, resp.text={resp.text}")

# Test ocr
api_url = f"{host}/ocr/file/text"
resp = requests.post(api_url, files={'image': file})
print(f"api_url={api_url}, resp.text={resp.text}")
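The Python example relies on base64.b64encode for the b64 image type. The Rust standard library has no base64 encoder, so a Rust client would normally pull in a crate for it; purely as a dependency-free sketch, the standard alphabet with padding can be implemented as follows. `base64_encode` is a hypothetical helper, not something this project provides.

```rust
const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes bytes with the standard base64 alphabet and `=` padding (RFC 4648).
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, padding with zeros.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(TABLE[(n >> 18) as usize & 63] as char);
        out.push(TABLE[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 {
            TABLE[(n >> 6) as usize & 63] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            TABLE[n as usize & 63] as char
        } else {
            '='
        });
    }
    out
}

fn main() {
    // In a real client the bytes would come from std::fs::read("./image/3.png").
    let sample = b"foobar";
    println!("{}", base64_encode(sample));
}
```

When the encoded value is sent as a query parameter, as in the jsonp test, the characters +, / and = still need URL encoding; the Python requests library does that automatically.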

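Troubleshooting

Both CUDA and cuDNN must be installed.

A CUDA 12 build requires cuDNN 9.x.

A CUDA 11 build requires cuDNN 8.x.

It is unclear whether CUDA 10 works.

Static linking is the default. The link library is downloaded automatically at build time, so set up a proxy if you need one. The cuda feature does not support static linking (it downloads the dynamic library itself).

To specify the path of the static library, set the ORT_LIB_LOCATION environment variable; once it is set, the library is no longer downloaded automatically.

For example, if the library path is onnxruntime\build\Windows\Release\Release\onnxruntime.lib, set ORT_LIB_LOCATION to onnxruntime\build\Windows\Release.

The download-binaries feature is enabled by default and downloads the link library automatically.

Automatically downloaded libraries are stored in C:\Users\<username>\AppData\ort.pyke.io.

To enable dynamic linking: ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master", features = ["load-dynamic"] }

With the load-dynamic feature enabled, Ddddocr::set_onnxruntime_path can be used to specify the path of the onnxruntime dynamic library.

With the load-dynamic feature enabled, the onnxruntime library is not downloaded automatically at build time.

Download the onnxruntime library manually and place it in the program's working directory (or the system API directory); Ddddocr::set_onnxruntime_path then does not need to be called.

If static linking fails on Windows, install VS2022.

If static linking fails on Linux x86-64, install gcc 11 and g++ 11 (Ubuntu ≥ 20.04).

If static linking fails on Linux arm64, glibc ≥ 2.35 is required (Ubuntu ≥ 22.04).

If static linking fails on macOS, macOS ≥ 10.15 is required.

With cuda, cargo test may panic (exit code: 0xc000007b). This happens because the automatically generated dynamic libraries are placed in target/debug and must be copied manually to target/debug/deps (cuda does not currently support static linking).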
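The manual copy step can be scripted. The following is a minimal sketch using only the standard library; the function name `copy_libraries` and the list of file extensions are assumptions made for this sketch, so adjust them to the files that actually appear in target/debug on your platform.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies every file in `src` whose extension is listed in `extensions` into `dst`
/// and returns how many files were copied.
fn copy_libraries(src: &Path, dst: &Path, extensions: &[&str]) -> io::Result<usize> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let path = entry?.path();
        let wanted = path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| extensions.contains(&e));
        if path.is_file() && wanted {
            if let Some(name) = path.file_name() {
                fs::copy(&path, dst.join(name))?;
                copied += 1;
            }
        }
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    let src = Path::new("target/debug");
    if src.is_dir() {
        // Assumed extensions: dll (Windows), so (Linux), dylib (macOS).
        let n = copy_libraries(src, &src.join("deps"), &["dll", "so", "dylib"])?;
        println!("copied {} libraries into target/debug/deps", n);
    } else {
        println!("target/debug not found; build the project first");
    }
    Ok(())
}
```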

Dynamic linking requires onnxruntime version 1.18.x.

For more troubleshooting, see ort.pyke.io.
