caddy-defender

Caddy module to block or manipulate requests originating from AIs or cloud services trying to train on your websites

The Caddy Defender plugin is a middleware for Caddy that allows you to block or manipulate requests based on the client's IP address. It provides features such as IP range filtering, predefined IP ranges for popular AI services, custom IP ranges configuration, and multiple responder backends for different actions like blocking, custom responses, dropping connections, returning garbage data, redirecting, and tarpitting to stall bots. The plugin can be easily installed using Docker or built with `xcaddy`. Configuration is done through the Caddyfile syntax with various options for responders, IP ranges, custom messages, and URLs.

README:

Caddy Defender Plugin

The Caddy Defender plugin is a middleware for Caddy that allows you to block or manipulate requests based on the client's IP address. It is particularly useful for preventing unwanted traffic or polluting AI training data by returning garbage responses.
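The IP-based matching described above can be sketched in a few lines of Python. This is only an illustration of CIDR matching in general, not the plugin's implementation (the Acknowledgments credit the bart routing table for that). The `BLOCKED_RANGES` list and the `is_blocked` helper are hypothetical names, and the ranges are reserved documentation addresses.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation ranges stand in for real ones.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    # `addr in net` is False when the IP versions differ, so a list that
    # mixes IPv4 and IPv6 networks is safe to scan.
    return any(addr in net for net in BLOCKED_RANGES)


if __name__ == "__main__":
    for ip in ("203.0.113.7", "198.51.100.1", "2001:db8::1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

A linear scan like this is fine for a handful of ranges; a trie-based routing table is the usual choice once the list grows to thousands of prefixes.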

Features

IP Range Filtering: Block or manipulate requests from specific IP ranges.
Embedded IP Ranges: Predefined IP ranges for popular AI services (e.g., OpenAI, DeepSeek, GitHub Copilot).
Custom IP Ranges: Add your own IP ranges via Caddyfile configuration.
Multiple Responder Backends:
- Block: Return a 403 Forbidden response.
- Custom: Return a custom message.
- Drop: Drop the connection.
- Garbage: Return garbage data to pollute AI training.
- Redirect: Return a 308 Permanent Redirect response with a custom URL.
- Tarpit: Stream data at a slow but configurable rate to stall bots and pollute AI training.
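To make the tarpit idea concrete, here is a minimal Python sketch of slow, rate-limited streaming. It is an assumption about how such a responder could behave, not the plugin's code: `tarpit_stream`, `bytes_per_second`, and `chunk_size` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Iterator


def tarpit_stream(
    payload: bytes,
    bytes_per_second: float,
    chunk_size: int = 16,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield payload in small chunks, pausing after each one to hold the rate."""
    if bytes_per_second <= 0 or chunk_size <= 0:
        raise ValueError("bytes_per_second and chunk_size must be positive")
    # Time budget for one full chunk at the configured rate.
    delay = chunk_size / bytes_per_second
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]
        # The pause is injectable so the sketch can be exercised without waiting.
        sleep(delay)


if __name__ == "__main__":
    for chunk in tarpit_stream(b"x" * 40, bytes_per_second=8):
        print(len(chunk), "bytes sent")
```

At 8 bytes per second with 16-byte chunks, the 40-byte payload above is sent in three chunks with three 2-second pauses, so a client is held for roughly six seconds.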

Installation

Using Docker

The easiest way to use the Caddy Defender plugin is by using the pre-built Docker image.

Pull the Docker Image:

```
docker pull ghcr.io/jasonlovesdoggo/caddy-defender:latest
```

Run the Container: Use the following command to run the container with your Caddyfile:

```
docker run -d \
  --name caddy \
  -v /path/to/Caddyfile:/etc/caddy/Caddyfile \
  -p 80:80 -p 443:443 \
  ghcr.io/jasonlovesdoggo/caddy-defender:latest
```

Replace /path/to/Caddyfile with the path to your Caddyfile.

Using `xcaddy`

You can also build Caddy with the Caddy Defender plugin using xcaddy, a tool for building custom Caddy binaries.

Install xcaddy:

```
go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
```

Build Caddy with the Plugin: Run the following command to build Caddy with the Caddy Defender plugin:
```
xcaddy build --with github.com/jasonlovesdoggo/caddy-defender
```
This will produce a caddy binary in the current directory.

Run Caddy: Use the built binary to run Caddy with your configuration:
```
./caddy run --config Caddyfile
```

Configuration

Caddyfile Syntax

The defender directive is used to configure the Caddy Defender plugin. It has the following syntax:

```
defender <responder> {
    message <custom message>
    ranges <ip_ranges...>
    url <url>
}
```

<responder>: The responder backend to use. Supported values are:
- block: Returns a 403 Forbidden response.
- custom: Returns a custom message (requires message).
- drop: Drops the connection.
- garbage: Returns garbage data to pollute AI training.
- redirect: Returns a 308 Permanent Redirect response (requires url).
- ratelimit: Marks requests for rate limiting (requires Caddy-Ratelimit to be installed as well).
- tarpit: Streams data at a slow but configurable rate to stall bots and pollute AI training.
<ip_ranges...>: An optional list of CIDR ranges or predefined range keys to match against the client's IP. Defaults to aws azurepubliccloud deepseek gcloud githubcopilot openai.
<custom message>: A custom message to return when using the custom responder.
<url>: The URI that the redirect responder would redirect to.

For examples, check out docs/examples.md
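As a rough illustration of the syntax above, the following Caddyfile sketch configures three sites, one each for the block, custom, and redirect responders. The hostnames, the 203.0.113.0/24 range, the message text, and the redirect URL are placeholders, not values from the project. Each `defender` block is wrapped in `route` so the handlers run in the order written; that is an assumption made to avoid depending on directive ordering, and your build may not need it.

```caddyfile
# Placeholder hostnames and values throughout; adapt to your own setup.

block.example.com {
    route {
        # 403 Forbidden for OpenAI's embedded ranges plus one custom CIDR.
        defender block {
            ranges openai 203.0.113.0/24
        }
        respond "Hello, human!"
    }
}

custom.example.com {
    route {
        # The custom responder requires `message`.
        defender custom {
            ranges deepseek
            message "Automated access is not permitted."
        }
        respond "Hello, human!"
    }
}

redirect.example.com {
    route {
        # The redirect responder requires `url`.
        defender redirect {
            ranges aws
            url "https://example.com/blocked"
        }
        respond "Hello, human!"
    }
}
```

If `ranges` is omitted, the defaults listed above apply.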

Embedded IP Ranges

The plugin includes predefined IP ranges for popular AI services, cloud providers, and other networks. These ranges are embedded in the binary and can be used by key without additional configuration.

Service	Key	IP Ranges
Alibaba Cloud	aliyun	aliyun.go
VPNs	vpn	vpn.go
AWS	aws	aws.go
AWS Region	aws-us-east-1, aws-us-west-1, aws-eu-west-1	aws_region.go
DeepSeek	deepseek	deepseek.go
GitHub Copilot	githubcopilot	github.go
Google Cloud Platform	gcloud	gcloud.go
Oracle Cloud Infrastructure	oci	oracle.go
Microsoft Azure	azurepubliccloud	azure.go
OpenAI	openai	openai.go
Mistral	mistral	mistral.go
Vultr	vultr	vultr.go
Cloudflare	cloudflare	cloudflare.go
Digital Ocean	digitalocean	digitalocean.go
Linode	linode	linode.go
Private	private	private.go
All IP addresses	all	all.go

Disabled by default (require manual inclusion at build time)

Service	Key	IP Ranges
Tor Exit Nodes	tor	tor.go
ASN (Autonomous System Numbers)	asn	asn.go

More are welcome! For a precompiled list, see the embedded results.
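Since the `ranges` option accepts both keys from the tables above and literal CIDRs, the expansion step can be pictured with this small Python sketch. It assumes a simple key-to-CIDR lookup; the `EMBEDDED` dictionary and `resolve_ranges` function are hypothetical, and the CIDRs are reserved documentation ranges rather than any provider's real addresses.

```python
import ipaddress

# Hypothetical stand-in for the embedded data; the CIDRs are reserved
# documentation ranges, not the providers' real addresses.
EMBEDDED = {
    "openai": ["192.0.2.0/25"],
    "deepseek": ["198.51.100.0/24"],
}


def resolve_ranges(tokens, embedded=EMBEDDED):
    """Expand a mixed list of range keys and CIDR strings into network objects."""
    networks = []
    for token in tokens:
        if token in embedded:
            # Known key: expand to every CIDR registered under it.
            networks.extend(ipaddress.ip_network(c) for c in embedded[token])
        else:
            # Not a known key, so it must parse as a CIDR; ValueError otherwise.
            networks.append(ipaddress.ip_network(token))
    return networks


if __name__ == "__main__":
    for net in resolve_ranges(["openai", "203.0.113.0/24"]):
        print(net)
```

Rejecting unknown tokens up front means a typo in a key surfaces when the configuration is loaded instead of silently matching nothing.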

Contributing

We welcome contributions! To get started, see CONTRIBUTING.md.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

The inspiration for this project.
bart - Karl Gaissmaier's efficient routing table implementation (Balanced ART adaptation) enabling our high-performance IP matching
Built with ❤️ using Caddy.
