galah
Galah: An LLM-powered web honeypot. Wasting attackers' time with faker-than-ever HTTP responses!
Stars: 331
Galah is an LLM-powered web honeypot designed to mimic various applications and dynamically respond to arbitrary HTTP requests. It supports multiple LLM providers, including OpenAI. Unlike traditional web honeypots, Galah dynamically crafts responses for any HTTP request, caching them to reduce repetitive generation and API costs. The honeypot's configuration is crucial, directing the LLM to produce responses in a specified JSON format. Note that Galah is a weekend project exploring LLM capabilities and not intended for production use, as it may be identifiable through network fingerprinting and non-standard responses.
README:
TL;DR: Galah (/ɡəˈlɑː/ - pronounced ‘guh-laa’) is an LLM-powered web honeypot designed to mimic various applications and dynamically respond to arbitrary HTTP requests. Galah supports major LLM providers, including OpenAI, GoogleAI, GCP's Vertex AI, Anthropic, Cohere, and Ollama.
Unlike traditional web honeypots that manually emulate specific web applications or vulnerabilities, Galah dynamically crafts relevant responses—including HTTP headers and body content—to any HTTP request. Responses generated by the LLM are cached for a configurable period to prevent repetitive generation for identical requests, reducing API costs. The caching is port-specific, ensuring that responses generated for a particular port will not be reused for the same request on a different port.
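The README does not spell out how the port-scoped cache is keyed, but as a rough illustration of the behavior described above, a key could combine the listening port with a hash of the normalized request, so identical probes on different ports never share an entry. The function and field names below are hypothetical, not Galah's actual code:

package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a port-scoped key for a generated response.
// Identical requests arriving on different ports hash to different keys,
// so a response crafted for port 8080 is never replayed on port 8443.
// Illustrative sketch only, not Galah's implementation.
func cacheKey(port, method, target, headersSorted string) string {
	sum := sha256.Sum256([]byte(method + " " + target + "\n" + headersSorted))
	return fmt.Sprintf("%s:%s", port, hex.EncodeToString(sum[:]))
}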
The prompt configuration is key in this honeypot. While you can update the prompt in the configuration file, it is crucial to maintain the segment directing the LLM to produce responses in the specified JSON format.
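The exact JSON schema lives in the prompt inside the config file; judging from the httpResponse object in the sample event log further below, the LLM is asked to return roughly the following shape. The type and field names here are my own, added only to illustrate the structure:

// Sketch of the response shape the prompt asks the LLM to emit,
// inferred from the httpResponse object in the sample event log below.
// Struct name and comments are illustrative, not Galah's own types.
type LLMResponse struct {
	Headers map[string]string `json:"headers"` // e.g. Content-Type, Content-Length
	Body    string            `json:"body"`    // raw response body returned to the client
}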
Note: Galah was developed as a fun weekend project to explore the capabilities of LLMs in crafting HTTP messages and is not intended for production use. The honeypot may be identifiable through various methods such as network fingerprinting techniques, prolonged response times depending on the LLM provider and model, and non-standard responses. To protect against Denial of Wallet attacks, be sure to set usage limits on your LLM API.
- Ensure you have Go version 1.22+ installed.
- Depending on your LLM provider, create an API key (e.g., for OpenAI or Google AI Studio) or set up authentication credentials (e.g., Application Default Credentials for GCP's Vertex AI).
- If you want to serve HTTPS ports, generate TLS certificates.
- Clone the repo and install the dependencies.
- Update the config.yaml file if needed.
- Build and run the Go binary!
% git clone [email protected]:0x4D31/galah.git
% cd galah
% go mod download
% go build -o galah ./cmd/galah
% export LLM_API_KEY=your-api-key
% ./galah --help
██████ █████ ██ █████ ██ ██
██ ██ ██ ██ ██ ██ ██ ██
██ ███ ███████ ██ ███████ ███████
██ ██ ██ ██ ██ ██ ██ ██ ██
██████ ██ ██ ███████ ██ ██ ██ ██
llm-based web honeypot // version 1.0
author: Adel "0x4D31" Karimi
Usage: galah --provider PROVIDER --model MODEL [--server-url SERVER-URL] [--temperature TEMPERATURE] [--api-key API-KEY] [--cloud-location CLOUD-LOCATION] [--cloud-project CLOUD-PROJECT] [--interface INTERFACE] [--config-file CONFIG-FILE] [--event-log-file EVENT-LOG-FILE] [--cache-db-file CACHE-DB-FILE] [--cache-duration CACHE-DURATION] [--log-level LOG-LEVEL]
Options:
--provider PROVIDER, -p PROVIDER
LLM provider (openai, googleai, gcp-vertex, anthropic, cohere, ollama) [env: LLM_PROVIDER]
--model MODEL, -m MODEL
LLM model (e.g. gpt-3.5-turbo-1106, gemini-1.5-pro-preview-0409) [env: LLM_MODEL]
--server-url SERVER-URL, -u SERVER-URL
LLM Server URL (required for Ollama) [env: LLM_SERVER_URL]
--temperature TEMPERATURE, -t TEMPERATURE
LLM sampling temperature (0-2). Higher values make the output more random [default: 1, env: LLM_TEMPERATURE]
--api-key API-KEY, -k API-KEY
LLM API Key [env: LLM_API_KEY]
--cloud-location CLOUD-LOCATION
LLM cloud location region (required for GCP's Vertex AI) [env: LLM_CLOUD_LOCATION]
--cloud-project CLOUD-PROJECT
LLM cloud project ID (required for GCP's Vertex AI) [env: LLM_CLOUD_PROJECT]
--interface INTERFACE, -i INTERFACE
interface to serve on
--config-file CONFIG-FILE, -c CONFIG-FILE
Path to config file [default: config/config.yaml]
--event-log-file EVENT-LOG-FILE, -o EVENT-LOG-FILE
Path to event log file [default: event_log.json]
--cache-db-file CACHE-DB-FILE, -f CACHE-DB-FILE
Path to database file for response caching [default: cache.db]
--cache-duration CACHE-DURATION, -d CACHE-DURATION
Cache duration for generated responses (in hours). Use 0 to disable caching, and -1 for unlimited caching (no expiration). [default: 24]
--log-level LOG-LEVEL, -l LOG-LEVEL
Log level (debug, info, error, fatal) [default: info]
--help, -h display this help and exit
- Ensure you have Docker CE or EE installed locally.
- Clone the repo and build the docker image.
- You can mount a local directory to the container to store the logs.
- Run the docker container.
% git clone [email protected]:0x4D31/galah.git
% cd galah
% mkdir logs
% export LLM_API_KEY=your-api-key
% docker build -t galah-image .
% docker run -d --name galah-container -p 8080:8080 -v $(pwd)/logs:/galah/logs -e LLM_API_KEY galah-image -o logs/galah.json -p openai -m gpt-3.5-turbo-1106
Example usage with GCP's Vertex AI:
% ./galah -p gcp-vertex -m gemini-1.0-pro-002 --cloud-project galah-test --cloud-location us-central1 --temperature 0.2 --cache-duration 0
% curl -i http://localhost:8080/.aws/credentials
HTTP/1.1 200 OK
Date: Sun, 26 May 2024 16:37:26 GMT
Content-Length: 116
Content-Type: text/plain; charset=utf-8
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
JSON event log:
{
"eventTime": "2024-05-26T18:37:26.742418+02:00",
"httpRequest": {
"body": "",
"bodySha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"headers": "User-Agent: [curl/7.71.1], Accept: [*/*]",
"headersSorted": "Accept,User-Agent",
"headersSortedSha256": "cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9",
"method": "GET",
"protocolVersion": "HTTP/1.1",
"request": "/.aws/credentials",
"userAgent": "curl/7.71.1"
},
"httpResponse": {
"headers": {
"Content-Length": "127",
"Content-Type": "text/plain"
},
"body": "[default]\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\naws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"
},
"level": "info",
"llm": {
"model": "gemini-1.0-pro-002",
"provider": "gcp-vertex",
"temperature": 0.2
},
"msg": "successfulResponse",
"port": "8080",
"sensorName": "mbp.local",
"srcHost": "localhost",
"srcIP": "::1",
"srcPort": "51725",
"tags": null,
"time": "2024-05-26T18:37:26.742447+02:00"
}
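Because each event is structured JSON, the log is easy to post-process. A minimal sketch of pulling a few fields out of one event object like the one above (the Event type covers only a subset of the fields shown, and it assumes the file holds a single JSON object; adjust for line-delimited logs):

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// Event mirrors a subset of the fields in the sample entry above;
// the real log may contain additional fields.
type Event struct {
	HTTPRequest struct {
		Method  string `json:"method"`
		Request string `json:"request"`
	} `json:"httpRequest"`
	LLM struct {
		Provider string `json:"provider"`
		Model    string `json:"model"`
	} `json:"llm"`
	Port  string `json:"port"`
	SrcIP string `json:"srcIP"`
}

func main() {
	raw, err := os.ReadFile("event_log.json") // assumes one JSON event in the file
	if err != nil {
		log.Fatal(err)
	}
	var e Event
	if err := json.Unmarshal(raw, &e); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s %s on port %s from %s via %s/%s\n",
		e.HTTPRequest.Method, e.HTTPRequest.Request, e.Port, e.SrcIP, e.LLM.Provider, e.LLM.Model)
}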
See the repository for more examples.
Similar Open Source Tools
llm2vec
LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) training with masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.
otto-m8
otto-m8 is a flowchart-based automation platform designed to run deep learning workloads with minimal to no code. It provides a user-friendly interface to spin up a wide range of AI models, including traditional deep learning models and large language models. The tool deploys Docker containers of workflows as APIs for integration with existing workflows, building AI chatbots, or standalone applications. Otto-m8 operates on an Input, Process, Output paradigm, simplifying the process of running AI models into a flowchart-like UI.
AIGODLIKE-ComfyUI-Translation
A plugin for multilingual translation of ComfyUI. It translates the resident menu bar, search bar, right-click context menu, nodes, and other UI elements.
fractl
Fractl is a programming language designed for generative AI, making it easier for developers to work with AI-generated code. It features a data-oriented and declarative syntax, making it a better fit for generative AI-powered code generation. Fractl also bridges the gap between traditional programming and visual building, allowing developers to use multiple ways of building, including traditional coding, visual development, and code generation with generative AI. Key concepts in Fractl include a graph-based hierarchical data model, zero-trust programming, declarative dataflow, resolvers, interceptors, and entity-graph-database mapping.
aider.el
aider.el is an AI pair programming tool for Emacs that provides an interactive interface to communicate with Aider. It offers features such as pop-up menu for commands, Git repository-specific sessions, batch file adding from dired buffer, region-based refactor support, and the ability to add custom Elisp functions. Users can install aider.el and dependencies to enhance their pair programming experience within Emacs.
redis-vl-python
The Python Redis Vector Library (RedisVL) is a tailor-made client for AI applications leveraging Redis. It enhances applications with Redis' speed, flexibility, and reliability, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. The library bridges the gap between the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. It abstracts the features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists.
superagent-py
Superagent is an open-source framework that enables developers to integrate production-ready AI assistants into any application quickly and easily. It provides a Python SDK for interacting with the Superagent API, allowing developers to create, manage, and invoke AI agents. The SDK simplifies the process of building AI-powered applications, making it accessible to developers of all skill levels.
xFinder
xFinder is a model specifically designed for key answer extraction from large language models (LLMs). It addresses the challenges of unreliable evaluation methods by optimizing the key answer extraction module. The model achieves high accuracy and robustness compared to existing frameworks, enhancing the reliability of LLM evaluation. It includes a specialized dataset, the Key Answer Finder (KAF) dataset, for effective training and evaluation. xFinder is suitable for researchers and developers working with LLMs to improve answer extraction accuracy.
bosquet
Bosquet is a tool designed for LLMOps in large language model-based applications. It simplifies building AI applications by managing LLM and tool services, integrating with Selmer templating library for prompt templating, enabling prompt chaining and composition with Pathom graph processing, defining agents and tools for external API interactions, handling LLM memory, and providing features like call response caching. The tool aims to streamline the development process for AI applications that require complex prompt templates, memory management, and interaction with external systems.
mergoo
Mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With Mergoo, you can efficiently integrate the knowledge of different generic or domain-based LLM experts. Mergoo supports several merging methods, including Mixture-of-Experts, Mixture-of-Adapters, and Layer-wise merging. It also supports various base models, including LLaMa, Mistral, and BERT, and trainers, including Hugging Face Trainer, SFTrainer, and PEFT. Mergoo provides flexible merging for each layer and supports training choices such as only routing MoE layers or fully fine-tuning the merged LLM.
ModelCache
Codefuse-ModelCache is a semantic cache for large language models (LLMs) that aims to optimize services by introducing a caching mechanism. It helps reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. The project facilitates sharing and exchanging technologies related to large model semantic cache through open-source collaboration.
aiohttp-pydantic
Aiohttp pydantic is an aiohttp view to easily parse and validate requests. You define using function annotations what your methods for handling HTTP verbs expect, and Aiohttp pydantic parses the HTTP request for you, validates the data, and injects the parameters you want. It provides features like query string, request body, URL path, and HTTP headers validation, as well as Open API Specification generation.
LLM-Blender
LLM-Blender is a framework for ensembling large language models (LLMs) to achieve superior performance. It consists of two modules: PairRanker and GenFuser. PairRanker uses pairwise comparisons to distinguish between candidate outputs, while GenFuser merges the top-ranked candidates to create an improved output. LLM-Blender has been shown to significantly surpass the best LLMs and baseline ensembling methods across various metrics on the MixInstruct benchmark dataset.
taipy
Taipy is an open-source Python library for easy, end-to-end application development, featuring what-if analyses, smart pipeline execution, built-in scheduling, and deployment tools.
npi
NPi is an open-source platform providing Tool-use APIs to empower AI agents with the ability to take action in the virtual world. It is currently under active development, and the APIs are subject to change in future releases. NPi offers a command line tool for installation and setup, along with a GitHub app for easy access to repositories. The platform also includes a Python SDK and examples like Calendar Negotiator and Twitter Crawler. Join the NPi community on Discord to contribute to the development and explore the roadmap for future enhancements.
For similar tasks
StratosphereLinuxIPS
Slips is a powerful endpoint behavioral intrusion prevention and detection system that uses machine learning to detect malicious behaviors in network traffic. It can work with network traffic in real-time, PCAP files, and network flows from tools like Suricata, Zeek/Bro, and Argus. Slips threat detection is based on machine learning models, threat intelligence feeds, and expert heuristics. It gathers evidence of malicious behavior and triggers alerts when enough evidence is accumulated. The tool is Python-based and supported on Linux and MacOS, with blocking features only on Linux. Slips relies on Zeek network analysis framework and Redis for interprocess communication. It offers a graphical user interface for easy monitoring and analysis.
For similar jobs
ciso-assistant-community
CISO Assistant is a tool that helps organizations manage their cybersecurity posture and compliance. It provides a centralized platform for managing security controls, threats, and risks. CISO Assistant also includes a library of pre-built frameworks and tools to help organizations quickly and easily implement best practices.
PurpleLlama
Purple Llama is an umbrella project that aims to provide tools and evaluations to support responsible development and usage of generative AI models. It encompasses components for cybersecurity and input/output safeguards, with plans to expand in the future. The project emphasizes a collaborative approach, borrowing the concept of purple teaming from cybersecurity, to address potential risks and challenges posed by generative AI. Components within Purple Llama are licensed permissively to foster community collaboration and standardize the development of trust and safety tools for generative AI.
vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.
taranis-ai
Taranis AI is an advanced Open-Source Intelligence (OSINT) tool that leverages Artificial Intelligence to revolutionize information gathering and situational analysis. It navigates through diverse data sources like websites to collect unstructured news articles, utilizing Natural Language Processing and Artificial Intelligence to enhance content quality. Analysts then refine these AI-augmented articles into structured reports that serve as the foundation for deliverables such as PDF files, which are ultimately published.
NightshadeAntidote
Nightshade Antidote is an image forensics tool used to analyze digital images for signs of manipulation or forgery. It implements several common techniques used in image forensics including metadata analysis, copy-move forgery detection, frequency domain analysis, and JPEG compression artifacts analysis. The tool takes an input image, performs analysis using the above techniques, and outputs a report summarizing the findings.
h4cker
This repository is a comprehensive collection of cybersecurity-related references, scripts, tools, code, and other resources. It is carefully curated and maintained by Omar Santos. The repository serves as a supplemental material provider to several books, video courses, and live training created by Omar Santos. It encompasses over 10,000 references that are instrumental for both offensive and defensive security professionals in honing their skills.
AIMr
AIMr is an AI aimbot tool written in Python that leverages modern technologies to achieve an undetected system with a pleasing appearance. It works on any game that uses human-shaped models. To optimize its performance, users should build OpenCV with CUDA. For Valorant, additional perks in the Discord and an Arduino Leonardo R3 are required.
admyral
Admyral is an open-source Cybersecurity Automation & Investigation Assistant that provides a unified console for investigations and incident handling, workflow automation creation, automatic alert investigation, and next step suggestions for analysts. It aims to tackle alert fatigue and automate security workflows effectively by offering features like workflow actions, AI actions, case management, alert handling, and more. Admyral combines security automation and case management to streamline incident response processes and improve overall security posture. The tool is open-source, transparent, and community-driven, allowing users to self-host, contribute, and collaborate on integrations and features.