AIL-framework

AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project

Stars: 1346

Visit

AIL framework is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).

README:

AIL framework

Latest Release
CI
Gitter
Contributors
License

AIL framework - Framework for Analysis of Information Leaks

AIL is a modular framework to analyse potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).

AIL V5.0 Version:

AIL v5.0 introduces significant improvements and new features:

Codebase Rewrite: The codebase has undergone a substantial rewrite, resulting in enhanced performance and speed improvements.
Database Upgrade: The database has been migrated from ARDB to Kvrocks.
New Correlation Engine: AIL v5.0 introduces a new powerful correlation engine with two new correlation types: CVE and Title.
Enhanced Logging: The logging system has been improved to provide better troubleshooting capabilities.
Tagging Support: AIL objects now support tagging, allowing users to categorize and label extracted information for easier analysis and organization.
Trackers: Improved objects filtering, PGP and decoded tracking added.
UI Content Visualization: The user interface has been upgraded to visualize extracted and tracked information.
New Crawler Lacus: improve crawling capabilities.
Modular Importers and Exporters: New importers (ZMQ, AIL Feeders) and exporters (MISP, Mail, TheHive) modular design. Allow easy creation and customization by extending an abstract class.
Module Queues: improved the queuing mechanism between detection modules.
New Object CVE and Title: Extract an correlate CVE IDs and web page titles.

Features

Modular architecture to handle streams of unstructured or structured information
Default support for external ZMQ feeds, such as provided by CIRCL or other providers
Multiple Importers and feeds support
Each module can process and reprocess the information already analyzed by AIL
Detecting and extracting URLs including their geographical location (e.g. IP address location)
Extracting and validating potential leaks of credit card numbers, credentials, ...
Extracting and validating leaked email addresses, including DNS MX validation
Module for extracting Tor .onion addresses for further analysis
Keep tracks of credentials duplicates (and diffing between each duplicate found)
Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
A full-text indexer module to index unstructured information
Terms, Set of terms, Regex, typo squatting and YARA tracking and occurrence
YARA Retro Hunt
Many more modules for extracting phone numbers, credentials, and more
Alerting to MISP to share found leaks within a threat intelligence platform using MISP standard
Detecting and decoding encoded file (Base64, hex encoded or your own decoding scheme) and storing files
Detecting Amazon AWS and Google API keys
Detecting Bitcoin address and Bitcoin private keys
Detecting private keys, certificate, keys (including SSH, OpenVPN)
Detecting IBAN bank accounts
Tagging system with MISP Galaxy and MISP Taxonomies tags
UI submission
Create events on MISP and cases on The Hive
Automatic export on detection with MISP (events) and The Hive (alerts) on selected tags
Extracted and decoded files can be searched by date range, type of file (mime-type) and encoding discovered
Correlations engine and Graph to visualize relationships between decoded files (hashes), PGP UIDs, domains, username, and cryptocurrencies addresses
Websites, Forums and Tor Hidden-Services hidden services crawler to crawl and parse output
Domain availability monitoring to detect up and down of websites and hidden services
Browsed hidden services are automatically captured and integrated into the analyzed output, including a blurring screenshot interface (to avoid "burning the eyes" of security analysts with sensitive content)
Tor hidden services is part of the standard framework, all the AIL modules are available to the crawled hidden services
Crawler scheduler to trigger crawling on demand or at regular intervals for URLs or Tor hidden services

Trackers:

Trackers are user-defined rules or patterns that automatically detect, tag and notify about relevant information collected by AIL.

Trackers types: Documentation

word or set of words
YARA rules
Regex
Typo Squatting

Installation

To install the AIL framework, run the following commands:

# Clone the repo first
git clone https://github.com/ail-project/ail-framework.git
git submodule update --init --recursive
cd ail-framework

# For Debian and Ubuntu based distributions
./installing_deps.sh

# Launch ail
cd ~/ail-framework/
cd bin/
./LAUNCH.sh -l

The default installing_deps.sh is for Debian and Ubuntu based distributions.

Requirement:

Python 3.8+

Installation Notes

For Lacus Crawler and LibreTranslate installation instructions (if you want to use those features), refer to the HOWTO

Starting AIL

To start AIL, use the following commands:

cd bin/
./LAUNCH.sh -l

You can access the AIL framework web interface at the following URL:

https://localhost:7000/

The default credentials for the web interface are located in the DEFAULT_PASSWORDfile, which is deleted when you change your password.

Training

CIRCL organises training on how to use or extend the AIL framework. AIL training materials are available at https://github.com/ail-project/ail-training.

Documentation

The documentation is available in doc/README.md

API

The API documentation is available in doc/api.md

HOWTO

HOWTO are available in HOWTO.md

Privacy and GDPR

For information on AIL's compliance with GDPR and privacy considerations, refer to the AIL information leaks analysis and the GDPR in the context of collection, analysis and sharing information leaks document.

this document provides an overview how to use AIL in a lawfulness context especially in the scope of General Data Protection Regulation.

Research using AIL

If you use or reference AIL in an academic paper, you can cite it using the following BibTeX:

@inproceedings{mokaddem2018ail,
  title={AIL-The design and implementation of an Analysis Information Leak framework},
  author={Mokaddem, Sami and Wagener, G{\'e}rard and Dulaunoy, Alexandre},
  booktitle={2018 IEEE International Conference on Big Data (Big Data)},
  pages={5049--5057},
  year={2018},
  organization={IEEE}
}

Screenshots

Websites, Forums and Tor Hidden-Services

Login protected, pre-recorded session cookies:

Extracted encoded files from items

Correlation Engine

Investigation

Tagging system

MISP Export

MISP and The Hive, automatic events and alerts creation

UI submission

Trackers

License

    Copyright (C) 2014 Jules Debra
    Copyright (c) 2021 Olivier Sagit
    Copyright (C) 2014-2024 CIRCL - Computer Incident Response Center Luxembourg (c/o smile, security made in Lëtzebuerg, Groupement d'Intérêt Economique)
    Copyright (c) 2014-2024 Raphaël Vinot
    Copyright (c) 2014-2024 Alexandre Dulaunoy
    Copyright (c) 2016-2024 Sami Mokaddem
    Copyright (c) 2018-2024 Thirion Aurélien

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Affero General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Affero General Public License for more details.

    You should have received a copy of the GNU Affero General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

For Tasks:

Click tags to check more tools for each tasks

detect data leaks extract sensitive information analyze unstructured data

For Jobs:

security analyst data scientist threat intelligence analyst information security analyst cybersecurity engineer

Alternative AI tools for AIL-framework

Similar Open Source Tools

AIL-framework

github

: 1.3k

ail-framework

github

: 793

swiftide

Swiftide is a fast, streaming indexing and query library tailored for Retrieval Augmented Generation (RAG) in AI applications. It is built in Rust, utilizing parallel, asynchronous streams for blazingly fast performance. With Swiftide, users can easily build AI applications from idea to production in just a few lines of code. The tool addresses frustrations around performance, stability, and ease of use encountered while working with Python-based tooling. It offers features like fast streaming indexing pipeline, experimental query pipeline, integrations with various platforms, loaders, transformers, chunkers, embedders, and more. Swiftide aims to provide a platform for data indexing and querying to advance the development of automated Large Language Model (LLM) applications.

github

: 572

StratosphereLinuxIPS

Slips is a powerful endpoint behavioral intrusion prevention and detection system that uses machine learning to detect malicious behaviors in network traffic. It can work with network traffic in real-time, PCAP files, and network flows from tools like Suricata, Zeek/Bro, and Argus. Slips threat detection is based on machine learning models, threat intelligence feeds, and expert heuristics. It gathers evidence of malicious behavior and triggers alerts when enough evidence is accumulated. The tool is Python-based and supported on Linux and MacOS, with blocking features only on Linux. Slips relies on Zeek network analysis framework and Redis for interprocess communication. It offers a graphical user interface for easy monitoring and analysis.

github

: 691

DocsGPT

DocsGPT is an open-source documentation assistant powered by GPT models. It simplifies the process of searching for information in project documentation by allowing developers to ask questions and receive accurate answers. With DocsGPT, users can say goodbye to manual searches and quickly find the information they need. The tool aims to revolutionize project documentation experiences and offers features like live previews, Discord community, guides, and contribution opportunities. It consists of a Flask app, Chrome extension, similarity search index creation script, and a frontend built with Vite and React. Users can quickly get started with DocsGPT by following the provided setup instructions and can contribute to its development by following the guidelines in the CONTRIBUTING.md file. The project follows a Code of Conduct to ensure a harassment-free community environment for all participants. DocsGPT is licensed under MIT and is built with LangChain.

github

: 17.1k

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

github

: 18.8k

fluid

Fluid is an open source Kubernetes-native Distributed Dataset Orchestrator and Accelerator for data-intensive applications, such as big data and AI applications. It implements dataset abstraction, scalable cache runtime, automated data operations, elasticity and scheduling, and is runtime platform agnostic. Key concepts include Dataset and Runtime. Prerequisites include Kubernetes version > 1.16, Golang 1.18+, and Helm 3. The tool offers features like accelerating remote file accessing, machine learning, accelerating PVC, preloading dataset, and on-the-fly dataset cache scaling. Contributions are welcomed, and the project is under the Apache 2.0 license with a vendor-neutral approach.

github

: 1.8k

Stable-Diffusion-Android

Stable Diffusion AI is an easy-to-use app for generating images from text or other images. It allows communication with servers powered by various AI technologies like AI Horde, Hugging Face Inference API, OpenAI, StabilityAI, and LocalDiffusion. The app supports Txt2Img and Img2Img modes, positive and negative prompts, dynamic size and sampling methods, unique seed input, and batch image generation. Users can also inpaint images, select faces from gallery or camera, and export images. The app offers settings for server URL, SD Model selection, auto-saving images, and clearing cache.

github

: 667

semantic-router

The Semantic Router is an intelligent routing tool that utilizes a Mixture-of-Models (MoM) approach to direct OpenAI API requests to the most suitable models based on semantic understanding. It enhances inference accuracy by selecting models tailored to different types of tasks. The tool also automatically selects relevant tools based on the prompt to improve tool selection accuracy. Additionally, it includes features for enterprise security such as PII detection and prompt guard to protect user privacy and prevent misbehavior. The tool implements similarity caching to reduce latency. The comprehensive documentation covers setup instructions, architecture guides, and API references.

github

: 1.5k

repromodel

ReproModel is an open-source toolbox designed to boost AI research efficiency by enabling researchers to reproduce, compare, train, and test AI models faster. It provides standardized models, dataloaders, and processing procedures, allowing researchers to focus on new datasets and model development. With a no-code solution, users can access benchmark and SOTA models and datasets, utilize training visualizations, extract code for publication, and leverage an LLM-powered automated methodology description writer. The toolbox helps researchers modularize development, compare pipeline performance reproducibly, and reduce time for model development, computation, and writing. Future versions aim to facilitate building upon state-of-the-art research by loading previously published study IDs with verified code, experiments, and results stored in the system.

github

: 151

clearml

ClearML is a suite of tools designed to streamline the machine learning workflow. It includes an experiment manager, MLOps/LLMOps, data management, and model serving capabilities. ClearML is open-source and offers a free tier hosting option. It supports various ML/DL frameworks and integrates with Jupyter Notebook and PyCharm. ClearML provides extensive logging capabilities, including source control info, execution environment, hyper-parameters, and experiment outputs. It also offers automation features, such as remote job execution and pipeline creation. ClearML is designed to be easy to integrate, requiring only two lines of code to add to existing scripts. It aims to improve collaboration, visibility, and data transparency within ML teams.

github

: 5.8k

mlflow

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). MLflow's current components are: * `MLflow Tracking `_: An API to log parameters, code, and results in machine learning experiments and compare them using an interactive UI. * `MLflow Projects `_: A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others. * `MLflow Models `_: A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML and AWS SageMaker. * `MLflow Model Registry `_: A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of MLflow Models.

github

: 22.3k

DataFlow

DataFlow is a data preparation and training system designed to parse, generate, process, and evaluate high-quality data from noisy sources, improving the performance of large language models in specific domains. It constructs diverse operators and pipelines, validated to enhance domain-oriented LLM's performance in fields like healthcare, finance, and law. DataFlow also features an intelligent DataFlow-agent capable of dynamically assembling new pipelines by recombining existing operators on demand.

github

: 1.3k

HAMi

HAMi is a Heterogeneous AI Computing Virtualization Middleware designed to manage Heterogeneous AI Computing Devices in a Kubernetes cluster. It allows for device sharing, device memory control, device type specification, and device UUID specification. The tool is easy to use and does not require modifying task YAML files. It includes features like hard limits on device memory, partial device allocation, streaming multiprocessor limits, and core usage specification. HAMi consists of components like a mutating webhook, scheduler extender, device plugins, and in-container virtualization techniques. It is suitable for scenarios requiring device sharing, specific device memory allocation, GPU balancing, low utilization optimization, and scenarios needing multiple small GPUs. The tool requires prerequisites like NVIDIA drivers, CUDA version, nvidia-docker, Kubernetes version, glibc version, and helm. Users can install, upgrade, and uninstall HAMi, submit tasks, and monitor cluster information. The tool's roadmap includes supporting additional AI computing devices, video codec processing, and Multi-Instance GPUs (MIG).

github

: 2.1k

DataDreamer

DataDreamer is a powerful open-source Python library designed for prompting, synthetic data generation, and training workflows. It is simple, efficient, and research-grade, allowing users to create prompting workflows, generate synthetic datasets, and train models with ease. The library is built for researchers, by researchers, focusing on correctness, best practices, and reproducibility. It offers features like aggressive caching, resumability, support for bleeding-edge techniques, and easy sharing of datasets and models. DataDreamer enables users to run multi-step prompting workflows, generate synthetic datasets for various tasks, and train models by aligning, fine-tuning, instruction-tuning, and distilling them using existing or synthetic data.

github

: 897

cleanlab

Cleanlab helps you **clean** data and **lab** els by automatically detecting issues in a ML dataset. To facilitate **machine learning with messy, real-world data** , this data-centric AI package uses your _existing_ models to estimate dataset problems that can be fixed to train even _better_ models.

github

: 10.2k

For similar tasks

AIL-framework

github

: 1.3k

ail-framework

github

: 793

For similar jobs

Copilot-For-Security

Microsoft Copilot for Security is a generative AI-powered assistant for daily operations in security and IT that empowers teams to protect at the speed and scale of AI.

github

: 404

AIL-framework

github

: 1.3k

beelzebub

Beelzebub is an advanced honeypot framework designed to provide a highly secure environment for detecting and analyzing cyber attacks. It offers a low code approach for easy implementation and utilizes virtualization techniques powered by OpenAI Generative Pre-trained Transformer. Key features include OpenAI Generative Pre-trained Transformer acting as Linux virtualization, SSH Honeypot, HTTP Honeypot, TCP Honeypot, Prometheus openmetrics integration, Docker integration, RabbitMQ integration, and kubernetes support. Beelzebub allows easy configuration for different services and ports, enabling users to create custom honeypot scenarios. The roadmap includes developing Beelzebub into a robust PaaS platform. The project welcomes contributions and encourages adherence to the Code of Conduct for a supportive and respectful community.

github

: 1.4k

hackingBuddyGPT

hackingBuddyGPT is a framework for testing LLM-based agents for security testing. It aims to create common ground truth by creating common security testbeds and benchmarks, evaluating multiple LLMs and techniques against those, and publishing prototypes and findings as open-source/open-access reports. The initial focus is on evaluating the efficiency of LLMs for Linux privilege escalation attacks, but the framework is being expanded to evaluate the use of LLMs for web penetration-testing and web API testing. hackingBuddyGPT is released as open-source to level the playing field for blue teams against APTs that have access to more sophisticated resources.

github

: 374

awesome-business-of-cybersecurity

The 'Awesome Business of Cybersecurity' repository is a comprehensive resource exploring the cybersecurity market, focusing on publicly traded companies, industry strategy, and AI capabilities. It provides insights into how cybersecurity companies operate, compete, and evolve across 18 solution categories and beyond. The repository offers structured information on the cybersecurity market snapshot, specialists vs. multiservice cybersecurity companies, cybersecurity stock lists, endpoint protection and threat detection, network security, identity and access management, cloud and application security, data protection and governance, security analytics and threat intelligence, non-US traded cybersecurity companies, cybersecurity ETFs, blogs and newsletters, podcasts, market insights and research, and cybersecurity solutions categories.

github

: 77

mcp-scan

MCP-Scan is a security scanning tool designed to detect common security vulnerabilities in Model Context Protocol (MCP) servers. It can auto-discover various MCP configurations, scan both local and remote servers for security issues like prompt injection attacks, tool poisoning attacks, and toxic flows. The tool operates in two main modes - 'scan' for static scanning of installed servers and 'proxy' for real-time monitoring and guardrailing of MCP connections. It offers features like scanning for specific attacks, enforcing guardrailing policies, auditing MCP traffic, and detecting changes to MCP tools. MCP-Scan does not store or log usage data and can be used to enhance the security of MCP environments.

github

: 1.1k

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 980

agentcloud

AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

github

: 583