
AIL-framework
AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
Stars: 1315

AIL framework is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).
README:
AIL framework - Framework for Analysis of Information Leaks
AIL is a modular framework to analyse potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).
AIL v5.0 introduces significant improvements and new features:
- Codebase Rewrite: The codebase has undergone a substantial rewrite, resulting in enhanced performance and speed improvements.
- Database Upgrade: The database has been migrated from ARDB to Kvrocks.
- New Correlation Engine: AIL v5.0 introduces a new powerful correlation engine with two new correlation types: CVE and Title.
- Enhanced Logging: The logging system has been improved to provide better troubleshooting capabilities.
- Tagging Support: AIL objects now support tagging, allowing users to categorize and label extracted information for easier analysis and organization.
- Trackers: Improved object filtering; added tracking for PGP and decoded objects.
- UI Content Visualization: The user interface has been upgraded to visualize extracted and tracked information.
- New Lacus Crawler: Improved crawling capabilities.
- Modular Importers and Exporters: New importers (ZMQ, AIL Feeders) and exporters (MISP, Mail, TheHive) with a modular design, allowing easy creation and customization by extending an abstract class (a minimal sketch follows this list).
- Module Queues: Improved queuing mechanism between detection modules.
- New CVE and Title Objects: Extract and correlate CVE IDs and web page titles.
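As a concrete illustration of the modular exporter design mentioned above, the sketch below shows how a custom exporter could be built by subclassing an abstract base class. The class and method names (`AbstractExporter`, `export`) are hypothetical stand-ins chosen for illustration, not AIL's actual API; consult the exporter code in the repository for the real base class to extend.

```python
# Hypothetical sketch of AIL's exporter extension pattern: subclass an abstract
# base class and implement a single export hook. All names are illustrative only.
import json
import urllib.request
from abc import ABC, abstractmethod


class AbstractExporter(ABC):
    """Minimal stand-in for the abstract exporter base class."""

    @abstractmethod
    def export(self, obj: dict) -> None:
        """Push one detected AIL object (e.g. an item and its tags) to a destination."""


class WebhookExporter(AbstractExporter):
    """Example custom exporter that POSTs detections to a webhook URL."""

    def __init__(self, url: str):
        self.url = url

    def export(self, obj: dict) -> None:
        req = urllib.request.Request(
            self.url,
            data=json.dumps(obj).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=10)


# Usage (requires a reachable webhook endpoint):
#   WebhookExporter("https://example.org/ail-hook").export(
#       {"id": "item-id", "tags": ["infoleak:automatic-detection"]})
```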
- Modular architecture to handle streams of unstructured or structured information
- Default support for external ZMQ feeds, such as provided by CIRCL or other providers
- Multiple Importers and feeds support
- Each module can process and reprocess the information already analyzed by AIL
- Detecting and extracting URLs including their geographical location (e.g. IP address location)
- Extracting and validating potential leaks of credit card numbers, credentials, ...
- Extracting and validating leaked email addresses, including DNS MX validation
- Module for extracting Tor .onion addresses for further analysis
- Keeps track of credential duplicates (with diffing between each duplicate found)
- Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
- A full-text indexer module to index unstructured information
- Tracking of terms, sets of terms, regexes, typosquatting and YARA rules, with occurrence monitoring
- YARA Retro Hunt
- Many more modules for extracting phone numbers, credentials, and more
- Alerting to MISP to share found leaks within a threat intelligence platform using MISP standard
- Detecting and decoding encoded files (Base64, hex encoded or your own decoding scheme) and storing them
- Detecting Amazon AWS and Google API keys (a minimal detection sketch follows this feature list)
- Detecting Bitcoin addresses and Bitcoin private keys
- Detecting private keys, certificates and keys (including SSH and OpenVPN)
- Detecting IBAN bank accounts
- Tagging system with MISP Galaxy and MISP Taxonomies tags
- UI submission
- Create events on MISP and cases on The Hive
- Automatic export on detection with MISP (events) and The Hive (alerts) on selected tags
- Extracted and decoded files can be searched by date range, type of file (mime-type) and encoding discovered
- Correlations engine and Graph to visualize relationships between decoded files (hashes), PGP UIDs, domains, username, and cryptocurrencies addresses
- Websites, forums and Tor hidden services crawler to crawl and parse output
- Domain availability monitoring to detect whether websites and hidden services are up or down
- Browsed hidden services are automatically captured and integrated into the analyzed output, including a blurring screenshot interface (to avoid "burning the eyes" of security analysts with sensitive content)
- Tor hidden services crawling is part of the standard framework; all AIL modules are available for the crawled hidden services
- Crawler scheduler to trigger crawling on demand or at regular intervals for URLs or Tor hidden services
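Several of the detection features above are, at their core, pattern extraction over unstructured text. The following minimal sketch illustrates the idea with commonly used public regexes for AWS access key IDs and Google API keys; the patterns and function name are approximations chosen for illustration, not AIL's actual module code.

```python
# Illustrative regex-based detection in the spirit of AIL's detection modules.
# The patterns are common public approximations, not AIL's exact rules.
import re

PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z\-_]{35}"),
}


def detect_api_keys(text: str) -> dict:
    """Return every candidate key found in a chunk of unstructured text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


if __name__ == "__main__":
    sample = "dump: aws_access_key_id=AKIAIOSFODNN7EXAMPLE token=xyz"
    print(detect_api_keys(sample))
    # {'aws_access_key_id': ['AKIAIOSFODNN7EXAMPLE'], 'google_api_key': []}
```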
To install the AIL framework, run the following commands:
# Clone the repo first
git clone https://github.com/ail-project/ail-framework.git
cd ail-framework
# For Debian and Ubuntu based distributions
./installing_deps.sh
# Launch ail
cd ~/ail-framework/
cd bin/
./LAUNCH.sh -l
The default installing_deps.sh is for Debian and Ubuntu based distributions.
Requirement:
- Python 3.8+
For Lacus Crawler and LibreTranslate installation instructions (if you want to use those features), refer to the HOWTO
To start AIL, use the following commands:
cd bin/
./LAUNCH.sh -l
You can access the AIL framework web interface at the following URL:
https://localhost:7000/
The default credentials for the web interface are located in the DEFAULT_PASSWORD
file, which is deleted when you change your password.
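To verify that the interface is reachable after launch, a quick scripted check also works. The snippet below assumes a local install serving the default self-signed certificate (hence the disabled certificate verification, an assumption to adjust if you provision your own certificate) and requires the third-party requests package.

```python
# Quick reachability check for the AIL web interface (default URL from above).
# verify=False is only appropriate for the default self-signed certificate of a local install.
import requests

resp = requests.get("https://localhost:7000/", verify=False, timeout=5)
print(resp.status_code)  # an HTTP response (e.g. 200 for the login page) means the UI is up
```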
CIRCL organises training on how to use or extend the AIL framework. AIL training materials are available at https://github.com/ail-project/ail-training.
The API documentation is available in doc/api.md
HOWTO are available in HOWTO.md
For information on AIL's compliance with GDPR and privacy considerations, refer to the document "AIL information leaks analysis and the GDPR in the context of collection, analysis and sharing information leaks". It provides an overview of how to use AIL lawfully, especially within the scope of the General Data Protection Regulation.
If you use or reference AIL in an academic paper, you can cite it using the following BibTeX:
@inproceedings{mokaddem2018ail,
  title={AIL - The design and implementation of an Analysis Information Leak framework},
  author={Mokaddem, Sami and Wagener, G{\'e}rard and Dulaunoy, Alexandre},
  booktitle={2018 IEEE International Conference on Big Data (Big Data)},
  pages={5049--5057},
  year={2018},
  organization={IEEE}
}
Copyright (C) 2014 Jules Debra
Copyright (c) 2021 Olivier Sagit
Copyright (C) 2014-2024 CIRCL - Computer Incident Response Center Luxembourg (c/o smile, security made in Lëtzebuerg, Groupement d'Intérêt Economique)
Copyright (c) 2014-2024 Raphaël Vinot
Copyright (c) 2014-2024 Alexandre Dulaunoy
Copyright (c) 2016-2024 Sami Mokaddem
Copyright (c) 2018-2024 Thirion Aurélien
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Alternative AI tools for AIL-framework
Similar Open Source Tools

AIL-framework
AIL framework is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).

ail-framework
AIL framework is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).

swiftide
Swiftide is a fast, streaming indexing and query library tailored for Retrieval Augmented Generation (RAG) in AI applications. It is built in Rust, utilizing parallel, asynchronous streams for blazingly fast performance. With Swiftide, users can easily build AI applications from idea to production in just a few lines of code. The tool addresses frustrations around performance, stability, and ease of use encountered while working with Python-based tooling. It offers features like fast streaming indexing pipeline, experimental query pipeline, integrations with various platforms, loaders, transformers, chunkers, embedders, and more. Swiftide aims to provide a platform for data indexing and querying to advance the development of automated Large Language Model (LLM) applications.

DocsGPT
DocsGPT is an open-source documentation assistant powered by GPT models. It simplifies the process of searching for information in project documentation by allowing developers to ask questions and receive accurate answers. With DocsGPT, users can say goodbye to manual searches and quickly find the information they need. The tool aims to revolutionize project documentation experiences and offers features like live previews, Discord community, guides, and contribution opportunities. It consists of a Flask app, Chrome extension, similarity search index creation script, and a frontend built with Vite and React. Users can quickly get started with DocsGPT by following the provided setup instructions and can contribute to its development by following the guidelines in the CONTRIBUTING.md file. The project follows a Code of Conduct to ensure a harassment-free community environment for all participants. DocsGPT is licensed under MIT and is built with LangChain.

StratosphereLinuxIPS
Slips is a powerful endpoint behavioral intrusion prevention and detection system that uses machine learning to detect malicious behaviors in network traffic. It can work with network traffic in real-time, PCAP files, and network flows from tools like Suricata, Zeek/Bro, and Argus. Slips threat detection is based on machine learning models, threat intelligence feeds, and expert heuristics. It gathers evidence of malicious behavior and triggers alerts when enough evidence is accumulated. The tool is Python-based and supported on Linux and MacOS, with blocking features only on Linux. Slips relies on Zeek network analysis framework and Redis for interprocess communication. It offers a graphical user interface for easy monitoring and analysis.

mlflow
MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). MLflow's current components include `MLflow Tracking`.

cleanlab
Cleanlab helps you **clean** data and **lab**els by automatically detecting issues in an ML dataset. To facilitate **machine learning with messy, real-world data**, this data-centric AI package uses your _existing_ models to estimate dataset problems that can be fixed to train even _better_ models.

DataDreamer
DataDreamer is a powerful open-source Python library designed for prompting, synthetic data generation, and training workflows. It is simple, efficient, and research-grade, allowing users to create prompting workflows, generate synthetic datasets, and train models with ease. The library is built for researchers, by researchers, focusing on correctness, best practices, and reproducibility. It offers features like aggressive caching, resumability, support for bleeding-edge techniques, and easy sharing of datasets and models. DataDreamer enables users to run multi-step prompting workflows, generate synthetic datasets for various tasks, and train models by aligning, fine-tuning, instruction-tuning, and distilling them using existing or synthetic data.

DataFlow
DataFlow is a data preparation and training system designed to parse, generate, process, and evaluate high-quality data from noisy sources, improving the performance of large language models in specific domains. It constructs diverse operators and pipelines, validated to enhance domain-oriented LLM's performance in fields like healthcare, finance, and law. DataFlow also features an intelligent DataFlow-agent capable of dynamically assembling new pipelines by recombining existing operators on demand.

repromodel
ReproModel is an open-source toolbox designed to boost AI research efficiency by enabling researchers to reproduce, compare, train, and test AI models faster. It provides standardized models, dataloaders, and processing procedures, allowing researchers to focus on new datasets and model development. With a no-code solution, users can access benchmark and SOTA models and datasets, utilize training visualizations, extract code for publication, and leverage an LLM-powered automated methodology description writer. The toolbox helps researchers modularize development, compare pipeline performance reproducibly, and reduce time for model development, computation, and writing. Future versions aim to facilitate building upon state-of-the-art research by loading previously published study IDs with verified code, experiments, and results stored in the system.

semantic-router
The Semantic Router is an intelligent routing tool that utilizes a Mixture-of-Models (MoM) approach to direct OpenAI API requests to the most suitable models based on semantic understanding. It enhances inference accuracy by selecting models tailored to different types of tasks. The tool also automatically selects relevant tools based on the prompt to improve tool selection accuracy. Additionally, it includes features for enterprise security such as PII detection and prompt guard to protect user privacy and prevent misbehavior. The tool implements similarity caching to reduce latency. The comprehensive documentation covers setup instructions, architecture guides, and API references.

HAMi
HAMi is a Heterogeneous AI Computing Virtualization Middleware designed to manage Heterogeneous AI Computing Devices in a Kubernetes cluster. It allows for device sharing, device memory control, device type specification, and device UUID specification. The tool is easy to use and does not require modifying task YAML files. It includes features like hard limits on device memory, partial device allocation, streaming multiprocessor limits, and core usage specification. HAMi consists of components like a mutating webhook, scheduler extender, device plugins, and in-container virtualization techniques. It is suitable for scenarios requiring device sharing, specific device memory allocation, GPU balancing, low utilization optimization, and scenarios needing multiple small GPUs. The tool requires prerequisites like NVIDIA drivers, CUDA version, nvidia-docker, Kubernetes version, glibc version, and helm. Users can install, upgrade, and uninstall HAMi, submit tasks, and monitor cluster information. The tool's roadmap includes supporting additional AI computing devices, video codec processing, and Multi-Instance GPUs (MIG).

vts
VTS (Vector Transport Service) is an open-source tool developed by Zilliz based on Apache Seatunnel for moving vectors and unstructured data. It addresses data migration needs, supports real-time data streaming and offline import, simplifies unstructured data transformation, and ensures end-to-end data quality. Core capabilities include rich connectors, stream and batch processing, distributed snapshot support, high performance, and real-time monitoring. Future developments include incremental synchronization, advanced data transformation, and enhanced monitoring. VTS supports various connectors for data migration and offers advanced features like Transformers, cluster mode deployment, RESTful API, Docker deployment, and more.

starwhale
Starwhale is an MLOps/LLMOps platform that brings efficiency and standardization to machine learning operations. It streamlines the model development lifecycle, enabling teams to optimize workflows around key areas like model building, evaluation, release, and fine-tuning. Starwhale abstracts Model, Runtime, and Dataset as first-class citizens, providing tailored capabilities for common workflow scenarios including Models Evaluation, Live Demo, and LLM Fine-tuning. It is an open-source platform designed for clarity and ease of use, empowering developers to build customized MLOps features tailored to their needs.

nous
Nous is an open-source TypeScript platform for autonomous AI agents and LLM based workflows. It aims to automate processes, support requests, review code, assist with refactorings, and more. The platform supports various integrations, multiple LLMs/services, CLI and web interface, human-in-the-loop interactions, flexible deployment options, observability with OpenTelemetry tracing, and specific agents for code editing, software engineering, and code review. It offers advanced features like reasoning/planning, memory and function call history, hierarchical task decomposition, and control-loop function calling options. Nous is designed to be a flexible platform for the TypeScript community to expand and support different use cases and integrations.

Mooncake
Mooncake is a serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates prefill and decoding clusters, leveraging underutilized CPU, DRAM, and SSD resources of the GPU cluster. Mooncake's scheduler balances throughput and latency-related SLOs, with a prediction-based early rejection policy for highly overloaded scenarios. It excels in long-context scenarios, achieving up to a 525% increase in throughput while handling 75% more requests under real workloads.
For similar tasks

AIL-framework
AIL framework is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).

ail-framework
AIL framework is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).
For similar jobs

Copilot-For-Security
Microsoft Copilot for Security is a generative AI-powered assistant for daily operations in security and IT that empowers teams to protect at the speed and scale of AI.

AIL-framework
AIL framework is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).

beelzebub
Beelzebub is an advanced honeypot framework designed to provide a highly secure environment for detecting and analyzing cyber attacks. It offers a low code approach for easy implementation and utilizes virtualization techniques powered by OpenAI Generative Pre-trained Transformer. Key features include OpenAI Generative Pre-trained Transformer acting as Linux virtualization, SSH Honeypot, HTTP Honeypot, TCP Honeypot, Prometheus openmetrics integration, Docker integration, RabbitMQ integration, and kubernetes support. Beelzebub allows easy configuration for different services and ports, enabling users to create custom honeypot scenarios. The roadmap includes developing Beelzebub into a robust PaaS platform. The project welcomes contributions and encourages adherence to the Code of Conduct for a supportive and respectful community.

hackingBuddyGPT
hackingBuddyGPT is a framework for testing LLM-based agents for security testing. It aims to create common ground truth by creating common security testbeds and benchmarks, evaluating multiple LLMs and techniques against those, and publishing prototypes and findings as open-source/open-access reports. The initial focus is on evaluating the efficiency of LLMs for Linux privilege escalation attacks, but the framework is being expanded to evaluate the use of LLMs for web penetration-testing and web API testing. hackingBuddyGPT is released as open-source to level the playing field for blue teams against APTs that have access to more sophisticated resources.

awesome-business-of-cybersecurity
The 'Awesome Business of Cybersecurity' repository is a comprehensive resource exploring the cybersecurity market, focusing on publicly traded companies, industry strategy, and AI capabilities. It provides insights into how cybersecurity companies operate, compete, and evolve across 18 solution categories and beyond. The repository offers structured information on the cybersecurity market snapshot, specialists vs. multiservice cybersecurity companies, cybersecurity stock lists, endpoint protection and threat detection, network security, identity and access management, cloud and application security, data protection and governance, security analytics and threat intelligence, non-US traded cybersecurity companies, cybersecurity ETFs, blogs and newsletters, podcasts, market insights and research, and cybersecurity solutions categories.

mcp-scan
MCP-Scan is a security scanning tool designed to detect common security vulnerabilities in Model Context Protocol (MCP) servers. It can auto-discover various MCP configurations, scan both local and remote servers for security issues like prompt injection attacks, tool poisoning attacks, and toxic flows. The tool operates in two main modes - 'scan' for static scanning of installed servers and 'proxy' for real-time monitoring and guardrailing of MCP connections. It offers features like scanning for specific attacks, enforcing guardrailing policies, auditing MCP traffic, and detecting changes to MCP tools. MCP-Scan does not store or log usage data and can be used to enhance the security of MCP environments.

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.