ail-typo-squatting

Generate list of potential typo squatting domains with domain name permutation engine to feed AIL and other systems.

Stars: 74

Visit

ail-typo-squatting is a Python library designed to generate a list of potential typo squatting domains using a domain name permutation engine. It can be used as a standalone tool or to feed other systems. The tool provides various algorithms to create typos by adding, changing, or omitting characters in domain names. It also offers DNS resolving capabilities to check the availability of generated variations. The project has been co-funded by CEF-TC-2020-2 - 2020-EU-IA-0260 - JTAN - Joint Threat Analysis Network.

README:

ail-typo-squatting

ail-typo-squatting is a Python library to generate list of potential typo squatting domains with domain name permutation engine to feed AIL and other systems.

The tool can be used as a stand-alone tool or to feed other systems.

If you don't want to use the Python library, https://typosquatting-finder.circl.lu/ is an online service which uses this library.

Requirements

Python 3.6+
inflect library
pyyaml
tldextract
dnspython

Installation

Source install

ail-typo-squatting can be install with poetry. If you don't have poetry installed, you can do the following curl -sSL https://install.python-poetry.org | python3 -.

$ poetry install
$ poetry shell
$ cd ail-typo-squatting
$ python typo.py -h

pip installation

$ pip3 install ail-typo-squatting

Usage

dacru@dacru:~/git/ail-typo-squatting/bin$ python3 typo.py --help  
usage: typo.py [-h] [-v] [-dn DOMAINNAME [DOMAINNAME ...]] [-fdn FILEDOMAINNAME] [-o OUTPUT] [-fo FORMATOUTPUT] [-br] [-dnsr] [-dnsl] [-l LIMIT] [-var] [-ko] [-a] [-om] [-repe] [-repl] [-drepl] [-cho]
               [-add] [-md] [-sd] [-vs] [-ada] [-hg] [-ahg] [-cm] [-hp] [-wt] [-wsld] [-at] [-sub] [-sp] [-cdd] [-addns] [-uddns] [-ns] [-combo] [-ca]

optional arguments:
  -h, --help            show this help message and exit
  -v                    verbose, more display
  -dn DOMAINNAME [DOMAINNAME ...], --domainName DOMAINNAME [DOMAINNAME ...]
                        list of domain name
  -fdn FILEDOMAINNAME, --filedomainName FILEDOMAINNAME
                        file containing list of domain name
  -o OUTPUT, --output OUTPUT
                        path to ouput location
  -fo FORMATOUTPUT, --formatoutput FORMATOUTPUT
                        format for the output file, yara - regex - yaml - text. Default: text
  -br, --betterregex    Use retrie for faster regex
  -dnsr, --dnsresolving
                        resolve all variation of domain name to see if it's up or not
  -dnsl, --dnslimited   resolve all variation of domain name but keep only up domain in final result json
  -l LIMIT, --limit LIMIT
                        limit of variations for a domain name
  -var, --givevariations
                        give the algo that generate variations
  -ko, --keeporiginal   Keep in the result list the original domain name
  -a, --all             Use all algo
  -om, --omission       Leave out a letter of the domain name
  -repe, --repetition   Character Repeat
  -repl, --replacement  Character replacement
  -drepl, --doublereplacement
                        Double Character Replacement
  -cho, --changeorder   Change the order of letters in word
  -add, --addition      Add a character in the domain name
  -md, --missingdot     Delete a dot from the domain name
  -sd, --stripdash      Delete of a dash from the domain name
  -vs, --vowelswap      Swap vowels within the domain name
  -ada, --adddash       Add a dash between the first and last character in a string
  -hg, --homoglyph      One or more characters that look similar to another character but are different are called homogylphs
  -ahg, --all_homoglyph
                        generate all possible homoglyph permutations. Ex: circl.lu, e1rc1.lu
  -cm, --commonmisspelling
                        Change a word by is misspellings
  -hp, --homophones     Change word by an other who sound the same when spoken
  -wt, --wrongtld       Change the original top level domain to another
  -wsld, --wrongsld     Change the original second level domain to another
  -at, --addtld         Adding a tld before the original tld
  -sub, --subdomain     Insert a dot at varying positions to create subdomain
  -sp, --singularpluralize
                        Create by making a singular domain plural and vice versa
  -cdd, --changedotdash
                        Change dot to dash
  -addns, --adddynamicdns
                        Add dynamic dns at the end of the domain
  -uddns, --updatedynamicdns
                        Update dynamic dns warning list
  -ns, --numeralswap    Change a numbers to words and vice versa. Ex: circlone.lu, circl1.lu
  -combo                Combine multiple algo on a domain name
  -ca, --catchall       Combine with -dnsr. Generate a random string in front of the domain.

Usage example

Creation of variations for ail-project.org and circl.lu, using all algorithm.

dacru@dacru:~/git/ail-typo-squatting/bin$ python3 typo.py -dn ail-project.org circl.lu -a -o .

Creation of variations for a file who contains domain name, using character omission - subdomain - hyphenation.

dacru@dacru:~/git/ail-typo-squatting/bin$ python3 typo.py -fdn domain.txt -co -sub -hyp -o . -fo yara

Creation of variations for ail-project.org and circl.lu, using all algorithm and using dns resolution.

dacru@dacru:~/git/ail-typo-squatting/bin$ python3 typo.py -dn ail-project.org circl.lu -a -dnsr -o .

Creation of variations for ail-project.org and give the algorithm that generate the variation (only for text format).

dacru@dacru:~/git/ail-typo-squatting/bin$ python3 typo.py -dn ail-project.org -a -o - -var

Used as a library

To run all algorithms

from ail_typo_squatting import runAll
import math

resultList = list()
domainList = ["google.com"]
formatoutput = "yara"
pathOutput = "."
for domain in domainList:
    resultList = runAll(
        domain=domain, 
        limit=math.inf, 
        formatoutput=formatoutput, 
        pathOutput=pathOutput, 
        verbose=False, 
        givevariations=False,
        keeporiginal=False
    )

    print(resultList)
    resultList = list()

To run specific algorithm

from ail_typo_squatting import formatOutput, omission, subdomain, addDash
import math

resultList = list()
domainList = ["google.com"]
limit = math.inf
formatoutput = "yara"
pathOutput = "."
for domain in domainList:
    resultList = omission(domain=domain, resultList=resultList, verbose=False, limit=limit, givevariations=False,  keeporiginal=False)

    resultList = subdomain(domain=domain, resultList=resultList, verbose=False, limit=limit, givevariations=False,  keeporiginal=False)

    resultList = addDash(domain=domain, resultList=resultList, verbose=False, limit=limit, givevariations=False,  keeporiginal=False)

    print(resultList)
    formatOutput(format=formatoutput, resultList=resultList, domain=domain, pathOutput=pathOutput, givevariations=False)

    resultList = list()

Sample output

There's 4 format possible for the output file:

text
yara
regex
sigma

For Text file, each line is a variation.

ail-project.org
il-project.org
al-project.org
ai-project.org
ailproject.org
ail-roject.org
ail-poject.org
ail-prject.org
ail-proect.org
ail-projct.org
ail-projet.org
ail-projec.org
aail-project.org
aiil-project.org
...

For Yara file, each rule is a variation.

rule ail-project_org {
    meta:
        domain = "ail-project.org"
    strings: 
        $s0 = "ail-project.org"
        $s1 = "il-project.org"
        $s2 = "al-project.org"
        $s3 = "ai-project.org"
        $s4 = "ailproject.org"
        $s5 = "ail-roject.org"
        $s6 = "ail-poject.org"
        $s7 = "ail-prject.org"
        $s8 = "ail-proect.org"
        $s9 = "ail-projct.org"
        $s10 = "ail-projet.org"
        $s11 = "ail-projec.org"
    condition:
         any of ($s*)
}

For Regex file, each variations is transform into regex and concatenate with other to do only one big regex.

ail\-project\.org|il\-project\.org|al\-project\.org|ai\-project\.org|ailproject\.org|ail\-roject\.org|ail\-poject\.org|ail\-prject\.org|ail\-proect\.org|ail\-projct\.org|ail\-projet\.org|ail\-projec\.org

For Sigma file, each variations are list under variations key.

title: ail-project.org
variations:
- ail-project.org
- il-project.org
- al-project.org
- ai-project.org
- ailproject.org
- ail-roject.org
- ail-poject.org
- ail-prject.org
- ail-proect.org
- ail-projct.org
- ail-projet.org
- ail-projec.org

DNS output

In case DNS resolve is selected, an additional file will be created in JSON format

each keys are variations and may have a field "ip" if the domain name have been resolved. The filed "NotExist" will be there each time with a Boolean value to determine if the domain is existing or not.

{
    "circl.lu": {
        "NotExist": false,
        "ip": [
            "185.194.93.14"
        ]
    },
    "ircl.lu": {
        "NotExist": true
    },
    "crcl.lu": {
        "NotExist": true
    },
    "cicl.lu": {
        "NotExist": true
    },
    "cirl.lu": {
        "NotExist": true
    },
    "circ.lu": {
        "NotExist": true
    },
    "ccircl.lu": {
        "NotExist": true
    },
    "ciircl.lu": {
        "NotExist": true
    },
    ...
}

List of algorithms used

Algo	Description
AddDash	These typos are created by adding a dash between the first and last character in a string.
Addition	These typos are created by add a characters in the domain name.
AddDynamicDns	These typos are created by adding a dynamic dns at the end of the original domain.
AddTld	These typos are created by adding a tld before the right tld. Example: google.com becomes google.com.it
ChangeDotDash	These typos are created by changing a dot to a dash.
ChangeOrder	These typos are created by changing the order of letters in the each part of the domain.
Combo	These typos are created by combining multiple algorithms. For example, circl.lu becomes cirl6.lu
CommonMisspelling	These typos are created by changing a word by is misspelling. Over 8000 common misspellings from Wikipedia. For example, www.youtube.com becomes www.youtub.com and www.abseil.com becomes www.absail.com.
Double Replacement	These typos are created by replacing identical, consecutive letters of the domain name.
Homoglyph	These typos are created by replacing characters to another character that look similar but are different. An example is that the lower case l looks similar to the numeral one, e.g. l vs 1. For example, google.com becomes goog1e.com.
Homophones	These typos are created by changing word by an other who sound the same when spoken. Over 450 sets of words that sound the same when spoken. For example, www.base.com becomes www.bass.com.
MissingDot	These typos are created by deleting a dot from the domain name.
NumeralSwap	These typos are created by changing a number to words and vice versa. For example, circlone.lu becomes circl1.lu.
Omission	These typos are created by leaving out a letter of the domain name, one letter at a time.
Repetition	These typos are created by repeating a letter of the domain name.
Replacement	These typos are created by replacing each letter of the domain name.
StripDash	These typos are created by deleting a dash from the domain name.
SingularPluralize	These typos are created by making a singular domain plural and vice versa.
Subdomain	These typos are created by placing a dot in the domain name in order to create subdomain. Example: google.com becomes goo.gle.com
VowelSwap	These typos are created by swapping vowels within the domain name except for the first letter. For example, www.google.com becomes www.gaagle.com.
WrongTld	These typos are created by changing the original top level domain to another. For example, www.trademe.co.nz becomes www.trademe.co.mz and www.google.com becomes www.google.org Uses the 19 most common top level domains.
WrongSld	These typos are created by changing the original second level domain to another. For example, www.trademe.co.uk becomes www.trademe.ac.uk and www.google.com will still be www.google.com .

Acknowledgment

The project has been co-funded by CEF-TC-2020-2 - 2020-EU-IA-0260 - JTAN - Joint Threat Analysis Network.

For Tasks:

Click tags to check more tools for each tasks

generate typo domains check domain availability perform dns resolving create variations for domain names detect potential threats

For Jobs:

cybersecurity analyst penetration tester security researcher threat intelligence analyst incident response analyst

Alternative AI tools for ail-typo-squatting

Similar Open Source Tools

ail-typo-squatting

github

: 74

CodeFuse-ModelCache

Codefuse-ModelCache is a semantic cache for large language models (LLMs) that aims to optimize services by introducing a caching mechanism. It helps reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. The project caches pre-generated model results to reduce response time for similar requests and enhance user experience. It integrates various embedding frameworks and local storage options, offering functionalities like cache-writing, cache-querying, and cache-clearing through RESTful API. The tool supports multi-tenancy, system commands, and multi-turn dialogue, with features for data isolation, database management, and model loading schemes. Future developments include data isolation based on hyperparameters, enhanced system prompt partitioning storage, and more versatile embedding models and similarity evaluation algorithms.

github

: 626

Noema-Declarative-AI

Noema is a framework that enables developers to control a language model and choose the path it will follow. It integrates Python with llm's generations, allowing users to use LLM as a thought interpreter rather than a source of truth. Noema is built on llama.cpp and guidance's shoulders. It applies the declarative programming paradigm to a language model, providing a way to represent functions, descriptions, and transformations. Users can create subjects, think about tasks, and generate content through generators, selectors, and code generators. Noema supports ReAct prompting, visualization, and semantic Python functionalities, offering a versatile tool for automating tasks and guiding language models.

github

: 66

ModelCache

github

: 902

Trace

Trace is a new AutoDiff-like tool for training AI systems end-to-end with general feedback. It generalizes the back-propagation algorithm by capturing and propagating an AI system's execution trace. Implemented as a PyTorch-like Python library, users can write Python code directly and use Trace primitives to optimize certain parts, similar to training neural networks.

github

: 500

superpipe

Superpipe is a lightweight framework designed for building, evaluating, and optimizing data transformation and data extraction pipelines using LLMs. It allows users to easily combine their favorite LLM libraries with Superpipe's building blocks to create pipelines tailored to their unique data and use cases. The tool facilitates rapid prototyping, evaluation, and optimization of end-to-end pipelines for tasks such as classification and evaluation of job departments based on work history. Superpipe also provides functionalities for evaluating pipeline performance, optimizing parameters for cost, accuracy, and speed, and conducting grid searches to experiment with different models and prompts.

github

: 99

swarms

Swarms provides simple, reliable, and agile tools to create your own Swarm tailored to your specific needs. Currently, Swarms is being used in production by RBC, John Deere, and many AI startups.

github

: 4.8k

cappr

CAPPr is a tool for text classification that does not require training or post-processing. It allows users to have their language models pick from a list of choices or compute the probability of a completion given a prompt. The tool aims to help users get more out of open source language models by simplifying the text classification process. CAPPr can be used with GGUF models, Hugging Face models, models from the OpenAI API, and for tasks like caching instructions, extracting final answers from step-by-step completions, and running predictions in batches with different sets of completions.

github

: 64

azure-functions-openai-extension

Azure Functions OpenAI Extension is a project that adds support for OpenAI LLM (GPT-3.5-turbo, GPT-4) bindings in Azure Functions. It provides NuGet packages for various functionalities like text completions, chat completions, assistants, embeddings generators, and semantic search. The project requires .NET 6 SDK or greater, Azure Functions Core Tools v4.x, and specific settings in Azure Function or local settings for development. It offers features like text completions, chat completion, assistants with custom skills, embeddings generators for text relatedness, and semantic search using vector databases. The project also includes examples in C# and Python for different functionalities.

github

: 87

raid

RAID is the largest and most comprehensive dataset for evaluating AI-generated text detectors. It contains over 10 million documents spanning 11 LLMs, 11 genres, 4 decoding strategies, and 12 adversarial attacks. RAID is designed to be the go-to location for trustworthy third-party evaluation of popular detectors. The dataset covers diverse models, domains, sampling strategies, and attacks, making it a valuable resource for training detectors, evaluating generalization, protecting against adversaries, and comparing to state-of-the-art models from academia and industry.

github

: 55

$fractl Screenshot$

fractl

Fractl is a programming language designed for generative AI, making it easier for developers to work with AI-generated code. It features a data-oriented and declarative syntax, making it a better fit for generative AI-powered code generation. Fractl also bridges the gap between traditional programming and visual building, allowing developers to use multiple ways of building, including traditional coding, visual development, and code generation with generative AI. Key concepts in Fractl include a graph-based hierarchical data model, zero-trust programming, declarative dataflow, resolvers, interceptors, and entity-graph-database mapping.

github

: 117

instructor-js

Instructor is a Typescript library for structured extraction in Typescript, powered by llms, designed for simplicity, transparency, and control. It stands out for its simplicity, transparency, and user-centric design. Whether you're a seasoned developer or just starting out, you'll find Instructor's approach intuitive and steerable.

github

: 299

zshot

Zshot is a highly customizable framework for performing Zero and Few shot named entity and relationships recognition. It can be used for mentions extraction, wikification, zero and few shot named entity recognition, zero and few shot named relationship recognition, and visualization of zero-shot NER and RE extraction. The framework consists of two main components: the mentions extractor and the linker. There are multiple mentions extractors and linkers available, each serving a specific purpose. Zshot also includes a relations extractor and a knowledge extractor for extracting relations among entities and performing entity classification. The tool requires Python 3.6+ and dependencies like spacy, torch, transformers, evaluate, and datasets for evaluation over datasets like OntoNotes. Optional dependencies include flair and blink for additional functionalities. Zshot provides examples, tutorials, and evaluation methods to assess the performance of the components.

github

: 329

xFinder

xFinder is a model specifically designed for key answer extraction from large language models (LLMs). It addresses the challenges of unreliable evaluation methods by optimizing the key answer extraction module. The model achieves high accuracy and robustness compared to existing frameworks, enhancing the reliability of LLM evaluation. It includes a specialized dataset, the Key Answer Finder (KAF) dataset, for effective training and evaluation. xFinder is suitable for researchers and developers working with LLMs to improve answer extraction accuracy.

github

: 153

MotionLLM

MotionLLM is a framework for human behavior understanding that leverages Large Language Models (LLMs) to jointly model videos and motion sequences. It provides a unified training strategy, dataset MoVid, and MoVid-Bench for evaluating human behavior comprehension. The framework excels in captioning, spatial-temporal comprehension, and reasoning abilities.

github

: 212

FlashRank

FlashRank is an ultra-lite and super-fast Python library designed to add re-ranking capabilities to existing search and retrieval pipelines. It is based on state-of-the-art Language Models (LLMs) and cross-encoders, offering support for pairwise/pointwise rerankers and listwise LLM-based rerankers. The library boasts the tiniest reranking model in the world (~4MB) and runs on CPU without the need for Torch or Transformers. FlashRank is cost-conscious, with a focus on low cost per invocation and smaller package size for efficient serverless deployments. It supports various models like ms-marco-TinyBERT, ms-marco-MiniLM, rank-T5-flan, ms-marco-MultiBERT, and more, with plans for future model additions. The tool is ideal for enhancing search precision and speed in scenarios where lightweight models with competitive performance are preferred.

github

: 541

For similar tasks

ail-typo-squatting

github

: 74

For similar jobs

ciso-assistant-community

CISO Assistant is a tool that helps organizations manage their cybersecurity posture and compliance. It provides a centralized platform for managing security controls, threats, and risks. CISO Assistant also includes a library of pre-built frameworks and tools to help organizations quickly and easily implement best practices.

github

: 2.8k

PurpleLlama

Purple Llama is an umbrella project that aims to provide tools and evaluations to support responsible development and usage of generative AI models. It encompasses components for cybersecurity and input/output safeguards, with plans to expand in the future. The project emphasizes a collaborative approach, borrowing the concept of purple teaming from cybersecurity, to address potential risks and challenges posed by generative AI. Components within Purple Llama are licensed permissively to foster community collaboration and standardize the development of trust and safety tools for generative AI.

github

: 2.9k

vpnfast.github.io

VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

github

: 80

taranis-ai

Taranis AI is an advanced Open-Source Intelligence (OSINT) tool that leverages Artificial Intelligence to revolutionize information gathering and situational analysis. It navigates through diverse data sources like websites to collect unstructured news articles, utilizing Natural Language Processing and Artificial Intelligence to enhance content quality. Analysts then refine these AI-augmented articles into structured reports that serve as the foundation for deliverables such as PDF files, which are ultimately published.

github

: 358

NightshadeAntidote

Nightshade Antidote is an image forensics tool used to analyze digital images for signs of manipulation or forgery. It implements several common techniques used in image forensics including metadata analysis, copy-move forgery detection, frequency domain analysis, and JPEG compression artifacts analysis. The tool takes an input image, performs analysis using the above techniques, and outputs a report summarizing the findings.

github

: 163

h4cker

This repository is a comprehensive collection of cybersecurity-related references, scripts, tools, code, and other resources. It is carefully curated and maintained by Omar Santos. The repository serves as a supplemental material provider to several books, video courses, and live training created by Omar Santos. It encompasses over 10,000 references that are instrumental for both offensive and defensive security professionals in honing their skills.

github

: 20.4k

AIMr

AIMr is an AI aimbot tool written in Python that leverages modern technologies to achieve an undetected system with a pleasing appearance. It works on any game that uses human-shaped models. To optimize its performance, users should build OpenCV with CUDA. For Valorant, additional perks in the Discord and an Arduino Leonardo R3 are required.

github

: 229

admyral

Admyral is an open-source Cybersecurity Automation & Investigation Assistant that provides a unified console for investigations and incident handling, workflow automation creation, automatic alert investigation, and next step suggestions for analysts. It aims to tackle alert fatigue and automate security workflows effectively by offering features like workflow actions, AI actions, case management, alert handling, and more. Admyral combines security automation and case management to streamline incident response processes and improve overall security posture. The tool is open-source, transparent, and community-driven, allowing users to self-host, contribute, and collaborate on integrations and features.

github

: 293