top_secret


Top Secret is a Ruby gem that filters sensitive information from free text before it is sent to external services or APIs, such as chatbots and LLMs. It ships with default filters for credit cards, emails, phone numbers, social security numbers, people's names, and locations, and it supports custom filters. It can redact sensitive information, scan text for sensitive data without redacting it, batch process messages with consistent labels, and restore placeholders in text returned by external services. Detection relies on regex and NER filters; default filters can be overridden or disabled, and custom filters can be added per call or globally.

README:

Top Secret

Filter sensitive information from free text before sending it to external services or APIs, such as chatbots and LLMs.

By default it filters the following:

  • Credit cards
  • Emails
  • Phone numbers
  • Social security numbers
  • People's names
  • Locations

However, you can add your own custom filters.

Installation

Install the gem and add it to the application's Gemfile by executing:

bundle add top_secret

If bundler is not being used to manage dependencies, install the gem by executing:

gem install top_secret

[!IMPORTANT] Top Secret depends on MITIE Ruby, which depends on MITIE.

You'll need to download and extract ner_model.dat first.

[!TIP] Due to its large size, you'll likely want to avoid committing ner_model.dat into version control.

You'll need to ensure the file exists in deployed environments. See relevant discussion for details.

Alternatively, you can disable NER filtering entirely by setting model_path to nil if you only need regex-based filters (credit cards, emails, phone numbers, SSNs). This improves performance and eliminates the model file dependency.

By default, Top Secret assumes the file will live at the root of your project, but this can be configured.

TopSecret.configure do |config|
  config.model_path = "path/to/ner_model.dat"
end

Default Filters

Top Secret ships with a set of filters to detect and redact the most common types of sensitive information.

You can override, disable, or add to this list as needed.

By default, the following filters are enabled:

credit_card_filter

Matches common credit card formats

result = TopSecret::Text.filter("My card number is 4242-4242-4242-4242")
result.output

# => "My card number is [CREDIT_CARD_1]"

email_filter

Matches email addresses

result = TopSecret::Text.filter("Email me at [email protected]")
result.output

# => "Email me at [EMAIL_1]"

phone_number_filter

Matches phone numbers

result = TopSecret::Text.filter("Call me at 555-555-5555")
result.output

# => "Call me at [PHONE_NUMBER_1]"

ssn_filter

Matches U.S. Social Security numbers

result = TopSecret::Text.filter("My SSN is 123-45-6789")
result.output

# => "My SSN is [SSN_1]"

people_filter

Detects names of people (NER-based)

result = TopSecret::Text.filter("Ralph is joining the meeting")
result.output

# => "[PERSON_1] is joining the meeting"

location_filter

Detects location names (NER-based)

result = TopSecret::Text.filter("Let's meet in Boston")
result.output

# => "Let's meet in [LOCATION_1]"

Usage

result = TopSecret::Text.filter("Ralph can be reached at [email protected]")

This will return

<TopSecret::Text::Result
  @input="Ralph can be reached at [email protected]",
  @mapping={:EMAIL_1=>"[email protected]", :PERSON_1=>"Ralph"},
  @output="[PERSON_1] can be reached at [EMAIL_1]"
>

View the original text

result.input

# => "Ralph can be reached at [email protected]"

View the filtered text

result.output

# => "[PERSON_1] can be reached at [EMAIL_1]"

View the mapping

result.mapping

# => {:EMAIL_1=>"[email protected]", :PERSON_1=>"Ralph"}

Scanning for Sensitive Information

Use TopSecret::Text.scan to detect sensitive information without redacting the text. This is useful when you only need to check if sensitive data exists or get a mapping of what was found:

result = TopSecret::Text.scan("Ralph can be reached at [email protected]")

This will return

<TopSecret::Text::ScanResult
  @mapping={:EMAIL_1=>"[email protected]", :PERSON_1=>"Ralph"}
>

Check if sensitive information was found

result.sensitive?

# => true

View the mapping of found sensitive information

result.mapping

# => {:EMAIL_1=>"[email protected]", :PERSON_1=>"Ralph"}

The scan method accepts the same filter options as filter:

# Override default filters
email_filter = TopSecret::Filters::Regex.new(
  label: "EMAIL_ADDRESS",
  regex: /\w+\[at\]\w+\.\w+/
)
result = TopSecret::Text.scan("Contact user[at]example.com", email_filter:)
result.mapping
# => {:EMAIL_ADDRESS_1=>"user[at]example.com"}

# Disable specific filters
result = TopSecret::Text.scan("Ralph works in Boston", people_filter: nil)
result.mapping
# => {:LOCATION_1=>"Boston"}

# Add custom filters
ip_filter = TopSecret::Filters::Regex.new(
  label: "IP_ADDRESS",
  regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
)
result = TopSecret::Text.scan("Server IP is 192.168.1.1", custom_filters: [ip_filter])
result.mapping
# => {:IP_ADDRESS_1=>"192.168.1.1"}
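
To make the scan behaviour concrete, here is a minimal plain-Ruby sketch of the idea: collect matches into a mapping, leave the text untouched, and expose a sensitive? check. It does not use the gem and is not its implementation; sketch_scan, SketchScanResult, and SKETCH_FILTERS are hypothetical names, and the label numbering is an assumption based on the output shown above.

```ruby
# Hypothetical stand-in for a scan result: only a mapping, no redacted output.
SketchScanResult = Struct.new(:mapping) do
  def sensitive?
    !mapping.empty?
  end
end

# Label => pattern pairs. Patterns must not contain capture groups,
# because String#scan would then return arrays instead of strings.
SKETCH_FILTERS = {
  "IP_ADDRESS" => /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
}.freeze

# Collect every distinct match under a numbered label; the text is not modified.
def sketch_scan(text, filters)
  mapping = {}
  filters.each do |label, regex|
    text.scan(regex).uniq.each_with_index do |value, index|
      mapping[:"#{label}_#{index + 1}"] = value
    end
  end
  SketchScanResult.new(mapping)
end

result = sketch_scan("Server IP is 192.168.1.1", SKETCH_FILTERS)
puts result.sensitive?       # true
puts result.mapping.inspect
puts sketch_scan("Nothing to hide", SKETCH_FILTERS).sensitive? # false
```

A check like this can act as a gate: forward the text to an external service only when sensitive? is false, and filter it first otherwise.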

Batch Processing

When processing multiple messages, use filter_all to ensure consistent redaction labels across all messages:

messages = [
  "Contact [email protected] for details",
  "Email [email protected] again if needed",
  "Also CC [email protected] on the thread"
]

result = TopSecret::Text.filter_all(messages)

This will return

<TopSecret::Text::BatchResult
  @mapping={:EMAIL_1=>"[email protected]", :EMAIL_2=>"[email protected]"},
  @items=[
    <TopSecret::Text::BatchResult::Item @input="Contact [email protected] for details", @output="Contact [EMAIL_1] for details">,
    <TopSecret::Text::BatchResult::Item @input="Email [email protected] again if needed", @output="Email [EMAIL_1] again if needed">,
    <TopSecret::Text::BatchResult::Item @input="Also CC [email protected] on the thread", @output="Also CC [EMAIL_2] on the thread">
  ]
>

Access the global mapping

result.mapping

# => {:EMAIL_1=>"[email protected]", :EMAIL_2=>"[email protected]"}

Access individual items

result.items[0].input
# => "Contact [email protected] for details"

result.items[0].output
# => "Contact [EMAIL_1] for details"

The key benefit is that identical values receive the same label across all messages. Notice how [email protected] becomes [EMAIL_1] in both the first and second messages.
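
One way to picture consistent labelling is a single value-to-label table shared by every message in the batch. The following is a plain-Ruby sketch under that assumption, not the gem's code; sketch_filter_all and the ORDER_ID pattern are hypothetical and chosen only for illustration.

```ruby
# Hypothetical sketch: every message in the batch shares one lookup table,
# so a value seen earlier reuses the label it was given the first time.
def sketch_filter_all(messages, label:, regex:)
  labels_by_value = {}

  outputs = messages.map do |message|
    message.gsub(regex) do |match|
      labels_by_value[match] ||= :"#{label}_#{labels_by_value.size + 1}"
      "[#{labels_by_value[match]}]"
    end
  end

  # Invert to the label => value shape used by the mappings in this README.
  [outputs, labels_by_value.invert]
end

outputs, mapping = sketch_filter_all(
  ["Refund ORD-1001 today", "Has ORD-1001 shipped?", "Also check ORD-2002"],
  label: "ORDER_ID",
  regex: /\bORD-\d+\b/
)

puts outputs
# Refund [ORDER_ID_1] today
# Has [ORDER_ID_1] shipped?
# Also check [ORDER_ID_2]
puts mapping.inspect
```

Filtering each message independently would restart numbering per message, so the same label could refer to different values and a later restore step would become ambiguous.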

Restoring Filtered Text

When external services (like LLMs) return responses containing filter placeholders, use TopSecret::FilteredText.restore to substitute them back with original values:

# Filter messages before sending to LLM
messages = ["Contact [email protected] for details"]
batch_result = TopSecret::Text.filter_all(messages)

# Send filtered text to LLM: "Contact [EMAIL_1] for details"
# LLM responds with: "I'll email [EMAIL_1] about this request"
llm_response = "I'll email [EMAIL_1] about this request"

# Restore the original values
restore_result = TopSecret::FilteredText.restore(llm_response, mapping: batch_result.mapping)

This will return

<TopSecret::FilteredText::Result
  @output="I'll email [email protected] about this request",
  @restored=["[EMAIL_1]"],
  @unrestored=[]
>

Access the restored text

restore_result.output
# => "I'll email [email protected] about this request"

Track which placeholders were restored

restore_result.restored
# => ["[EMAIL_1]"]

restore_result.unrestored
# => []

The restoration process tracks both successful and failed placeholder substitutions, allowing you to handle cases where the LLM response contains placeholders not found in your mapping.
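
The tracking described above can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the gem's implementation: sketch_restore and PLACEHOLDER_PATTERN are hypothetical, and the placeholder format is inferred from the examples in this README.

```ruby
# Assumed placeholder shape, e.g. [EMAIL_1] or [IP_ADDRESS_2].
PLACEHOLDER_PATTERN = /\[([A-Z_]+_\d+)\]/

# Substitute known placeholders and record which ones could not be resolved.
def sketch_restore(text, mapping:)
  restored = []
  unrestored = []

  output = text.gsub(PLACEHOLDER_PATTERN) do |placeholder|
    key = Regexp.last_match(1).to_sym
    if mapping.key?(key)
      restored << placeholder
      mapping[key]
    else
      unrestored << placeholder
      placeholder # leave unknown placeholders visible in the output
    end
  end

  {output: output, restored: restored.uniq, unrestored: unrestored.uniq}
end

result = sketch_restore(
  "[PERSON_1] should visit [LOCATION_2], not [LOCATION_1]",
  mapping: {PERSON_1: "Ralph", LOCATION_1: "Boston"}
)

puts result[:output]             # Ralph should visit [LOCATION_2], not Boston
puts result[:unrestored].inspect # ["[LOCATION_2]"]
```

A non-empty unrestored list is a useful signal that the LLM invented or altered a placeholder, so the response may deserve a retry or a manual check before it is shown to a user.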

Working with LLMs

LLMs usually need to be told how to handle the placeholders in filtered text. Without instructions, a model may rephrase or drop a placeholder instead of returning it verbatim, which would break the restoration process.

Here's a recommended approach:

instructions = <<~TEXT
  I'm going to send filtered information to you in the form of free text.
  If you need to refer to the filtered information in a response, just reference it by the filter.
TEXT

Complete example:

require "openai"
require "top_secret"

openai = OpenAI::Client.new(
  api_key: Rails.application.credentials.openai.api_key!
)

original_messages = [
  "Ralph lives in Boston.",
  "You can reach them at [email protected] or 877-976-2687"
]

# Filter all messages
result = TopSecret::Text.filter_all(original_messages)
filtered_messages = result.items.map(&:output)

user_messages = filtered_messages.map { |message| {role: "user", content: message} }

# Instruct LLM how to handle filtered messages
instructions = <<~TEXT
  I'm going to send filtered information to you in the form of free text.
  If you need to refer to the filtered information in a response, just reference it by the filter.
TEXT

messages = [
  {role: "system", content: instructions},
  *user_messages
]

chat_completion = openai.chat.completions.create(messages:, model: :"gpt-5")
response = chat_completion.choices.last.message.content

# Restore the response from the mapping
mapping = result.mapping
restored_response = TopSecret::FilteredText.restore(response, mapping:).output

puts(restored_response)

Advanced Examples

Overriding the default filters

When overriding or disabling a default filter, you must map to the correct key.

[!IMPORTANT] Invalid filter keys will raise an ArgumentError. Only the following keys are valid: credit_card_filter, email_filter, phone_number_filter, ssn_filter, people_filter, location_filter

regex_filter = TopSecret::Filters::Regex.new(label: "EMAIL_ADDRESS", regex: /\b\w+\[at\]\w+\.\w+\b/)
ner_filter = TopSecret::Filters::NER.new(label: "NAME", tag: :person, min_confidence_score: 0.25)

TopSecret::Text.filter("Ralph can be reached at ralph[at]thoughtbot.com",
  email_filter: regex_filter,
  people_filter: ner_filter
)

This will return

<TopSecret::Text::Result
  @input="Ralph can be reached at ralph[at]thoughtbot.com",
  @mapping={:EMAIL_ADDRESS_1=>"ralph[at]thoughtbot.com", :NAME_1=>"Ralph", :NAME_2=>"ralph["},
  @output="[NAME_1] can be reached at [EMAIL_ADDRESS_1]"
>

Disabling a default filter

TopSecret::Text.filter("Ralph can be reached at [email protected]",
  email_filter: nil,
  people_filter: nil
)

This will return

<TopSecret::Text::Result
  @input="Ralph can be reached at [email protected]",
  @mapping={},
  @output="Ralph can be reached at [email protected]"
>

Error handling for invalid filter keys

# This will raise ArgumentError: Unknown key: :invalid_filter. Valid keys are: ...
TopSecret::Text.filter("some text", invalid_filter: some_filter)
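
For illustration, this kind of key validation could look roughly like the plain-Ruby sketch below. It is not the gem's code: sketch_validate_filter_keys! and VALID_FILTER_KEYS are hypothetical names, the key list is copied from the note above, and the exact wording after "Valid keys are:" is an assumption. The custom_filters option is assumed to be handled separately from these override keys.

```ruby
# Keys listed as valid in this README.
VALID_FILTER_KEYS = %i[
  credit_card_filter email_filter phone_number_filter
  ssn_filter people_filter location_filter
].freeze

# Return the overrides unchanged when every key is known, otherwise raise.
def sketch_validate_filter_keys!(overrides)
  unknown = overrides.keys - VALID_FILTER_KEYS
  return overrides if unknown.empty?

  raise ArgumentError,
    "Unknown key: #{unknown.first.inspect}. Valid keys are: #{VALID_FILTER_KEYS.join(", ")}"
end

begin
  sketch_validate_filter_keys!(invalid_filter: nil)
rescue ArgumentError => error
  puts error.message
end
```

Failing loudly on a misspelled key matters here: silently ignoring something like emial_filter: nil would leave the default filter active while the caller believes it was disabled, or the reverse.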

Custom Filters

Adding new Regex filters

ip_address_filter = TopSecret::Filters::Regex.new(
  label: "IP_ADDRESS",
  regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
)

TopSecret::Text.filter("Ralph's IP address is 192.168.1.1",
  custom_filters: [ip_address_filter]
)

This will return

<TopSecret::Text::Result
  @input="Ralph's IP address is 192.168.1.1",
  @mapping={:PERSON_1=>"Ralph", :IP_ADDRESS_1=>"192.168.1.1"},
  @output="[PERSON_1]'s IP address is [IP_ADDRESS_1]"
>

Adding new NER filters

Since MITIE Ruby has an API for training a model, you're free to add new NER filters.

language_filter = TopSecret::Filters::NER.new(
  label: "LANGUAGE",
  tag: :language,
  min_confidence_score: 0.75
)

TopSecret::Text.filter("Ralph's favorite programming language is Ruby.",
  custom_filters: [language_filter]
)

This will return

<TopSecret::Text::Result
  @input="Ralph's favorite programming language is Ruby.",
  @mapping={:PERSON_1=>"Ralph", :LANGUAGE_1=>"Ruby"},
  @output="[PERSON_1]'s favorite programming language is [LANGUAGE_1]"
>

How Filters Work

Top Secret uses two types of filters to detect and redact sensitive information:

TopSecret::Filters::Regex

Regex filters use regular expressions to find patterns in text. They are useful for structured data like credit card numbers, emails, or IP addresses.

regex_filter = TopSecret::Filters::Regex.new(
  label: "IP_ADDRESS",
  regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
)

result = TopSecret::Text.filter("Server IP: 192.168.1.1",
  custom_filters: [regex_filter]
)

result.output
# => "Server IP: [IP_ADDRESS_1]"

TopSecret::Filters::NER

NER (Named Entity Recognition) filters use the MITIE library to detect entities like people, locations, and other categories based on trained language models. They are ideal for free-form text where patterns are less predictable.

ner_filter = TopSecret::Filters::NER.new(
  label: "PERSON",
  tag: :person,
  min_confidence_score: 0.25
)

result = TopSecret::Text.filter("Ralph and Ruby work at thoughtbot.",
  people_filter: ner_filter
)

result.output
# => "[PERSON_1] and [PERSON_2] work at thoughtbot."

NER filters match based on the tag you specify (:person, :location, etc.) and only include matches with a confidence score above min_confidence_score.
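
The effect of the tag and min_confidence_score can be sketched in plain Ruby. Everything here is illustrative: the entity hashes, their :text, :tag, and :score keys, and the scores themselves are made up for the example, and sketch_select_entities is a hypothetical helper rather than part of the gem or of MITIE Ruby. Whether the real comparison includes the boundary value is not stated above; this sketch uses a strict comparison.

```ruby
# Made-up entities standing in for what an NER model might report.
SAMPLE_ENTITIES = [
  {text: "Ralph", tag: :person, score: 0.92},
  {text: "Ruby", tag: :person, score: 0.31},
  {text: "Boston", tag: :location, score: 0.88}
].freeze

# Keep entities with the requested tag whose score is above the threshold.
def sketch_select_entities(entities, tag:, min_confidence_score:)
  entities
    .select { |entity| entity[:tag] == tag && entity[:score] > min_confidence_score }
    .map { |entity| entity[:text] }
end

puts sketch_select_entities(SAMPLE_ENTITIES, tag: :person, min_confidence_score: 0.25).inspect
# ["Ralph", "Ruby"]
puts sketch_select_entities(SAMPLE_ENTITIES, tag: :person, min_confidence_score: 0.75).inspect
# ["Ralph"]
```

A low threshold catches more names but also admits false positives; a high threshold is quieter but can let real names through unredacted.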

Supported NER Tags

By default, Top Secret only ships with NER filters for two entity types:

  • :person
  • :location

If you need other tags, you can train your own MITIE model and add custom NER filters, as shown under "Adding new NER filters" above.

Configuration

Overriding the model path

TopSecret.configure do |config|
  config.model_path = "path/to/ner_model.dat"
end

Disabling NER filtering

For improved performance or when the MITIE model file cannot be deployed, you can disable NER-based filtering entirely. This will disable people and location detection but retain all regex-based filters (credit cards, emails, phone numbers, SSNs):

TopSecret.configure do |config|
  config.model_path = nil
end

This is useful in environments where:

  • The model file cannot be deployed due to size constraints
  • You only need regex-based filtering
  • You want to optimize for performance over NER capabilities
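
If the model file is only present in some environments, one option is to decide the setting at boot. The helper below is a hypothetical sketch, not part of the gem: sketch_resolve_model_path and the NER_MODEL_PATH environment variable are assumed names, and the only documented behaviour it relies on is that a nil model_path disables NER filtering.

```ruby
# Hypothetical helper: return the path only when the file is present,
# otherwise nil, which would disable NER filtering as described above.
def sketch_resolve_model_path(candidate)
  return nil if candidate.nil? || candidate.empty?

  File.exist?(candidate) ? candidate : nil
end

# In an initializer this could feed the documented setting:
#
#   TopSecret.configure do |config|
#     config.model_path = sketch_resolve_model_path(ENV["NER_MODEL_PATH"])
#   end

puts sketch_resolve_model_path("does/not/exist/ner_model.dat").inspect # nil
```

Falling back silently means names and locations would no longer be redacted, so in production it may be safer to log a warning or raise instead of returning nil.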

Overriding the confidence score

TopSecret.configure do |config|
  config.min_confidence_score = 0.75
end

Overriding the default filters

TopSecret.configure do |config|
  config.email_filter = TopSecret::Filters::Regex.new(
    label: "EMAIL_ADDRESS",
    regex: /\b\w+\[at\]\w+\.\w+\b/
  )
end

Disabling a default filter

TopSecret.configure do |config|
  config.email_filter = nil
end

Adding custom filters globally

ip_address_filter = TopSecret::Filters::Regex.new(
  label: "IP_ADDRESS",
  regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
)

TopSecret.configure do |config|
  config.custom_filters << ip_address_filter
end

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

[!IMPORTANT] Top Secret depends on MITIE Ruby, which depends on MITIE.

You'll need to download and extract ner_model.dat first, and place it in the root of this project.

Performance Benchmarks

Run bin/benchmark to test performance and catch regressions:

bin/benchmark  # CI-optimized benchmark with pass/fail thresholds

[!NOTE] When adding new public methods to the API, ensure they are included in the benchmark script to catch performance regressions.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/thoughtbot/top_secret.

Please create a new discussion if you want to share ideas for new features.

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

Open source templates are Copyright (c) thoughtbot, inc. It contains free software that may be redistributed under the terms specified in the LICENSE file.

Code of Conduct

Everyone interacting in the TopSecret project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

About thoughtbot

This repo is maintained and funded by thoughtbot, inc. The names and logos for thoughtbot are trademarks of thoughtbot, inc.

We love open source software! See our other projects. We are available for hire.
