Published on: April 1, 2025
This tutorial explains how GitLab's customizable Secret Detection rulesets enhance data security by identifying PII patterns in code repositories. Learn how AI can help.
Protecting sensitive information is more critical than ever. GitLab's Secret Detection feature provides a powerful solution to identify and prevent the exposure of sensitive data. This tutorial explores how GitLab Secret Detection works, how to create custom rulesets for finding personally identifiable information (PII), and how GitLab Duo Chat can streamline the creation of regex patterns for PII detection.
GitLab Secret Detection is a security scanning feature integrated into the GitLab CI/CD pipeline. It automatically scans your codebase to identify hardcoded secrets, credentials, and other sensitive information that shouldn't be stored in your repository.
While GitLab's default secret detection covers common secrets like API keys and passwords, you may need custom rules to identify specific types of PII relevant to your organization.
To get started, create a new GitLab project and follow the steps below. You can follow along and see usage examples in our PII Demo Application.
Step 1: Set up Secret Detection
Ensure Secret Detection is enabled in your .gitlab-ci.yml file:
include:
  - template: Security/Secret-Detection.gitlab-ci.yml
secret_detection:
  variables:
    SECRET_DETECTION_EXCLUDED_PATHS: "rules,.gitlab,README.md,LICENSE"
    SECRET_DETECTION_HISTORIC_SCAN: "true"
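SECRET_DETECTION_EXCLUDED_PATHS keeps the configuration directories, the README, and the LICENSE out of the scan results. To illustrate how a comma-separated exclusion list like this one might be applied to file paths, here is a minimal Python sketch. It assumes exact-path, folder-prefix, and glob matching, and `is_excluded` is a hypothetical helper; it is an approximation, not the analyzer's actual implementation.

```python
from fnmatch import fnmatch

# Value copied from the .gitlab-ci.yml example above.
EXCLUDED_PATHS = "rules,.gitlab,README.md,LICENSE"


def is_excluded(path: str, excluded: str = EXCLUDED_PATHS) -> bool:
    """Approximate check: a path is excluded if it equals an entry,
    lives under an entry treated as a folder, or matches it as a glob.
    (Hypothetical helper; the real analyzer's rules may differ.)"""
    for pattern in (p.strip() for p in excluded.split(",") if p.strip()):
        if path == pattern or path.startswith(pattern.rstrip("/") + "/"):
            return True
        if fnmatch(path, pattern):
            return True
    return False


for candidate in [
    "rules/pii-data-extension.toml",
    ".gitlab/secret-detection-ruleset.toml",
    "README.md",
    "customer-data.yaml",
]:
    print(candidate, "->", "excluded" if is_excluded(candidate) else "scanned")
```

Under these assumptions the rules file and the .gitlab directory are skipped, while a data file such as customer-data.yaml is still scanned.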
Step 2: Create a custom ruleset file
Create the directory and file rules/pii-data-extension.toml, which contains the regex patterns for PII data along with an allowlist of patterns to ignore. Below are patterns to detect US passport numbers, US phone numbers, and email addresses:
[extend]
# Extends default packaged ruleset, NOTE: do not change the path.
path = "/gitleaks.toml"
# Patterns to ignore (used for tests)
[allowlist]
description = "allowlist of patterns and paths to ignore in detection"
regexTarget = "match"
regexes = ['''555-555-5555''', '''[email protected]''']
paths = ['''(.*?)(jpg|gif|doc|pdf|bin|svg|socket)''']
# US Passport Number (USA)
[[rules]]
id = "us_passport_detection"
title = "US Passport Number"
description = "Detects US passport numbers"
regex = '''\b[A-Z]{1,2}[0-9]{6,9}\b'''
keywords = ["passport"]
# Phone Number (USA)
[[rules]]
id = "us_phone_number_detection_basic"
title = "US Phone Number"
description = "Detects US phone numbers in basic format"
regex = '''\b\d{3}-\d{3}-\d{4}\b'''
keywords = ["phone", "mobile"]
# Email Address
[[rules]]
id = "email_address"
title = "Email Address"
description = "Detects email addresses"
regex = '''[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'''
keywords = ["email", "e-mail"]
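To get a feel for what each pattern does and does not catch before wiring it into the scanner, you can exercise the regexes locally. The following is a minimal Python sketch that assumes Python's re module behaves like Go's regexp engine (which Gitleaks is built on) for these simple ASCII patterns; the `matches` helper and the sample strings are hypothetical.

```python
import re

# Regexes copied from rules/pii-data-extension.toml above.
# Assumption: Python's re and Go's regexp agree on these simple patterns
# for ASCII input; verify more complex patterns in the Go flavor.
RULES = {
    "us_passport_detection": r"\b[A-Z]{1,2}[0-9]{6,9}\b",
    "us_phone_number_detection_basic": r"\b\d{3}-\d{3}-\d{4}\b",
    "email_address": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
}


def matches(rule_id: str, text: str) -> list[str]:
    """Return every substring of `text` that the rule's regex matches."""
    return re.findall(RULES[rule_id], text)


# Hypothetical sample values showing hits and near misses.
print(matches("us_passport_detection", "passport_number: A12345678"))
print(matches("us_passport_detection", "passport_number: a12345678"))
print(matches("us_phone_number_detection_basic", "phone: 512-123-4567"))
print(matches("us_phone_number_detection_basic", "phone: (512) 123-4567"))
print(matches("email_address", "contact: jane.doe@example.com"))
print(matches("us_passport_detection", "order AB1234567"))
```

Note that the basic phone pattern misses the "(512) 123-4567" format, and the passport pattern on its own also matches identifiers such as order numbers. That is one reason the rule declares the `passport` keyword: Gitleaks only runs a rule's regex on content that contains one of its keywords.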
Step 3: Extend Secret Detection with the custom ruleset file
Create the directory and file .gitlab/secret-detection-ruleset.toml in the root of your repository. This file extends the standard configuration with the PII rules file and overrides the severity of the detected vulnerabilities (the default severity is Critical).
# Define the PII rules to add to the default configuration
[[secrets.passthrough]]
type = "file"
target = "gitleaks.toml"
value = "rules/pii-data-extension.toml"
# Override Phone Number (USA) PII Severity
[[secrets.ruleset]]
[secrets.ruleset.identifier]
type = "gitleaks_rule_id"
value = "us_phone_number_detection_basic"
[secrets.ruleset.override]
severity = "Medium"
# Override Email Address PII Severity
[[secrets.ruleset]]
[secrets.ruleset.identifier]
type = "gitleaks_rule_id"
value = "email_address"
[secrets.ruleset.override]
severity = "Low"
Step 4: Commit your changes
Now commit the changes from the steps above and push them to your project:
cd /path/to/your/project
git add .
git commit -m "Add PII data ruleset and Secret Scanning"
git push
Once the changes are pushed, Secret Detection runs in a pipeline on the default branch.
Step 5: Test detection of PII data
Now that the Secret Detection scanner is configured, test whether it detects the new custom patterns. To do so, create a merge request that adds a new file named customer-data.yaml with the following content:
customers:
  test_user:
    phone_number: 555-555-5555
    email: [email protected]
  justin_case:
    phone_number: 512-123-4567
    passport_number: A12345678
    email: [email protected]
  chris_p_bacon:
    phone_number: 305-123-4567
    passport_number: B09876543
    email: [email protected]
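Before opening the merge request, you might want to preview which values the rules should flag. The Python sketch below is a hypothetical local approximation, not the Gitleaks engine: it checks keywords per line (Gitleaks checks them against the whole fragment being scanned), applies the allowlist to the matched value as `regexTarget = "match"` suggests, and hardcodes the severities from Step 3. The email lines are left out of the sample for brevity, and `scan` is an illustrative helper.

```python
import re

# (rule id, regex, keywords, severity after the Step 3 overrides)
RULES = [
    ("us_passport_detection", r"\b[A-Z]{1,2}[0-9]{6,9}\b", ["passport"], "Critical"),
    ("us_phone_number_detection_basic", r"\b\d{3}-\d{3}-\d{4}\b", ["phone", "mobile"], "Medium"),
    ("email_address", r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", ["email", "e-mail"], "Low"),
]
# Allowlist regexes applied to the matched value.
ALLOWLIST = [r"555-555-5555"]

# Subset of customer-data.yaml (email lines omitted).
SAMPLE = """\
customers:
  test_user:
    phone_number: 555-555-5555
  justin_case:
    phone_number: 512-123-4567
    passport_number: A12345678
  chris_p_bacon:
    phone_number: 305-123-4567
    passport_number: B09876543
"""


def scan(text: str) -> list[tuple[int, str, str, str]]:
    """Return (line number, rule id, matched value, severity) findings."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for rule_id, pattern, keywords, severity in RULES:
            if not any(k in lowered for k in keywords):
                continue  # keyword pre-filter (simplified to per line)
            for value in re.findall(pattern, line):
                if any(re.search(a, value) for a in ALLOWLIST):
                    continue  # allowlisted test value is ignored
                findings.append((lineno, rule_id, value, severity))
    return findings


for finding in scan(SAMPLE):
    print(finding)
```

Under these assumptions, the test_user phone number is dropped by the allowlist, and each remaining phone number and passport number is reported once with its configured severity.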
The scanner should now do the following:
* Ignore the phone_number and email of test_user, because those values are in the allowlist
* Detect the PII of justin_case and chris_p_bacon, with these severities:
  * Passport numbers: Critical (default)
  * Phone numbers: Medium (override)
  * Email addresses: Low (override)
Once the merge request is submitted, the Secret Detection scanner runs and provides the following results:
When you click a vulnerability, you are presented with detailed vulnerability data based on the rules you just configured:
This data allows you to determine the validity of the data present and address it accordingly.
There are additional ways to configure custom rulesets. For example, rules can be applied remotely to several projects, avoiding the need to duplicate the rules file. See the Secret Detection Configuration documentation for more information.
When building your custom ruleset, consider including patterns for:
Creating effective regex patterns can be challenging. This is where GitLab Duo Chat comes in handy. GitLab Duo Chat is an AI-powered assistant that accelerates development with contextual, conversational AI. Chat is available to any user with an assigned GitLab Duo seat.
Access GitLab Duo Chat: In the upper right corner, select GitLab Duo Chat. A drawer opens on the right side of your screen.
Prompt the chat: Enter a prompt in the chat input box and press Enter or select Send:
I need a regex pattern to detect Canadian Zip Codes in our codebase for secret detection
Canadian postal codes follow the format "A1A 1A1" (where A is a letter and 1 is a digit). Here's a regex pattern that will detect Canadian postal codes in your codebase, accounting for both spaced and non-spaced formats:
[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d
This pattern matches:
* A letter, followed by a digit, followed by a letter
* An optional space or hyphen
* A digit, followed by a letter, followed by a digit
If you need a more strict pattern that only matches uppercase letters (as official Canadian postal codes use), you can use:
[A-Z]\d[A-Z][ -]?\d[A-Z]\d
For implementation in a secret detection tool, you might want to add word boundaries to avoid partial matches:
\b[A-Z]\d[A-Z][ -]?\d[A-Z]\d\b
Would you like more details about implementing this in a specific secret detection system?
Test the pattern: Test the pattern against sample data to ensure it works correctly. You can use sites like https://regex101.com/ to validate patterns before you submit them to your ruleset.
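If you also want these checks to live in the repository, a small Python script can exercise the pattern against positive and negative samples. This is a sketch: the sample values are illustrative, and Python's re module stands in for Go's regexp engine, which Gitleaks uses. Both treat this particular pattern the same way for ASCII input, but some Python-only features, such as lookarounds, are rejected by Go, so complex patterns should still be checked in regex101's Golang flavor.

```python
import re

# Strict pattern suggested by GitLab Duo Chat above (uppercase, word boundaries).
CA_POSTAL_CODE = re.compile(r"\b[A-Z]\d[A-Z][ -]?\d[A-Z]\d\b")

# Illustrative sample data: (text, should_match)
SAMPLES = [
    ("postal_code: K1A 0B1", True),
    ("postal_code: K1A0B1", True),
    ("postal_code: K1A-0B1", True),
    ("postal_code: k1a 0b1", False),  # lowercase rejected by the strict pattern
    ("order_id: XK1A0B1Z", False),    # embedded in a longer token, no word boundary
    ("zip: 90210", False),            # US ZIP code, not a Canadian postal code
]

for text, expected in SAMPLES:
    found = CA_POSTAL_CODE.search(text) is not None
    status = "ok" if found == expected else "UNEXPECTED"
    print(f"{status:10} {text!r} -> match={found}")
```

The negative samples matter as much as the positive ones: they show that the word boundaries prevent matches inside longer identifiers.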
Add to your ruleset: Once satisfied, add the pattern to your rules/pii-data-extension.toml file and run a few tests to validate it. This sample merge request contains a newly added rule for Canadian postal codes based on the above.
Additionally, you can use GitLab Duo Chat in:
In the future, you’ll be able to use GitLab Duo Workflow (currently in private beta) to automatically generate and add these patterns to your codebase directly from your IDE. GitLab Duo Workflow is an AI agent that transforms AI from a reactive assistant into an autonomous contributor, optimizing your software development lifecycle. Learn more about GitLab Duo Workflow.
Once you have set up a PII data ruleset to meet your organization's needs, remote rulesets can scan for PII data across multiple repositories without the need to duplicate the rules file. Watch this video to learn more:
When GitLab Secret Detection identifies potential PII in your code:
GitLab Secret Detection, combined with custom PII rulesets, provides a powerful defense against inadvertent exposure of sensitive information. By using GitLab Duo Chat to create precise regex patterns, teams can efficiently implement comprehensive PII detection across their codebase, helping them meet regulatory requirements and protect user data.
Remember that secret detection is just one component of a comprehensive security strategy. Combine it with other GitLab security features like static application security testing, dynamic application security testing, and dependency scanning for a more robust security posture.
Start implementing these practices today to better protect your users' personal information and maintain the security integrity of your applications.
Start a free, 60-day trial of GitLab Ultimate and GitLab Duo today!
To learn more about GitLab security and compliance and how we can help enhance your AppSec workflows, follow the links below: