Best Practices and Examples for Masking with Regular Expressions (Regex)
Overview
Regular expressions (regex) are a powerful tool for identifying, matching, and masking sensitive information in text. This guide provides practical examples and best practices for masking different types of data commonly encountered in applications such as insurance, healthcare, billing, and logistics.
The best practices below apply when you create Custom Data Types in Data Masking and write the regex patterns yourself. AI Assist for Regex is also available.
Best Practices for Regex Masking
- Start Specific, Then Generalize: Begin with strict patterns to avoid false positives, then expand as needed.
- Include Edge Cases: Test formats with slight variations (extra spaces, different separators, leading zeros).
- Use Word Boundaries Instead of Start and End Anchors: \b ensures matches occur at word boundaries. Start (^) and end ($) anchors are not supported.
- Test Negative Cases: Include examples that should not match when testing your regex pattern to prevent accidental exposure of data.
- Avoid Lookaheads and Lookbehinds: For performance reasons, these constructs are not permitted in your pattern. Remove (?=…), (?!…), (?<=…), and (?<!…) from your pattern.
- Limit Pattern Length: Keep each regex pattern to 256 characters or fewer.
- Account for Case: Regex matching is case sensitive. To match a word in any casing, specify both the lowercase and uppercase forms (for example, [Aa]sthma).
- Use Alternation for Variations: Use OR (|) to cover multiple variations of the data type you want masked.
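The constraints in the list above (length limit, no start or end anchors, no lookaheads or lookbehinds) can be checked before a pattern is saved. The following is a minimal sketch using Python's standard `re` module as a stand-in; the function name `check_pattern` and the message wording are hypothetical, and the masking product's own regex engine may behave differently from Python's.

```python
import re

MAX_LENGTH = 256  # limit stated in the best practices above
LOOKAROUND_OPENERS = ("(?=", "(?!", "(?<=", "(?<!")


def check_pattern(pattern: str) -> list[str]:
    """Return a list of problems found in a candidate masking pattern.

    Hypothetical helper: it only checks the rules listed in this article.
    """
    problems = []
    if len(pattern) > MAX_LENGTH:
        problems.append(f"pattern is {len(pattern)} characters; limit is {MAX_LENGTH}")

    in_class = False  # True while scanning inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character (e.g. \$ is a literal dollar sign)
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            if pattern[i + 1 : i + 2] == "^":
                i += 1  # negation marker, not an anchor
            if pattern[i + 1 : i + 2] == "]":
                i += 1  # a ']' placed first in a class is a literal
        elif ch in "^$":
            problems.append(f"anchor {ch!r} at position {i}")
        elif ch == "(":
            for opener in LOOKAROUND_OPENERS:
                if pattern.startswith(opener, i):
                    problems.append(f"lookaround {opener} at position {i}")
        i += 1

    # Confirm the pattern is at least syntactically valid in Python's engine.
    try:
        re.compile(pattern)
    except re.error as exc:
        problems.append(f"does not compile: {exc}")
    return problems


for candidate in (r"\b\d{4}-\d{4}-\d{4}-\d{4}\b", r"^\d{16}$", r"(?<=ID: )\d+"):
    print(candidate, "->", check_pattern(candidate) or "ok")
```

The scan tracks character classes and escapes so that `[^0-9]` and `\$` are not reported as anchors.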
Examples of Masking Use Cases with Regular Expressions
Each situation below lists one or two example patterns with sample data they match.

Credit Card Number Masking
- Example 1
  - Regex: (\d{4} \d{4} \d{4} \d{4})|(\d{16})
  - Example data to match: 1234 1234 1234 1234
  - Example data to match: 1234123412341234
- Example 2
  - Regex: \b\d{4}-\d{4}-\d{4}-\d{4}\b
  - Example data to match: 1234-1234-1234-1234

Policy Number Masking
- Example 1
  - Regex: \b[A-Z]{3}\s\d{4}-[A-Z]\d{2}-\d{2}[A-Z]\b
  - Example data to match: ABC 1234-A12-45Z
- Example 2
  - Regex: \b[A-Z]\d{2}\s\d{4}-59\s[A-Z]\d{2}\b
  - Example data to match: B23 8821-59 K09

US Address Masking
- Regex: \b[A-Za-z0-9 ]{10,60}, [A-Za-z ]{3,30}, [A-Z]{2} \d{5}(?:\-\d{4})?\b
- Example data to match: 77B Elm Avenue Apt 4C, Brooklyn, NY 11215
- Example data to match: 2808 S 31st St, Sioux Falls, SD 57105

International Phone Number Masking
- Regex: \+\d{1,3}(?:\s?\d){4,14}
- Example data to match: +44 7823 445229
- Example data to match: +91 9988776655

Driver’s License Masking (US General)
- Regex: \b([A-Z]\d{12}|\d{8,12})\b
- Example data to match: K298334982134
- Example data to match: 839202114

Medical or Health Information Masking
- Regex (keywords): (?i)\b(arthritis|diabetes|asthma|cancer)\b
- Example data to match: She was diagnosed with asthma last year.
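One way to try the example patterns before saving them is a small test table that pairs each pattern with text that should be masked and text that should be left alone. The sketch below uses Python's `re.sub` as a stand-in for the masking engine; the `mask` helper, the `[MASKED]` placeholder, and the sample sentences are assumptions made for illustration, not product behavior.

```python
import re

# Patterns copied from the examples above.
PATTERNS = {
    "credit_card_dashed": r"\b\d{4}-\d{4}-\d{4}-\d{4}\b",
    "policy_number": r"\b[A-Z]{3}\s\d{4}-[A-Z]\d{2}-\d{2}[A-Z]\b",
    "drivers_license": r"\b([A-Z]\d{12}|\d{8,12})\b",
}


def mask(text: str, pattern: str, replacement: str = "[MASKED]") -> str:
    """Replace every match of pattern in text (hypothetical helper)."""
    return re.sub(pattern, replacement, text)


# (pattern name, input text, expected output). Negative cases expect no change.
CASES = [
    ("credit_card_dashed", "Card 1234-1234-1234-1234 on file", "Card [MASKED] on file"),
    ("credit_card_dashed", "Order 1234-1234-1234", "Order 1234-1234-1234"),
    ("policy_number", "Policy ABC 1234-A12-45Z renewed", "Policy [MASKED] renewed"),
    # Lowercase prefix: not matched, because the pattern is case sensitive.
    ("policy_number", "Policy abc 1234-A12-45Z", "Policy abc 1234-A12-45Z"),
    ("drivers_license", "License K298334982134", "License [MASKED]"),
]


def run_cases() -> list[tuple[str, str, str]]:
    """Return (name, input, actual output) for every case that fails."""
    failures = []
    for name, text, expected in CASES:
        actual = mask(text, PATTERNS[name])
        if actual != expected:
            failures.append((name, text, actual))
    return failures


print(run_cases() or "all cases passed")
```

Keeping negative cases next to positive ones makes it easier to notice when a later change to a pattern starts masking too much or too little.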
Tips and Common Pitfalls
- Always test your regex on real data samples before saving.
- Beware of optional spaces or dashes in numbers; use [-\s]? where appropriate.
- False positives: Make sure your regex doesn’t accidentally match unrelated numbers or words.
- Use word boundaries (\b) to prevent partial matches.
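The tips above can be illustrated with a few quick checks. This is a sketch in Python's `re` module, used here only as a stand-in for the masking engine; `FLEXIBLE_CARD` is a hypothetical pattern that combines the [-\s]? tip with word boundaries, and the sample strings are made up for illustration.

```python
import re

# Hypothetical pattern: an optional dash or whitespace between each group of four digits.
FLEXIBLE_CARD = r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"

card_samples = ["1234 1234 1234 1234", "1234-1234-1234-1234", "1234123412341234"]
all_formats_match = all(re.fullmatch(FLEXIBLE_CARD, s) for s in card_samples)

# Without \b, 16 digits buried inside a longer identifier are matched.
tracking_id = "TRK12341234123412349999"
partial_hit = re.search(r"\d{16}", tracking_id)       # finds a match
bounded_hit = re.search(r"\b\d{16}\b", tracking_id)   # no match

# A broad pattern can also catch unrelated numbers: the driver's license
# example pattern matches a 10-digit phone number.
LICENSE = r"\b([A-Z]\d{12}|\d{8,12})\b"
false_positive = re.search(LICENSE, "Call 5551234567")

print("all card formats match:", all_formats_match)
print("match without boundaries:", partial_hit is not None)
print("match with boundaries:", bounded_hit is not None)
print("license pattern matches a phone number:", false_positive is not None)
```

The last check is a reminder that word boundaries alone do not prevent false positives; broad digit-count patterns still need negative test cases drawn from real data.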