Rules to Better Regular Expressions
Do you format and comment your regular expressions?
Regular expressions are a very powerful tool for pattern matching, but a complicated regex can be very difficult for a human to read and to comprehend. That is why, like any good code, a good regular expression must be well formatted and documented.
Here are some guidelines when formatting and documenting your regex:
- Keep each line under 80 characters; horizontal scrolling reduces readability
- Break long patterns into multiple lines, usually after a space or a line break
- Indent brackets to help think in the right scope
- Format complicated OR patterns into multiple blocks like a case statement
- Comment on what your regex does; don't just translate it into English
    # Match <BODY
    <BODY
    # Match any non > char for zero to infinite number of times
    [^>]*
    # MATCH >
    >

❌ Figure: Bad example: Comment that translates the regex into English
    # Match the BODY tag
    <BODY
    # Match any character in the body tag
    [^>]*
    # Match the end BODY tag
    >

✅ Figure: Good example: Comment that explains the purpose of the pattern
    (?six-mn:(Label|TextBox)\s+(?<Name>\w+).*(?<Result>\k<Name>\.TextAlign\s*=\s* ((System\.)?Drawing\.)?ContentAlignment\.(?! TopLeft|MiddleLeft|TopCenter|MiddleCenter)\w*)(?!(?<=\k<Name>\.Image.*)|(?=.*\k<Name>\.Image)))

❌ Figure: Bad example: Pray you never have to modify this regex
    (?six-mn:
      # Match for Label or TextBox control
      # Store name into <Name> group
      (Label|TextBox)\s+(?<Name>\w+).*
      # Match any non-standard TextAlign
      # Store any match in Result group for error reporting in CA
      (?<Result>
        # Match for control's TextAlign Property
        \k<Name>\.TextAlign\s*=\s*
        # Match for possible namespace
        ((System\.)?Drawing\.)?ContentAlignment\.
        # Match any ContentAlignment that is not in the group
        (?!TopLeft|MiddleLeft|TopCenter|MiddleCenter)\w*
      )
      # Skip any Control that has image on it
      (?!
        (?<=\k<Name>\.Image.*)
        |
        (?=.*\k<Name>\.Image)
      )
    )

✅ Figure: Good example: Now it makes sense!
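The same commenting guideline carries over to other regex engines. As a minimal sketch, assuming Python's `re` module and its `re.VERBOSE` flag (the name `BODY_TAG` and the sample HTML are illustrative only), the BODY tag pattern could be written like this:

```python
import re

# Verbose mode (re.VERBOSE) ignores unescaped whitespace in the pattern and
# treats "#" as the start of a comment, so the regex can document itself.
BODY_TAG = re.compile(
    r"""
    <BODY       # Match the start of the BODY tag
    [^>]*       # Match any attributes inside the tag
    >           # Match the end of the opening BODY tag
    """,
    re.VERBOSE | re.IGNORECASE,
)

match = BODY_TAG.search('<html><body class="home"><p>Hi</p></body></html>')
print(match.group(0))  # <body class="home">
```

Each comment states the purpose of its fragment rather than spelling out the syntax, which is the point of the good example above.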
Do you test your regular expressions?
Everyone writes unit tests for their code because tests help developers make changes in the future without breaking existing functionality. The same goes for regular expressions. A good regular expression has a set of test cases to make sure future changes do not invalidate existing requirements.
You should not fix a regular expression until you have added a good and a bad test case.
If your application is driven by regular expressions, you need a good test harness. Here is an example of a test harness we use in CodeAuditor.
Figure: Test Harness for regular expressions in CodeAuditor
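A test harness does not need to be elaborate. The following is a minimal sketch in Python, not the CodeAuditor harness itself; the pattern `CONTROL_DECLARATION`, the helper `run_cases`, and the sample inputs are all hypothetical. It pairs each input with the expected outcome so that good and bad cases live side by side:

```python
import re

# Hypothetical pattern under test: finds Label/TextBox declarations.
CONTROL_DECLARATION = re.compile(r"\b(Label|TextBox)\s+(?P<name>\w+)")

# Each case pairs an input with whether the pattern should match it.
TEST_CASES = [
    ("Label lblTitle;", True),      # good case: must match
    ("TextBox txtName;", True),     # good case: must match
    ("Button btnSave;", False),     # bad case: must not match
    ("LabelMaker maker;", False),   # bad case: shares a prefix only
]


def run_cases(pattern, cases):
    """Return the (text, expected, actual) tuples for every failing case."""
    failures = []
    for text, expected in cases:
        actual = pattern.search(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


failures = run_cases(CONTROL_DECLARATION, TEST_CASES)
print("all passed" if not failures else failures)
```

When a bug is reported, add the offending input to the case list first, watch it fail, and only then change the pattern.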
Do you use a resource file to store your regular expressions?
Using resource files to store regular expressions simplifies management and promotes consistency across the project, enhancing maintainability and development workflows.
    public static Queue getFilesInProject(string projectFile)
    {
        Queue tempQueue = new Queue();
        TextReader tr = File.OpenText(projectFile);
        // RT (10/10/2005): New regex to support VS 2005 project files (.csproj & .vbproj)
        //(?ixm-sn:
        //# VS 2003
        //(?:RelPath\s=\s\"(?<filename>.*?)\")
        //|
        //# VS 2005
        //(?:(?<=Compile|EmbeddedResource|Content|None)\sInclude=\"(?<FileName>.*?)\")
        //)
        Regex regex = new Regex(@"(?ixm-sn:(?:RelPath\s=\s\""(?<FileName>.*?)\"")|(?:(?<=Compile|EmbeddedResource|Content|None)\sInclude=\""(?<FileName>.*?)\""))");
        MatchCollection matches = regex.Matches(tr.ReadToEnd());
    }

❌ Figure: Bad example: Regular expression is embedded in code
The problem with this code is that the regular expression is embedded within the method, so it cannot easily be tested without creating mock files on the fly. Embedding regular expressions in code also invites escaping mistakes: people often forget to escape special characters, or escape them incorrectly, which makes the regular expression behave differently between the design and execution environments.
We deal with this by putting the regular expression in a resource file. A resource file solves the aforementioned issues and also allows us to leave a comment for the regular expression.
Figure: Good example - The regular expression (with comment) is stored in a resource file
    public static Queue getFilesInProject(string projectFile)
    {
        Queue tempQueue = new Queue();
        TextReader tr = File.OpenText(projectFile);
        Regex regex = new Regex(RegularExpression.GetFilesInProject);
        MatchCollection matches = regex.Matches(tr.ReadToEnd());
    }

✅ Figure: Good example: We can easily get the regular expression from resource file
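The same idea can be sketched outside .NET. The example below is an illustration only, assuming Python with an INI-style resource read through the standard `configparser` module. The file name `regexes.ini`, the helper `load_patterns`, and the simplified pattern are hypothetical, and the resource text is inlined as a string so the sketch stays self-contained:

```python
import configparser
import re

# Stand-in for the text of a resource file (hypothetical name: regexes.ini).
# The pattern is stored unescaped, next to a comment describing its purpose.
RESOURCE_TEXT = """
[GetFilesInProject]
comment = Capture the file path from Include="..." attributes
pattern = Include="(?P<file_name>[^"]*)"
"""


def load_patterns(resource_text):
    """Compile every pattern in the resource text, keyed by section name."""
    # interpolation=None keeps "%" in patterns from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(resource_text)
    return {name: re.compile(parser[name]["pattern"]) for name in parser.sections()}


PATTERNS = load_patterns(RESOURCE_TEXT)

project_xml = '<Compile Include="Program.cs" /><Content Include="app.config" />'
files = [m.group("file_name") for m in PATTERNS["GetFilesInProject"].finditer(project_xml)]
print(files)  # ['Program.cs', 'app.config']
```

Because the pattern is plain text in the resource, it needs no string-literal escaping, and a test can load it by name without touching the method that uses it.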
Do you use a regular expression to validate an email address?
A regex is the best way to verify an email address.
    public bool IsValidEmail(string email)
    {
        // Return true if it is in valid email format.
        if (email.IndexOf("@") <= 0) return false;
        if (email.EndsWith("@")) return false;
        if (email.IndexOf(".") <= 0) return false;
        if ( ...
    }

❌ Figure: Bad example of verifying an email address
    public bool IsValidEmail(string email)
    {
        // Return true if it is in valid email format.
        return System.Text.RegularExpressions.Regex.IsMatch(
            email,
            @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");
    }

✅ Figure: Good example of verifying an email address
Do you use a regular expression to validate a URL?
A regex is the best way to verify a URI.
    public bool IsValidUri(string uri)
    {
        try
        {
            Uri testUri = new Uri(uri);
            return true;
        }
        catch (UriFormatException ex)
        {
            return false;
        }
    }

❌ Figure: Bad example of verifying URI
    public bool IsValidUri(string uri)
    {
        // Return true if it is in valid Uri format.
        return System.Text.RegularExpressions.Regex.IsMatch(
            uri,
            @"^(http|ftp|https)://([^\/][\w-/:]+\.?)+([\w- ./?/:/;/\%&=]+)?(/[\w- ./?/:/;/\%&=]*)?");
    }

✅ Figure: Good example of verifying URI
You should have unit tests for it; see our Rules to Better Unit Tests for more information.
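As a rough sketch of what such tests might look like, here is a Python adaptation of the URI check. This is an assumption rather than a verbatim port: hyphens inside character classes are escaped because Python's `re` module rejects ranges such as `\w-/`, and the names `URI_PATTERN`, `is_valid_uri`, and `CASES` are hypothetical.

```python
import re

# Python adaptation of the URI pattern (assumption: hyphens in character
# classes are escaped so Python's re module accepts them as literals).
URI_PATTERN = re.compile(
    r"^(http|ftp|https)://([^/][\w\-/:]+\.?)+([\w\- ./?:;%&=]+)?(/[\w\- ./?:;%&=]*)?"
)


def is_valid_uri(uri):
    """Return True if the text starts with something shaped like a URI."""
    return URI_PATTERN.match(uri) is not None


# Good cases must match; bad cases must not.
CASES = [
    ("https://example.com/path?x=1", True),
    ("ftp://files.example.com", True),
    ("example.com", False),                  # no scheme
    ("mailto:someone@example.com", False),   # unsupported scheme
    ("http://", False),                      # scheme with no host
]

for uri, expected in CASES:
    assert is_valid_uri(uri) == expected, uri
print("all URI cases passed")
```

Note that the pattern has no `$` anchor, so it only checks that the input begins with a URI-shaped prefix; a case list like this makes that kind of behavior visible before it surprises anyone.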
Do you use online regex tools to simplify your regular expression workflows?
Regular expressions (regex) are powerful tools for pattern matching and text processing, but they can be challenging to write and debug. Online regex tools like RegExr and Regex101 simplify this process by providing interactive environments to test, debug, and learn regex. These tools save time, reduce errors, and help you master regex faster.
Why use online regex tools?
- Instant Feedback: Test your regex patterns in real-time and see immediate results
- Learning Resources: Many tools include tutorials, examples, and explanations to help you understand regex syntax
- Debugging Features: Identify issues in your regex with visual aids and detailed error messages
- Cross-Platform: These tools are accessible from any browser, making them convenient for developers on the go
    import re

    # Define the regex pattern for validating passwords
    pattern = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=\[\]{};':\"\\|,.<>\/?]).{8,}$"

    # List of passwords to validate
    passwordList = [
        "A1b2C3d4!",
        "S3cur3#Key",
        "Pass!",
        "Password1",
        "password!1",
        "A1b2C3d4!"
    ]

    # Loop through the passwords and use re.search to validate them
    for password in passwordList:
        if re.search(pattern, password):
            print(f"Valid: {password}")
        else:
            print(f"Invalid: {password}")

    Valid: A1b2C3d4!
    Valid: S3cur3#Key
    Invalid: Pass!
    Invalid: Password1
    Invalid: password!1
    Valid: A1b2C3d4!

❌ Figure: Writing and testing regex directly in your code without live validation
✅ Figure: Using RegExr to debug and validate your pattern before implementation
Best Online Regex Tools
- RegExr (Recommended)
- User-friendly interface and community-driven examples.
- Open source and can be hosted privately. See https://github.com/gskinner/regexr/
- Regex debugger to step through your pattern.
- Code generator for multiple programming languages.
- Extensive regex library and quick reference guide.
Avoid overcomplicating your regex patterns; use the tools to simplify and optimize them.