Regular Expression Denial of Service (ReDoS) vulnerabilities occur when poorly designed regular expressions can be exploited to cause excessive CPU consumption, potentially leading to application unavailability.
Use this file to discover all available pages before exploring further.
Regular Expression Denial of Service Overview
Regular Expression Denial of Service (ReDoS) is a vulnerability that occurs when a regular expression contains patterns that can cause exponential or catastrophic backtracking. When these regular expressions process maliciously crafted input, they can consume excessive CPU resources, potentially leading to denial of service.The root cause of ReDoS is the backtracking behavior in regex engines. When a regex engine attempts to match a pattern but encounters a situation where multiple paths could be taken, it may need to try all possible paths before determining that a match is impossible. With carefully crafted input, this can lead to exponential time complexity.Preventing ReDoS requires careful design of regular expressions, avoiding vulnerable patterns, and implementing proper timeouts and input validation.
Catastrophic Backtracking
// Anti-pattern: Regex vulnerable to catastrophic backtrackingfunction validateEmail(email) { // This regex is vulnerable to catastrophic backtracking const emailRegex = /^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/; return emailRegex.test(email);}// Example attack string: "a".repeat(100000) + "@example.com"// This will cause the regex engine to backtrack extensively// Better approach: Using a safer regex patternfunction validateEmailSafely(email) { // Limit repetition and avoid nested quantifiers const safeEmailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/; // Add input length validation if (email.length > 254) { // RFC 5321 SMTP limit return false; } return safeEmailRegex.test(email);}
Catastrophic backtracking occurs when a regular expression contains patterns that can match the same input in multiple ways, causing the regex engine to try all possible combinations.Common patterns that can lead to catastrophic backtracking:
Nested repetition quantifiers (e.g., (a+)+)
Overlapping character classes with repetition
Multiple optional groups with repetition
Alternation with shared prefixes or suffixes
Greedy quantifiers followed by similar patterns
To prevent catastrophic backtracking:
Avoid nested repetition quantifiers
Use atomic groups or possessive quantifiers when available
Limit the use of backtracking by using non-capturing groups
Implement input length validation before regex processing
Consider using regex engines with linear time guarantees
Vulnerable Regex Patterns
// Anti-pattern: Vulnerable regex patterns// Example 1: Nested repetition - (a+)+ patternconst nestedRepetitionRegex = /^(a+)+$/;// Attack string: "a".repeat(100) + "X"// Example 2: Alternation with repetitionconst alternationRegex = /(ab|a)+c/;// Attack string: "ab".repeat(100)// Example 3: Adjacent repetition quantifiersconst adjacentRepetitionRegex = /^a+b+$/;// Attack string: "a".repeat(100000)// Example 4: Complex backreferencesconst backreferenceRegex = /^(a+)\1+$/;// Attack string: "a".repeat(50000)// Better approach: Safer regex patterns// Example 1: Using atomic groups (?>...)const atomicGroupRegex = /^(?>(a+))+$/;// Example 2: Using possessive quantifiers a++const possessiveRegex = /^a++b++$/;// Example 3: Using non-backtracking alternativesconst nonBacktrackingRegex = /^[a]+[b]+$/;// Example 4: Using lookahead for validationconst lookaheadRegex = /^(?=.{1,255}$)a+b+$/;
# Python examplesimport re# Anti-pattern: Vulnerable regexvulnerable_regex = re.compile(r'^(a+)*b$')# Better approach: Using atomic groups in Python# Python doesn't support atomic groups directly, but we can use lookaheadsafer_regex = re.compile(r'^(?=(a*))\1b$')# Even better: Use the 're2' library which guarantees linear time# pip install re2import re2linear_time_regex = re2.compile(r'^(a+)*b$')
Certain regex patterns are particularly vulnerable to ReDoS attacks due to their backtracking behavior.To identify and fix vulnerable patterns:
Avoid nested repetition like (a+)+ or (a*b*)*
Be cautious with alternation that has common prefixes like (ab|abc)+
Avoid backreferences in combination with repetition
Use atomic groups (?>...) when available (in languages that support them)
Use possessive quantifiers a++ when available (in languages that support them)
Consider using character classes [a] instead of single characters with quantifiers
Implement proper input validation before applying regex
Timeout Mechanisms
// Anti-pattern: No timeout for regex operationsfunction validateInput(input) { const complexRegex = /^(a+)*b$/; return complexRegex.test(input);}// Better approach: Implementing timeout mechanismsfunction validateInputWithTimeout(input, timeoutMs = 100) { return new Promise((resolve, reject) => { // Create a worker to run the regex in a separate thread const worker = new Worker(URL.createObjectURL(new Blob([` self.onmessage = function(e) { const input = e.data; const complexRegex = /^(a+)*b$/; const result = complexRegex.test(input); self.postMessage(result); } `], { type: 'application/javascript' }))); // Set timeout const timeoutId = setTimeout(() => { worker.terminate(); reject(new Error('Regex execution timed out')); }, timeoutMs); // Handle response worker.onmessage = (e) => { clearTimeout(timeoutId); worker.terminate(); resolve(e.data); }; // Start the worker worker.postMessage(input); });}// Usage with async/awaitasync function processUserInput(input) { try { // Validate input length first if (input.length > 1000) { return false; } const isValid = await validateInputWithTimeout(input); return isValid; } catch (error) { console.error('Validation error:', error); return false; // Fail closed on timeout }}
# Python timeout exampleimport reimport signalclass TimeoutException(Exception): passdef timeout_handler(signum, frame): raise TimeoutException("Regex execution timed out")def validate_with_timeout(pattern, input_str, timeout_sec=1): # Set the timeout signal.signal(signal.SIGALRM, timeout_handler) signal.alarm(timeout_sec) try: # Run the regex result = re.match(pattern, input_str) is not None # Cancel the timeout signal.alarm(0) return result except TimeoutException: # Handle timeout return False finally: # Reset the alarm signal.alarm(0)# Usagepattern = r'^(a+)*b$'input_str = 'a' * 100000is_valid = validate_with_timeout(pattern, input_str)
Implementing timeout mechanisms is crucial for mitigating the impact of ReDoS attacks, as they limit the resources that can be consumed by a single regex operation.To implement effective timeout mechanisms:
Use asynchronous processing or separate threads for regex operations
Set appropriate timeout values based on expected regex complexity
Implement proper error handling for timeouts
Consider using worker threads or child processes for isolation
Fail closed (reject input) when timeouts occur
Log timeout incidents for monitoring and alerting
Consider implementing circuit breakers for regex operations
Test timeout mechanisms with known problematic inputs
Alternative Regex Engines
// Anti-pattern: Using standard regex engine for security-critical operationsfunction validateUserInput(input) { const vulnerableRegex = /^(a|aa)+$/; return vulnerableRegex.test(input);}// Better approach: Using alternative regex engines// Example with RE2 in Node.jsconst RE2 = require('re2');function validateUserInputSafely(input) { try { // RE2 guarantees linear time complexity const safeRegex = new RE2('^(a|aa)+$'); return safeRegex.test(input); } catch (error) { console.error('Regex error:', error); return false; }}
# Python example with re2# Standard library regex (potentially vulnerable)import redef validate_standard(input_str): pattern = re.compile(r'^(a|aa)+$') return bool(pattern.match(input_str))# Using Google's RE2 engineimport re2def validate_re2(input_str): pattern = re2.compile(r'^(a|aa)+$') return bool(pattern.match(input_str))
// Java example with different regex enginesimport java.util.regex.Pattern;import com.google.re2j.Pattern as RE2Pattern;// Standard Java regex (potentially vulnerable)public boolean validateStandard(String input) { Pattern pattern = Pattern.compile("^(a|aa)+$"); return pattern.matcher(input).matches();}// Using Google's RE2Jpublic boolean validateRE2(String input) { RE2Pattern pattern = RE2Pattern.compile("^(a|aa)+$"); return pattern.matcher(input).matches();}
Alternative regex engines with guaranteed linear time complexity can eliminate the risk of ReDoS vulnerabilities.Popular alternative regex engines:
Google’s RE2: Guarantees linear time complexity (available for multiple languages)
re2j: Java port of RE2
node-re2: Node.js binding for RE2
Rust’s regex crate: Guarantees linear time complexity
Considerations when using alternative engines:
They may not support all features of traditional regex engines (e.g., backreferences)
There may be syntax differences or compatibility issues
Performance characteristics may differ for non-pathological cases
Implementation and integration may require additional dependencies
Testing is essential to ensure equivalent functionality
Input Validation Strategies
// Anti-pattern: Applying regex directly to unchecked inputfunction validateEmail(email) { const emailRegex = /^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/; return emailRegex.test(email);}// Better approach: Input validation before regexfunction validateEmailSafely(email) { // Check input type if (typeof email !== 'string') { return false; } // Check input length if (email.length > 254 || email.length < 3) { return false; } // Check for basic structure before applying regex if (!email.includes('@') || !email.includes('.')) { return false; } // Apply simpler checks first const parts = email.split('@'); if (parts.length !== 2 || parts[0].length === 0 || parts[1].length === 0) { return false; } // Finally apply the regex const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/; return emailRegex.test(email);}
# Python input validation exampledef validate_email_safely(email): # Check input type if not isinstance(email, str): return False # Check input length if len(email) > 254 or len(email) < 3: return False # Check for basic structure if '@' not in email or '.' not in email: return False # Apply simpler checks parts = email.split('@') if len(parts) != 2 or not parts[0] or not parts[1]: return False # Apply regex only after preliminary checks import re email_regex = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$') return bool(email_regex.match(email))
Proper input validation before applying regular expressions can significantly reduce the risk of ReDoS attacks.Effective input validation strategies:
Validate input type and structure before applying regex
Implement input length limits appropriate to the expected data
Use simpler, faster checks before complex regex patterns
Consider breaking complex regex patterns into multiple simpler patterns
Validate against a whitelist of allowed values when possible
Implement proper error handling for invalid input
Log validation failures for monitoring and alerting
Consider using input normalization before validation
Testing for ReDoS Vulnerabilities
// Example: Testing regex patterns for ReDoS vulnerabilitiesfunction testRegexPerformance(regex, inputs) { const results = []; for (const input of inputs) { const startTime = performance.now(); const result = regex.test(input); const endTime = performance.now(); const duration = endTime - startTime; results.push({ input, length: input.length, result, duration, isSuspicious: duration > 100 // Flag if takes more than 100ms }); } return results;}// Generate test inputs of increasing lengthfunction generateTestInputs(basePattern, lengths) { return lengths.map(length => basePattern.repeat(length));}// Example usageconst vulnerableRegex = /^(a+)+$/;const testInputs = generateTestInputs('a', [10, 100, 1000, 10000]);const results = testRegexPerformance(vulnerableRegex, testInputs);// Analyze results to detect exponential behaviorfunction analyzeResults(results) { // Calculate growth rate for (let i = 1; i < results.length; i++) { const prev = results[i-1]; const curr = results[i]; const lengthRatio = curr.length / prev.length; const timeRatio = curr.duration / prev.duration; // If time grows much faster than input length, likely exponential if (timeRatio > lengthRatio * 2) { console.warn(`Potential ReDoS vulnerability detected: Time ratio (${timeRatio.toFixed(2)}) > Length ratio (${lengthRatio.toFixed(2)})`); return true; } } return false;}
# Python testing exampleimport timeimport redef test_regex_performance(pattern, inputs): results = [] compiled_regex = re.compile(pattern) for input_str in inputs: start_time = time.time() result = bool(compiled_regex.match(input_str)) end_time = time.time() duration = (end_time - start_time) * 1000 # Convert to ms results.append({ "input": input_str[:20] + "..." if len(input_str) > 20 else input_str, "length": len(input_str), "result": result, "duration": duration, "suspicious": duration > 100 }) return results# Generate test cases with increasing complexitydef generate_test_inputs(base_pattern, lengths): return [base_pattern * length for length in lengths]# Example usagepattern = r'^(a+)+$'test_inputs = generate_test_inputs('a', [10, 100, 1000, 5000])results = test_regex_performance(pattern, test_inputs)# Analyze for exponential behaviordef is_vulnerable(results): for i in range(1, len(results)): prev, curr = results[i-1], results[i] length_ratio = curr["length"] / prev["length"] time_ratio = curr["duration"] / prev["duration"] if prev["duration"] > 0 else float('inf') if time_ratio > length_ratio * 2: print(f"Potential vulnerability: Time ratio ({time_ratio:.2f}) > Length ratio ({length_ratio:.2f})") return True return False
Testing for ReDoS vulnerabilities is essential to identify problematic regex patterns before they can be exploited.Effective testing strategies:
Test regex patterns with inputs of increasing length
Measure execution time to detect exponential behavior
Use automated tools designed to detect vulnerable patterns
Implement regex testing as part of the CI/CD pipeline
Use fuzzing techniques to generate potentially problematic inputs
Test with known attack patterns for common regex vulnerabilities
Consider using static analysis tools that can identify vulnerable patterns
Benchmark regex performance against expected time complexity
# Python static analysis example# Using the rexploitable library# pip install rexploitablefrom rexploitable import analyze_patterndef check_regex_safety(pattern): result = analyze_pattern(pattern) if result.vulnerable: print(f"Vulnerable regex detected: {pattern}") print(f"Vulnerability type: {result.vulnerability_type}") print(f"Example attack string: {result.attack_string}") return result# Check patternscheck_regex_safety(r'^(a+)+$') # Should be vulnerablecheck_regex_safety(r'^[a-z]+$') # Should be safe
Static analysis tools can help identify potentially vulnerable regex patterns during development, before they can be exploited.Popular static analysis tools for ReDoS:
vuln-regex-detector (JavaScript): Analyzes regex vulnerability with attack string generation
rexploitable (Python): Analyzes regex patterns for vulnerabilities
RegexStaticAnalysis (Java): Static analysis for Java regex patterns
RXXR2: Research tool for regex vulnerability analysis
Benefits of static analysis:
Early detection of vulnerable patterns
Integration into development workflows and CI/CD pipelines
Automated scanning of codebases
Education of developers about vulnerable patterns
Reduced risk of deploying vulnerable code
ReDoS Prevention Checklist
ReDoS Prevention Checklist:1. Regex Pattern Design - Avoid nested repetition quantifiers (e.g., (a+)+) - Limit the use of alternation with overlapping patterns - Use atomic groups or possessive quantifiers when available - Prefer character classes over alternation when possible - Avoid backreferences in combination with repetition - Keep regex patterns as simple as possible2. Input Validation - Implement strict input validation before applying regex - Set appropriate length limits for inputs - Use simpler checks before complex regex patterns - Consider breaking complex patterns into multiple simpler ones - Validate against a whitelist when possible3. Execution Controls - Implement timeout mechanisms for regex operations - Run complex regex in separate threads or processes - Consider using worker pools for regex processing - Implement circuit breakers for regex operations - Fail closed (reject) when timeouts occur4. Alternative Approaches - Consider using alternative regex engines with linear time guarantees - Use parser generators or state machines for complex parsing - Consider non-regex approaches for input validation when appropriate - Use specialized validation libraries for common formats5. Testing and Analysis - Test regex patterns with inputs of increasing length - Use static analysis tools to identify vulnerable patterns - Include regex performance testing in CI/CD pipelines - Benchmark regex performance against expected time complexity - Use fuzzing to identify potentially problematic inputs6. Monitoring and Response - Monitor regex execution times in production - Implement alerting for abnormal regex processing times - Log and analyze regex timeout incidents - Have a response plan for ReDoS attacks - Regularly review and update regex patterns
Preventing ReDoS vulnerabilities requires a comprehensive approach that addresses regex pattern design, input validation, execution controls, and monitoring.Key prevention strategies:
Design regex patterns carefully to avoid vulnerable constructs
Implement proper input validation before applying regex
Use timeout mechanisms to limit resource consumption
Consider alternative regex engines with linear time guarantees
Test regex patterns for performance issues
Use static analysis tools to identify vulnerable patterns