Matter AI | Code Reviewer Documentation home pagelight logodark logo
  • Contact
  • Github
  • Sign in
  • Sign in
  • Documentation
  • Blog
  • Discord
  • Github
  • Introduction
    • What is Matter AI?
    Getting Started
    • QuickStart
    Product
    • Security Analysis
    • Code Quality
    • Agentic Chat
    • RuleSets
    • Memories
    • Analytics
    • Command List
    • Configurations
    Patterns
    • Languages
    • Security
      • Injection Vulnerabilities
      • Authentication Vulnerabilities
      • Authorization Vulnerabilities
      • Cryptography Vulnerabilities
      • Data Protection Vulnerabilities
      • Input Validation Vulnerabilities
      • Session Management Vulnerabilities
      • Error Handling Vulnerabilities
      • Logging Vulnerabilities
      • Configuration Vulnerabilities
      • File Handling Vulnerabilities
      • Memory Management Vulnerabilities
      • API Security Vulnerabilities
      • Cross-Site Scripting Vulnerabilities
      • Cross-Site Request Forgery Vulnerabilities
      • Insecure Deserialization Vulnerabilities
      • Security Misconfiguration Vulnerabilities
      • Broken Access Control Vulnerabilities
      • Sensitive Data Exposure Vulnerabilities
      • XML Vulnerabilities
      • Regular Expression Denial of Service (ReDoS)
    • Performance
    Integrations
    • Code Repositories
    • Team Messengers
    • Ticketing
    Enterprise
    • Enterprise Deployment Overview
    • Enterprise Configurations
    • Observability and Fallbacks
    • Create Your Own GitHub App
    • Self-Hosting Options
    • RBAC
    Patterns
    Security

    Regular Expression Denial of Service (ReDoS)

    Regular Expression Denial of Service (ReDoS) vulnerabilities occur when poorly designed regular expressions can be exploited to cause excessive CPU consumption, potentially leading to application unavailability.

    Regular Expression Denial of Service (ReDoS) is a vulnerability that occurs when a regular expression contains patterns that can cause exponential or catastrophic backtracking. When these regular expressions process maliciously crafted input, they can consume excessive CPU resources, potentially leading to denial of service.

    The root cause of ReDoS is the backtracking behavior in regex engines. When a regex engine attempts to match a pattern but encounters a situation where multiple paths could be taken, it may need to try all possible paths before determining that a match is impossible. With carefully crafted input, this can lead to exponential time complexity.

    Preventing ReDoS requires careful design of regular expressions, avoiding vulnerable patterns, and implementing proper timeouts and input validation.

    // Anti-pattern: Regex vulnerable to catastrophic backtracking
    function validateEmail(email) {
      // This regex is vulnerable to catastrophic backtracking
      const emailRegex = /^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/;
      
      return emailRegex.test(email);
    }
    
    // Example attack string: "a".repeat(100000) + "@example.com"
    // This will cause the regex engine to backtrack extensively
    
    // Better approach: Using a safer regex pattern
    function validateEmailSafely(email) {
      // Limit repetition and avoid nested quantifiers
      const safeEmailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
      
      // Add input length validation
      if (email.length > 254) { // RFC 5321 SMTP limit
        return false;
      }
      
      return safeEmailRegex.test(email);
    }

    Catastrophic backtracking occurs when a regular expression contains patterns that can match the same input in multiple ways, causing the regex engine to try all possible combinations.

    Common patterns that can lead to catastrophic backtracking:

    • Nested repetition quantifiers (e.g., (a+)+)
    • Overlapping character classes with repetition
    • Multiple optional groups with repetition
    • Alternation with shared prefixes or suffixes
    • Greedy quantifiers followed by similar patterns

    To prevent catastrophic backtracking:

    • Avoid nested repetition quantifiers
    • Use atomic groups or possessive quantifiers when available
    • Limit the use of backtracking by using non-capturing groups
    • Implement input length validation before regex processing
    • Consider using regex engines with linear time guarantees
    // Anti-pattern: Vulnerable regex patterns
    
    // Example 1: Nested repetition - (a+)+ pattern
    const nestedRepetitionRegex = /^(a+)+$/;
    // Attack string: "a".repeat(100) + "X"
    
    // Example 2: Alternation with repetition
    const alternationRegex = /(ab|a)+c/;
    // Attack string: "ab".repeat(100)
    
    // Example 3: Adjacent repetition quantifiers
    const adjacentRepetitionRegex = /^a+b+$/;
    // Attack string: "a".repeat(100000)
    
    // Example 4: Complex backreferences
    const backreferenceRegex = /^(a+)\1+$/;
    // Attack string: "a".repeat(50000)
    
    // Better approach: Safer regex patterns
    
    // Example 1: Using atomic groups (?>...)
    const atomicGroupRegex = /^(?>(a+))+$/;
    
    // Example 2: Using possessive quantifiers a++
    const possessiveRegex = /^a++b++$/;
    
    // Example 3: Using non-backtracking alternatives
    const nonBacktrackingRegex = /^[a]+[b]+$/;
    
    // Example 4: Using lookahead for validation
    const lookaheadRegex = /^(?=.{1,255}$)a+b+$/;
    # Python examples
    import re
    
    # Anti-pattern: Vulnerable regex
    vulnerable_regex = re.compile(r'^(a+)*b$')
    
    # Better approach: Using atomic groups in Python
    # Python doesn't support atomic groups directly, but we can use lookahead
    safer_regex = re.compile(r'^(?=(a*))\1b$')
    
    # Even better: Use the 're2' library which guarantees linear time
    # pip install re2
    import re2
    linear_time_regex = re2.compile(r'^(a+)*b$')

    Certain regex patterns are particularly vulnerable to ReDoS attacks due to their backtracking behavior.

    To identify and fix vulnerable patterns:

    • Avoid nested repetition like (a+)+ or (a*b*)*
    • Be cautious with alternation that has common prefixes like (ab|abc)+
    • Avoid backreferences in combination with repetition
    • Use atomic groups (?>...) when available (in languages that support them)
    • Use possessive quantifiers a++ when available (in languages that support them)
    • Consider using character classes [a] instead of single characters with quantifiers
    • Implement proper input validation before applying regex
    // Anti-pattern: No timeout for regex operations
    function validateInput(input) {
      const complexRegex = /^(a+)*b$/;
      return complexRegex.test(input);
    }
    
    // Better approach: Implementing timeout mechanisms
    function validateInputWithTimeout(input, timeoutMs = 100) {
      return new Promise((resolve, reject) => {
        // Create a worker to run the regex in a separate thread
        const worker = new Worker(URL.createObjectURL(new Blob([`
          self.onmessage = function(e) {
            const input = e.data;
            const complexRegex = /^(a+)*b$/;
            const result = complexRegex.test(input);
            self.postMessage(result);
          }
        `], { type: 'application/javascript' })));
        
        // Set timeout
        const timeoutId = setTimeout(() => {
          worker.terminate();
          reject(new Error('Regex execution timed out'));
        }, timeoutMs);
        
        // Handle response
        worker.onmessage = (e) => {
          clearTimeout(timeoutId);
          worker.terminate();
          resolve(e.data);
        };
        
        // Start the worker
        worker.postMessage(input);
      });
    }
    
    // Usage with async/await
    async function processUserInput(input) {
      try {
        // Validate input length first
        if (input.length > 1000) {
          return false;
        }
        
        const isValid = await validateInputWithTimeout(input);
        return isValid;
      } catch (error) {
        console.error('Validation error:', error);
        return false; // Fail closed on timeout
      }
    }
    # Python timeout example
    import re
    import signal
    
    class TimeoutException(Exception):
        pass
    
    def timeout_handler(signum, frame):
        raise TimeoutException("Regex execution timed out")
    
    def validate_with_timeout(pattern, input_str, timeout_sec=1):
        # Set the timeout
        signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(timeout_sec)
        
        try:
            # Run the regex
            result = re.match(pattern, input_str) is not None
            # Cancel the timeout
            signal.alarm(0)
            return result
        except TimeoutException:
            # Handle timeout
            return False
        finally:
            # Reset the alarm
            signal.alarm(0)
    
    # Usage
    pattern = r'^(a+)*b$'
    input_str = 'a' * 100000
    is_valid = validate_with_timeout(pattern, input_str)

    Implementing timeout mechanisms is crucial for mitigating the impact of ReDoS attacks, as they limit the resources that can be consumed by a single regex operation.

    To implement effective timeout mechanisms:

    • Use asynchronous processing or separate threads for regex operations
    • Set appropriate timeout values based on expected regex complexity
    • Implement proper error handling for timeouts
    • Consider using worker threads or child processes for isolation
    • Fail closed (reject input) when timeouts occur
    • Log timeout incidents for monitoring and alerting
    • Consider implementing circuit breakers for regex operations
    • Test timeout mechanisms with known problematic inputs
    // Anti-pattern: Using standard regex engine for security-critical operations
    function validateUserInput(input) {
      const vulnerableRegex = /^(a|aa)+$/;
      return vulnerableRegex.test(input);
    }
    
    // Better approach: Using alternative regex engines
    // Example with RE2 in Node.js
    const RE2 = require('re2');
    
    function validateUserInputSafely(input) {
      try {
        // RE2 guarantees linear time complexity
        const safeRegex = new RE2('^(a|aa)+$');
        return safeRegex.test(input);
      } catch (error) {
        console.error('Regex error:', error);
        return false;
      }
    }
    # Python example with re2
    # Standard library regex (potentially vulnerable)
    import re
    
    def validate_standard(input_str):
        pattern = re.compile(r'^(a|aa)+$')
        return bool(pattern.match(input_str))
    
    # Using Google's RE2 engine
    import re2
    
    def validate_re2(input_str):
        pattern = re2.compile(r'^(a|aa)+$')
        return bool(pattern.match(input_str))
    // Java example with different regex engines
    import java.util.regex.Pattern;
    import com.google.re2j.Pattern as RE2Pattern;
    
    // Standard Java regex (potentially vulnerable)
    public boolean validateStandard(String input) {
        Pattern pattern = Pattern.compile("^(a|aa)+$");
        return pattern.matcher(input).matches();
    }
    
    // Using Google's RE2J
    public boolean validateRE2(String input) {
        RE2Pattern pattern = RE2Pattern.compile("^(a|aa)+$");
        return pattern.matcher(input).matches();
    }

    Alternative regex engines with guaranteed linear time complexity can eliminate the risk of ReDoS vulnerabilities.

    Popular alternative regex engines:

    • Google’s RE2: Guarantees linear time complexity (available for multiple languages)
    • re2j: Java port of RE2
    • node-re2: Node.js binding for RE2
    • Rust’s regex crate: Guarantees linear time complexity

    Considerations when using alternative engines:

    • They may not support all features of traditional regex engines (e.g., backreferences)
    • There may be syntax differences or compatibility issues
    • Performance characteristics may differ for non-pathological cases
    • Implementation and integration may require additional dependencies
    • Testing is essential to ensure equivalent functionality
    // Anti-pattern: Applying regex directly to unchecked input
    function validateEmail(email) {
      const emailRegex = /^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/;
      return emailRegex.test(email);
    }
    
    // Better approach: Input validation before regex
    function validateEmailSafely(email) {
      // Check input type
      if (typeof email !== 'string') {
        return false;
      }
      
      // Check input length
      if (email.length > 254 || email.length < 3) {
        return false;
      }
      
      // Check for basic structure before applying regex
      if (!email.includes('@') || !email.includes('.')) {
        return false;
      }
      
      // Apply simpler checks first
      const parts = email.split('@');
      if (parts.length !== 2 || parts[0].length === 0 || parts[1].length === 0) {
        return false;
      }
      
      // Finally apply the regex
      const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
      return emailRegex.test(email);
    }
    # Python input validation example
    def validate_email_safely(email):
        # Check input type
        if not isinstance(email, str):
            return False
        
        # Check input length
        if len(email) > 254 or len(email) < 3:
            return False
        
        # Check for basic structure
        if '@' not in email or '.' not in email:
            return False
        
        # Apply simpler checks
        parts = email.split('@')
        if len(parts) != 2 or not parts[0] or not parts[1]:
            return False
        
        # Apply regex only after preliminary checks
        import re
        email_regex = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
        return bool(email_regex.match(email))

    Proper input validation before applying regular expressions can significantly reduce the risk of ReDoS attacks.

    Effective input validation strategies:

    • Validate input type and structure before applying regex
    • Implement input length limits appropriate to the expected data
    • Use simpler, faster checks before complex regex patterns
    • Consider breaking complex regex patterns into multiple simpler patterns
    • Validate against a whitelist of allowed values when possible
    • Implement proper error handling for invalid input
    • Log validation failures for monitoring and alerting
    • Consider using input normalization before validation
    // Example: Testing regex patterns for ReDoS vulnerabilities
    function testRegexPerformance(regex, inputs) {
      const results = [];
      
      for (const input of inputs) {
        const startTime = performance.now();
        const result = regex.test(input);
        const endTime = performance.now();
        const duration = endTime - startTime;
        
        results.push({
          input,
          length: input.length,
          result,
          duration,
          isSuspicious: duration > 100 // Flag if takes more than 100ms
        });
      }
      
      return results;
    }
    
    // Generate test inputs of increasing length
    function generateTestInputs(basePattern, lengths) {
      return lengths.map(length => basePattern.repeat(length));
    }
    
    // Example usage
    const vulnerableRegex = /^(a+)+$/;
    const testInputs = generateTestInputs('a', [10, 100, 1000, 10000]);
    const results = testRegexPerformance(vulnerableRegex, testInputs);
    
    // Analyze results to detect exponential behavior
    function analyzeResults(results) {
      // Calculate growth rate
      for (let i = 1; i < results.length; i++) {
        const prev = results[i-1];
        const curr = results[i];
        const lengthRatio = curr.length / prev.length;
        const timeRatio = curr.duration / prev.duration;
        
        // If time grows much faster than input length, likely exponential
        if (timeRatio > lengthRatio * 2) {
          console.warn(`Potential ReDoS vulnerability detected: Time ratio (${timeRatio.toFixed(2)}) > Length ratio (${lengthRatio.toFixed(2)})`);
          return true;
        }
      }
      
      return false;
    }
    # Python testing example
    import time
    import re
    
    def test_regex_performance(pattern, inputs):
        results = []
        compiled_regex = re.compile(pattern)
        
        for input_str in inputs:
            start_time = time.time()
            result = bool(compiled_regex.match(input_str))
            end_time = time.time()
            duration = (end_time - start_time) * 1000  # Convert to ms
            
            results.append({
                "input": input_str[:20] + "..." if len(input_str) > 20 else input_str,
                "length": len(input_str),
                "result": result,
                "duration": duration,
                "suspicious": duration > 100
            })
        
        return results
    
    # Generate test cases with increasing complexity
    def generate_test_inputs(base_pattern, lengths):
        return [base_pattern * length for length in lengths]
    
    # Example usage
    pattern = r'^(a+)+$'
    test_inputs = generate_test_inputs('a', [10, 100, 1000, 5000])
    results = test_regex_performance(pattern, test_inputs)
    
    # Analyze for exponential behavior
    def is_vulnerable(results):
        for i in range(1, len(results)):
            prev, curr = results[i-1], results[i]
            length_ratio = curr["length"] / prev["length"]
            time_ratio = curr["duration"] / prev["duration"] if prev["duration"] > 0 else float('inf')
            
            if time_ratio > length_ratio * 2:
                print(f"Potential vulnerability: Time ratio ({time_ratio:.2f}) > Length ratio ({length_ratio:.2f})")
                return True
        
        return False

    Testing for ReDoS vulnerabilities is essential to identify problematic regex patterns before they can be exploited.

    Effective testing strategies:

    • Test regex patterns with inputs of increasing length
    • Measure execution time to detect exponential behavior
    • Use automated tools designed to detect vulnerable patterns
    • Implement regex testing as part of the CI/CD pipeline
    • Use fuzzing techniques to generate potentially problematic inputs
    • Test with known attack patterns for common regex vulnerabilities
    • Consider using static analysis tools that can identify vulnerable patterns
    • Benchmark regex performance against expected time complexity
    // Example: Using static analysis tools
    
    // Install tools:
    // npm install safe-regex
    // npm install vuln-regex-detector
    
    // Using safe-regex
    const safeRegex = require('safe-regex');
    
    function checkRegexSafety(pattern) {
      const isSafe = safeRegex(pattern);
      if (!isSafe) {
        console.warn(`Potentially vulnerable regex pattern detected: ${pattern}`);
      }
      return isSafe;
    }
    
    // Check some patterns
    checkRegexSafety(/^(a+)+$/); // Should return false (unsafe)
    checkRegexSafety(/^[a-z]+$/); // Should return true (safe)
    
    // Using vuln-regex-detector
    const { isVulnerable } = require('vuln-regex-detector');
    
    async function analyzeRegexVulnerability(pattern) {
      try {
        const result = await isVulnerable(pattern);
        if (result.vulnerable) {
          console.warn(`Vulnerable regex detected: ${pattern}`);
          console.warn(`Attack string: ${result.attackString}`);
        }
        return result;
      } catch (error) {
        console.error('Error analyzing regex:', error);
        return { vulnerable: false, error: error.message };
      }
    }
    
    // Usage
    analyzeRegexVulnerability('^(a+)+$').then(result => {
      console.log('Vulnerability analysis:', result);
    });
    # Python static analysis example
    
    # Using the rexploitable library
    # pip install rexploitable
    
    from rexploitable import analyze_pattern
    
    def check_regex_safety(pattern):
        result = analyze_pattern(pattern)
        if result.vulnerable:
            print(f"Vulnerable regex detected: {pattern}")
            print(f"Vulnerability type: {result.vulnerability_type}")
            print(f"Example attack string: {result.attack_string}")
        return result
    
    # Check patterns
    check_regex_safety(r'^(a+)+$')  # Should be vulnerable
    check_regex_safety(r'^[a-z]+$')  # Should be safe

    Static analysis tools can help identify potentially vulnerable regex patterns during development, before they can be exploited.

    Popular static analysis tools for ReDoS:

    • safe-regex (JavaScript): Identifies potentially vulnerable patterns
    • vuln-regex-detector (JavaScript): Analyzes regex vulnerability with attack string generation
    • rexploitable (Python): Analyzes regex patterns for vulnerabilities
    • RegexStaticAnalysis (Java): Static analysis for Java regex patterns
    • RXXR2: Research tool for regex vulnerability analysis

    Benefits of static analysis:

    • Early detection of vulnerable patterns
    • Integration into development workflows and CI/CD pipelines
    • Automated scanning of codebases
    • Education of developers about vulnerable patterns
    • Reduced risk of deploying vulnerable code
    ReDoS Prevention Checklist:
    
    1. Regex Pattern Design
       - Avoid nested repetition quantifiers (e.g., (a+)+)
       - Limit the use of alternation with overlapping patterns
       - Use atomic groups or possessive quantifiers when available
       - Prefer character classes over alternation when possible
       - Avoid backreferences in combination with repetition
       - Keep regex patterns as simple as possible
    
    2. Input Validation
       - Implement strict input validation before applying regex
       - Set appropriate length limits for inputs
       - Use simpler checks before complex regex patterns
       - Consider breaking complex patterns into multiple simpler ones
       - Validate against a whitelist when possible
    
    3. Execution Controls
       - Implement timeout mechanisms for regex operations
       - Run complex regex in separate threads or processes
       - Consider using worker pools for regex processing
       - Implement circuit breakers for regex operations
       - Fail closed (reject) when timeouts occur
    
    4. Alternative Approaches
       - Consider using alternative regex engines with linear time guarantees
       - Use parser generators or state machines for complex parsing
       - Consider non-regex approaches for input validation when appropriate
       - Use specialized validation libraries for common formats
    
    5. Testing and Analysis
       - Test regex patterns with inputs of increasing length
       - Use static analysis tools to identify vulnerable patterns
       - Include regex performance testing in CI/CD pipelines
       - Benchmark regex performance against expected time complexity
       - Use fuzzing to identify potentially problematic inputs
    
    6. Monitoring and Response
       - Monitor regex execution times in production
       - Implement alerting for abnormal regex processing times
       - Log and analyze regex timeout incidents
       - Have a response plan for ReDoS attacks
       - Regularly review and update regex patterns

    Preventing ReDoS vulnerabilities requires a comprehensive approach that addresses regex pattern design, input validation, execution controls, and monitoring.

    Key prevention strategies:

    • Design regex patterns carefully to avoid vulnerable constructs
    • Implement proper input validation before applying regex
    • Use timeout mechanisms to limit resource consumption
    • Consider alternative regex engines with linear time guarantees
    • Test regex patterns for performance issues
    • Use static analysis tools to identify vulnerable patterns
    • Monitor regex execution in production
    • Keep regex-related libraries and tools updated
    XML VulnerabilitiesCPU-Intensive Operations
    websitexgithublinkedin
    Powered by Mintlify
    Assistant
    Responses are generated using AI and may contain mistakes.