🎯 Recommended Examples
Balanced sample collections from various categories for you to explore
Risky Regular Expression Patterns
A collection of regex patterns demonstrating security vulnerabilities, performance problems, and common anti-patterns to avoid
📝 Invalid Character Class Escapes regex
Character classes with invalid ranges or unescaped special characters
# Invalid Character Class Samples
# Character classes with syntax errors or ambiguous patterns
# Risk Level: LOW - Syntax errors or unexpected behavior
# --- Invalid Ranges ---
# Pattern: [a-Z]
# Problem: Reversed range; Z (0x5A) sorts before a (0x61) in ASCII
# Result: Most engines reject this with a "bad range" compile error
[a-Z]
# Fix: [a-zA-Z]
# Pattern: [Z-a]
# Problem: Valid but misleading; spans Z, [, \, ], ^, _, `, and a
[Z-a]
# Fix: [Za] for the two letters, or [a-zA-Z] for all letters
# Pattern: [0-z]
# Problem: Overly broad range; also includes : ; < = > ? @ [ \ ] ^ _ `
[0-z]
# Fix: [0-9a-zA-Z]
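The range behavior above can be checked with a short script. This is a minimal sketch using Python's `re` module; other engines may report the reversed range differently, and the helper name `compiles` is made up for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Python rejects the reversed range outright.
reversed_range_ok = compiles("[a-Z]")

# [Z-a] compiles, but it also accepts the punctuation between Z and a.
punctuation = "[\\]^_`"
matched_by_Z_to_a = [c for c in punctuation if re.fullmatch("[Z-a]", c)]

# [0-z] likewise lets through characters such as ':' and '@'.
matched_by_0_to_z = [c for c in ":@_" if re.fullmatch("[0-z]", c)]
```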
# --- Unescaped Special Characters ---
# Pattern: [^]]
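# Problem: Engine-dependent parsing of a leading ]
# In PCRE and Python, ] directly after [^ is a literal, so this means "any character except ]"
# In JavaScript, [^] means "any character" and the second ] is a literal
# Problematic in: [^]]+
# Fix: [^\]]+ (escape the bracket so every engine reads it the same way)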
# Pattern: [--]
# Problem: Dash placement confusion
[--]
# Fix: [-\-] or [\--]
# Pattern: [^^]
# Problem: Caret placement confusion
[^^]
# Fix: [\^] to match a caret, or [^\^] to match anything except a caret
# --- Ambiguous Escapes ---
# Pattern: [\w]
# Problem: Redundant character class around a shorthand
# \w works inside a class in PCRE, Python, and JavaScript, but POSIX bracket expressions treat \ as a literal
[\w]
# Fix: \w outside class or [a-zA-Z0-9_]
# Pattern: [\b]
# Problem: In character class, \b is backspace, not word boundary
[\b]
# Clarify: Inside class = backspace, Outside = word boundary
# Pattern: [\d]
# Problem: Works in PCRE-style engines, but POSIX bracket expressions (grep, sed) read it as a literal \ or d
[\d]
# Fix: [0-9] or \d outside class
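The difference between `\b` inside and outside a class is easy to confirm. This is a small sketch for Python's `re` module only; the sample strings are invented for illustration.

```python
import re

# Inside a class, \b is the backspace character (U+0008).
backspace_match = re.fullmatch(r"[\b]", "\x08")

# Outside a class, \b is a zero-width word boundary.
boundary_match = re.search(r"\bcat\b", "a cat sat")

# In Python, [\w] and [\d] behave like the bare shorthands.
same_word = re.findall(r"[\w]+", "ab_1 c") == re.findall(r"\w+", "ab_1 c")
same_digit = re.findall(r"[\d]+", "a12b3") == re.findall(r"\d+", "a12b3")
```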
📝 Redundant Escapes regex
Unnecessary escape sequences that reduce readability
# Redundant Escape Samples
# Unnecessary escape sequences that clutter patterns
# Risk Level: LOW - Style and readability issues
# --- Unnecessary Character Escapes ---
# Pattern: \-
# Problem: Hyphen doesn't need escaping outside character class
\-
# Fix: - (just use hyphen)
# Pattern: \:
# Problem: Colon is not a special character
\:
# Fix: : (just use colon)
# Pattern: \.
# Note: Not redundant; the escape changes the meaning
# If you want literal: \.
# If you want any char: .
\. # literal period
. # any character
# Pattern: \\
# Problem: A literal backslash must itself be escaped
# Often confused with: \ (a lone backslash, which starts an escape sequence)
\\ # literal backslash
# --- Unnecessary Character Class Escapes ---
# Pattern: [a-z\-]
# Problem: Escape of hyphen not needed when at edges
[a-z\-] # unnecessary
[a-z-] # better (hyphen at end)
[-a-z] # better (hyphen at start)
# Pattern: [\^]
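# Problem: Caret only needs an escape when it comes first in the class
[^\^] # escape unnecessary: the second caret is not first, so [^^] means the same
[\^] # escape needed: an unescaped leading caret would negate the class
# Pattern: [\]]
# Problem: Escape only needed in some positions
[a-z\]] # necessary here
[\]a-z] # optional in PCRE and Python, where a leading ] is literal; required in JavaScript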
# --- Letter Escapes ---
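# Pattern: \c\a\t
# Problem: Escaping letters changes their meaning instead of being harmless
# \t is a tab, \a is the bell character, and \c starts a control escape in PCRE and JavaScript and is an error in Python
# Other special letters: b, d, s, w, etc.
\c\a\t # does not match "cat"
cat # just letters
# Pattern: [\Q\E]
# Problem: \Q...\E handling inside character classes is engine-dependent
[\Q\E] # PCRE honors \Q...\E here; Python rejects \Q as a bad escape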
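Escaped letters are worth testing rather than assuming. The sketch below shows how Python's `re` module treats them; other engines differ, and `is_rejected` is a hypothetical helper.

```python
import re

def is_rejected(pattern: str) -> bool:
    """Return True if re.compile raises an error for the pattern."""
    try:
        re.compile(pattern)
        return False
    except re.error:
        return True

# Python treats unknown letter escapes as errors.
c_rejected = is_rejected(r"\c")
q_rejected = is_rejected(r"[\Q\E]")

# \a and \t are real escapes: bell and tab, not the letters "a" and "t".
bell_tab = re.fullmatch(r"\a\t", "\x07\t")
not_letters = re.fullmatch(r"\a\t", "at")
```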
# --- Numeric Escapes ---
# Pattern: \1 vs \01
# Problem: Ambiguous - backreference or octal?
# In modern regex: Usually backreference
# In some contexts: Octal
# Clarify: Use \k<name> for named backreferences to avoid ambiguity
📝 Excessive Capturing Groups regex
Unnecessary capturing groups that impact performance
# Excessive Capturing Group Samples
# Unnecessary capturing groups that hurt performance and readability
# Risk Level: LOW - Performance and maintainability issues
# --- Unneeded Captures ---
# Pattern: (\d+)\s+(\w+)\s+(\d+)
# Problem: Capturing when you only need to match
# If you don't need the groups, use non-capturing
(\d+)\s+(\w+)\s+(\d+)
# Fix: \d+\s+\w+\s+\d+ (no groups)
# Or: (?:\d+)\s+(?:\w+)\s+(?:\d+) (non-capturing)
# Pattern: (https?)://([^\s]+)
# Problem: Capturing protocol when you just want validation
(https?)://([^\s]+)
# Fix: (?:https?)://[^\s]+ or https?://[^\s]+
# --- Nested Captures ---
# Pattern: ((\d+)\s+(\w+))
# Problem: Nested capturing groups
# Creates: Group 1: entire match, Group 2: digits, Group 3: word
((\d+)\s+(\w+))
# Fix: Use non-capturing where possible:
# (?:(\d+)\s+(\w+)) or just flatten: (\d+)\s+(\w+)
# Pattern: (a(b(c)d)e)
# Problem: Deeply nested captures
# Creates: 3 capturing groups
(a(b(c)d)e)
# Fix: (?:a(?:b(?:c)d)e) or a(?:b(?:c)d)e
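Group counts can be inspected directly instead of counted by eye. This is a minimal sketch with Python's `re` module, using the compiled pattern's `groups` attribute.

```python
import re

nested = re.compile(r"(a(b(c)d)e)")
flat = re.compile(r"a(?:b(?:c)d)e")

# Each opening parenthesis of a capturing group gets the next number.
nested_count = nested.groups
nested_values = nested.fullmatch("abcde").groups()

# The non-capturing version matches the same text but stores nothing.
flat_count = flat.groups
flat_matches = flat.fullmatch("abcde") is not None
```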
# --- Performance Impact ---
# Patterns with captures can be slower than non-capturing ones; the overhead is engine- and input-dependent
# Benchmark: Matching 1000 strings
# Slow: (\d{3})-(\d{3})-(\d{4})
# Fast: \d{3}-\d{3}-\d{4}
# Slow: (\w+)@(\w+)\.(\w+)
# Fast: \w+@\w+\.\w+
# --- When to Use Captures ---
# Use capturing groups when:
# - You need to extract specific parts
# - You need backreferences
# Example: (\w+)\s+\1 # repeated word
# Use non-capturing (?:...) when:
# - Grouping for quantifiers: (?:abc){3}
# - Grouping for alternation: (?:a|b|c)
# - Grouping for precedence: ^(?:abc|def)
# --- Named Groups for Clarity ---
# Instead of: (\d{4})-(\d{2})-(\d{2})
# Use: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
# Improves: Readability and maintenance
# Performance: Similar to capturing groups, but clearer
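As a sketch of named groups in practice: Python's `re` module spells them `(?P<name>...)` rather than `(?<name>...)`, and uses `(?P=name)` for a named backreference. The sample date and sentence are invented for illustration.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# groupdict() returns the captures keyed by name.
parts = DATE.fullmatch("2024-03-09").groupdict()

# A capture that is actually needed: a backreference finds a repeated word.
repeated = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "it was the the best")
```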
📝 Unanchored Validation Patterns regex
Regex patterns without anchors that can match at unintended positions in the input
# Unanchored Pattern Samples
# Patterns that can match anywhere in the string, causing validation issues
# Risk Level: MEDIUM - Can bypass validation
# --- Number Validation Without Anchors ---
# Pattern: \d+
# Problem: Matches digits anywhere, not the whole string
# Valid: "123"
# Also Matches: "abc123def", "123abc", "a1b2c3"
\d+
# Fix: ^\d+$
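The effect of anchoring can be sketched in Python as follows. One detail is specific to Python's `re` module: `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter check. The function names are hypothetical.

```python
import re

def is_number_unanchored(text: str) -> bool:
    return re.search(r"\d+", text) is not None

def is_number_anchored(text: str) -> bool:
    return re.search(r"^\d+$", text) is not None

def is_number_fullmatch(text: str) -> bool:
    # fullmatch requires the whole string to match, with no anchors needed.
    return re.fullmatch(r"\d+", text) is not None
```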
# --- Email Validation Without Anchors ---
# Pattern: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
# Problem: Matches email anywhere, allows bypass
# Valid: "user@example.com"
# Also Matches: "user@example.com<script>alert('xss')</script>", "user@example.com malicious content"
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
# Fix: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
# --- URL Validation Without Anchors ---
# Pattern: https?://[^\s]+
# Problem: Matches URL anywhere, allows injection
# Valid: "https://example.com"
# Also Matches: "javascript:https://evil.com", 'https://example.com" onclick="steal()'
https?://[^\s]+
# Fix: ^https?://[^\s]+$
# --- Phone Number Without Anchors ---
# Pattern: \d{3}-?\d{3}-?\d{4}
# Problem: Matches pattern anywhere
# Valid: "123-456-7890"
# Also Matches: "Call 123-456-7890 now!", "my number is 123-456-7890 thanks"
\d{3}-?\d{3}-?\d{4}
# Fix: ^\d{3}-?\d{3}-?\d{4}$
# --- HTML Tag Without Anchors ---
# Pattern: <div>.*</div>
# Problem: Matches across multiple unintended divs
# Valid: "<div>content</div>"
# Also Matches: "<div>content1</div><script>evil()</script><div>content2</div>"
<div>.*</div>
# Fix: <div>[^<]*</div> or use proper HTML parser
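The validation patterns above can be compared in search mode and whole-string mode. This is a sketch using Python's `re.fullmatch`; the address `user@example.com` and the helper `validate` are placeholders chosen for illustration.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
URL = re.compile(r"https?://[^\s]+")
PHONE = re.compile(r"\d{3}-?\d{3}-?\d{4}")

def validate(pattern: re.Pattern, text: str) -> bool:
    """Accept the input only if the entire string matches."""
    return pattern.fullmatch(text) is not None

# search() finds the pattern anywhere, so hostile suffixes slip through.
email_found = EMAIL.search("user@example.com<script>alert('xss')</script>") is not None
url_found = URL.search("javascript:https://evil.com") is not None
```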
📝 Excessive Greedy Matching regex
Greedy quantifiers that consume more than intended, causing incorrect matches
# Excessive Greedy Matching Samples
# Greedy quantifiers (.*) consuming more than intended
# Risk Level: MEDIUM - Incorrect behavior and performance issues
# --- HTML/XML Greedy Matching ---
# Pattern: <div>.*</div>
# Problem: .* is greedy and matches across multiple tags
# Input: <div>First</div><div>Second</div>
# Matches: Entire string instead of individual divs
<div>.*</div>
# Fix: <div>.*?</div> (lazy) or <div>[^<]*</div> (negated character class)
# Pattern: <.*>
# Problem: Matches from first < to last >
# Input: <div> <span>text</span> </div>
# Matches: Entire string as one match
<.*>
# Fix: <[^>]+>
# --- URL Greedy Capture ---
# Pattern: https?://.*
# Problem: .* captures to end of line, including other URLs
# Input: "https://example.com https://another.com"
# Matches: "https://example.com https://another.com"
https?://.*
# Fix: https?://[^\s]+
# --- Quote Matching ---
# Pattern: ".*"
# Problem: Greedy match across multiple quoted strings
# Input: He said "hello" and "goodbye"
# Matches: one span, "hello" and "goodbye", instead of two separate quoted strings
".*"
# Fix: "[^"]*"
# --- Code Comment Extraction ---
# Pattern: //.*
# Problem: Runs to the end of the line, and to the end of the input when DOTALL makes . match newlines
# Input: code // comment1; more code // comment2
//.*
# Fix: //[^\n]* (stops at the newline regardless of flags)
# --- Between Delimiters ---
# Pattern: \|.*\|
# Problem: Greedy match from first | to last |
# Input: a|b|c|d
# Matches: "|b|c|" instead of "|b|"
\|.*\|
# Fix: \|[^|]*\|
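The greedy cases above can be reproduced with the inputs listed in the samples. This is a minimal sketch with Python's `re` module.

```python
import re

html = "<div>First</div><div>Second</div>"

# Greedy .* runs to the last </div>; lazy and negated-class forms stop at the first.
greedy = re.findall(r"<div>.*</div>", html)
lazy = re.findall(r"<div>.*?</div>", html)
negated = re.findall(r"<div>[^<]*</div>", html)

# A negated class keeps each quoted string separate.
quotes = re.findall(r'"[^"]*"', 'He said "hello" and "goodbye"')

# The match includes the delimiters themselves.
pipes_greedy = re.search(r"\|.*\|", "a|b|c|d").group()
pipes_fixed = re.search(r"\|[^|]*\|", "a|b|c|d").group()
```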
📝 Inefficient Lazy Quantifiers regex
Lazy quantifiers that are still inefficient or incorrect for the use case
# Inefficient Lazy Quantifier Samples
# .*? patterns that are inefficient or better served by character classes
# Risk Level: LOW-MEDIUM - Performance issues
# --- Lazy vs Character Class ---
# Pattern: <a>.*?</a>
# Problem: Lazy quantifier still backtracks
# Better: <a>[^<]*</a>
<a>.*?</a>
# Pattern: ".*?"
# Problem: Lazy quantifier for quotes is slow
# Better: "[^"]*"
".*?"
# Pattern: \(.*?\)
# Problem: Lazy for parentheses
# Better: \([^)]*\)
\(.*?\)
# --- Lazy in Complex Patterns ---
# Pattern: ^\w+: .*?$\s+^\w+: .*?$ (with multiline)
# Problem: Multiple lazy quantifiers slow on large text
# Better: Use specific patterns for each field
^\w+: .*?$\s+^\w+: .*?$
# --- Nested Lazy ---
# Pattern: (.*?){3,5}
# Problem: Nested lazy quantifier
# Better: Use specific pattern or split logic
(.*?){3,5}
# --- Lazy with Alternation ---
# Pattern: (a.*?|b.*?|c.*?)
# Problem: Lazy with multiple alternatives
# Better: [abc].*? (factor out the shared tail); a negated class only helps once the delimiter that follows is known
(a.*?|b.*?|c.*?)
# --- Performance Comparison ---
# Inefficient: .*?@.*?
# For email: [^@]+@[^@]+
# The character class version is usually faster because it cannot run past the @; measure on your own data
# Inefficient: <div>\s*.*?\s*</div>
# Better: <div>\s*[^<]*\s*</div>
<div>\s*.*?\s*</div>
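Swapping a lazy quantifier for a negated class is not always a pure optimization, because the two can accept different inputs. The sketch below uses Python's `re` module and invented sample strings to show one case where they agree and one where they differ.

```python
import re

# On flat input the two forms find the same matches.
simple = "<a>link</a><a>other</a>"
lazy_simple = re.findall(r"<a>.*?</a>", simple)
class_simple = re.findall(r"<a>[^<]*</a>", simple)

# With a nested tag, the lazy form still matches but the negated class does not.
nested = "<a><b>x</b></a>"
lazy_nested = re.search(r"<a>.*?</a>", nested)
class_nested = re.search(r"<a>[^<]*</a>", nested)
```

Whether the stricter behavior is a bug or a feature depends on the input the pattern is meant to accept.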
📝 Ambiguous Alternation Order regex
Alternation patterns where order causes unexpected matches
# Ambiguous Alternation Order Samples
# Alternation patterns where order critically affects matching
# Risk Level: MEDIUM - Incorrect behavior
# --- Prefix Ambiguity ---
# Pattern: cat|category
# Problem: "cat" matches first, "category" never fully matched
# Input: "category"
# Matches: "cat" instead of "category"
cat|category
# Fix: category|cat
# Pattern: a|ab|abc
# Problem: The first alternative that matches wins, not the longest
# Input: "abc"
# Matches: "a" instead of "abc"
a|ab|abc
# Fix: abc|ab|a
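The ordering effect can be shown in a backtracking engine such as Python's `re`, and the reordering can be automated. This is a sketch; `alternation` is a hypothetical helper that sorts the alternatives longest first.

```python
import re

first_wins = re.match(r"cat|category", "category").group()
reordered = re.match(r"category|cat", "category").group()
shortest = re.match(r"a|ab|abc", "abc").group()
longest = re.match(r"abc|ab|a", "abc").group()

def alternation(words):
    """Build an alternation with longer alternatives first (hypothetical helper)."""
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)

built = alternation(["a", "ab", "abc"])
```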
# --- Partial Match Issues ---
# Pattern: Mon|Tues|Wed|Thurs|Fri|Sat|Sun
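# Problem: Order matters only when one alternative is a prefix of another
# Input: "Thursday"
# Matches: "Thurs" (no alternative here is a prefix of another, so any order gives the same result)
Mon|Tues|Wed|Thurs|Fri|Sat|Sun
# Fix: If prefixes are added later (e.g. Tue and Tues), order by length descending: Thurs|Tues|Tue|...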
# Pattern: http|https
# Problem: "http" matches first
# Input: "https://example.com"
# Matches: "http" instead of "https"
http|https
# Fix: https|http
# --- Overlapping Patterns ---
# Pattern: \d+|\d{2}
# Problem: First always wins
# Input: "12"
# Matches: "12" via \d+ not \d{2}
\d+|\d{2}
# Fix: \d{2}|\d+ or use specific patterns
# --- Logical Conflicts ---
# Pattern: foo.*|foo.*bar
# Problem: First pattern swallows second
# Input: "foobazbar"
# Matches: via foo.* (greedy), foo.*bar never tried
foo.*|foo.*bar
# Fix: foo.*bar|foo.*
# --- Word Boundaries with Alternation ---
# Pattern: \b(cat|category)\b
# Problem: Looks order-sensitive, but the trailing \b rescues it
# Input: "category"
# Behavior: "cat" is tried first, \b fails before "egory", and the engine backtracks to "category"
\b(cat|category)\b
# Fix: \b(category|cat)\b avoids the extra backtrack; both orders match "category"
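How word boundaries interact with alternation order can be checked directly. This is a small sketch with Python's `re` module; the input strings are invented for illustration.

```python
import re

# With \b on both sides, either order ends up matching the whole word.
short_first = re.search(r"\b(cat|category)\b", "category").group()
long_first = re.search(r"\b(category|cat)\b", "category").group()

# The shorter alternative still matches when it is a word of its own.
standalone = re.search(r"\b(cat|category)\b", "a cat nap").group()

# Without a boundary, order decides: http wins over https.
no_boundary = re.match(r"http|https", "https://example.com").group()
optional_s = re.match(r"https?", "https://example.com").group()
```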
📝 Confusing Double Negation regex
Double-negation patterns that are hard to read and error-prone
# Double Negation Samples
# Confusing negative patterns that are hard to read and maintain
# Risk Level: LOW - Readability and maintenance issues
# --- Negated Negated Character Class ---
# Pattern: [^[^]]
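# Problem: Looks like nested negation, but character classes do not nest in most engines
# Means: one character other than [ or ^, followed by a literal ]
[^[^]]
# Fix: [^\[^]\] if that is the intent, or [\]] to match a closing bracket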
# Pattern: [^\D]
# Problem: Double negation for digit
# Means: Not (not digit) = digit
[^\D]
# Fix: \d or [0-9]
# Pattern: [^\W]
# Problem: Double negation for word char
# Means: Not (not word) = word char
[^\W]
# Fix: \w or [a-zA-Z0-9_]
# Pattern: [^\S]
# Problem: Double negation for whitespace
# Means: Not (not space) = space
[^\S]
# Fix: \s or [ \t\n\r\f\v]
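The equivalences above can be checked by brute force over the ASCII range. This is a sketch for Python's `re` module with default (Unicode) string patterns; `same_set` is a hypothetical helper.

```python
import re

ASCII = [chr(i) for i in range(128)]

def same_set(p1: str, p2: str) -> bool:
    """True if both single-character patterns accept the same ASCII characters."""
    return all(
        bool(re.fullmatch(p1, c)) == bool(re.fullmatch(p2, c)) for c in ASCII
    )

digits_equal = same_set(r"[^\D]", r"\d")
words_equal = same_set(r"[^\W]", r"\w")
spaces_equal = same_set(r"[^\S]", r"\s")

# A hand-written list is easy to get wrong: this one omits form feed and vertical tab.
short_list_equal = same_set(r"[^\S]", r"[ \t\n\r]")
```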
# --- Nested Negative Lookaheads ---
# Pattern: ^(?!.*(?!pattern)).*
# Problem: Confusing double negative lookahead
^(?!.*(?!pattern)).*
# Fix: Simplify logic or use positive assertions
# Pattern: (?![^a])
# Problem: Double negative lookahead
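# Means: Not followed by a character other than 'a' = followed by 'a' or at end of input
(?![^a])
# Fix: (?=a) when a following 'a' is actually required; note the original also succeeds at end of input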
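The two lookaheads agree when a character follows and differ at the end of the input. The sketch below uses Python's `re` module, with a literal `x` added in front of each lookahead so the match position is easy to see.

```python
import re

# At end of input nothing follows, so the double negative succeeds.
double_neg_end = re.search(r"x(?![^a])", "x")
positive_end = re.search(r"x(?=a)", "x")

# When a character follows, both forms agree.
double_neg_a = re.search(r"x(?![^a])", "xa")
positive_a = re.search(r"x(?=a)", "xa")
double_neg_b = re.search(r"x(?![^a])", "xb")
positive_b = re.search(r"x(?=a)", "xb")
```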
# --- Negated Everything Except ---
# Pattern: [^abc]
# Problem: Negative thinking
# Consider: What if you want to match most things?
[^abc]
# Fix: Consider if positive class is clearer for your case
# Pattern: (?!abc).*
# Problem: Negative lookahead to exclude
# Input: "def" matches; on "abc" only position 0 is rejected, so an unanchored search still matches "bc"
(?!abc).*
# Fix: Anchor it as ^(?!abc).*, and consider if a positive pattern is clearer
📝 Octal and Backreference Ambiguity regex
Patterns where \1 could mean octal or a backreference
# Octal Ambiguity Samples
# Patterns where \1, \2 etc. have ambiguous meanings
# Risk Level: MEDIUM - Cross-engine compatibility issues
# --- \0 Ambiguity ---
# Pattern: \0
# Problem: \0 means the null character, but a following digit turns it into a longer octal escape
# In JavaScript: Null character (only when not followed by another digit)
# In Python: Octal 000 (null char)
\0
# Fix: Use \x00 for null character (more explicit)
# Pattern: \01
# Problem: Reads like a backreference to group 1, but a leading zero usually forces octal
# Python and PCRE: Always octal 001
# JavaScript: Legacy octal escape; a syntax error in unicode (u) mode
\01
# Fix: Use \x01 for the character, or \1 (or \g{1} in PCRE) for the backreference
# --- Backreference vs Octal ---
# Pattern: (.)\1
# Problem: Is \1 backreference to group 1 or octal?
# With group: Backreference
# Without group: Octal (in some engines)
(.)\1
# Fix: Use \g{1} (PCRE) or a named group with \k<name>
# Pattern: \10
# Problem: Could be backreference to group 10 or octal 010
# Depends on number of capturing groups
\10
# Fix: Use \g{10} for clarity
# --- Leading Zeros ---
# Pattern: \01 in (a)\01
# Problem: Reads as a backreference, but Python and PCRE parse the leading zero as octal
(a)\01 # octal 001 after "a", not a backreference
# Fix: \1 or \g{1} (PCRE); avoid leading zeros
# Pattern: \001
# Problem: Definitely octal, but unclear which
\001 # Octal 001 = decimal 1
# Fix: \x01 for hex escape
# --- Octal in Character Classes ---
# Pattern: [\01]
# Problem: Octal in character class
# Matches: character with octal value 001 (SOH, U+0001; not the null character)
[\01]
# Fix: [\x01] for clarity
# Pattern: [\0-\7]
# Problem: Octal range
# Matches: characters from null to bell (0-7 decimal)
[\0-\7]
# Fix: Use \x00-\x07 or explicit characters
# --- Cross-Engine Differences ---
# In JavaScript:
\1 # Backreference if group 1 exists; otherwise legacy octal 001 (error in unicode mode)
# In Python:
\1 # Backreference to group 1 (compile error if the group does not exist)
\01 # Always octal 001
# In PCRE:
\1 # Backreference
\01 # Always octal 001
\g1 # Unambiguous backreference
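For Python specifically, the rules can be confirmed with a few matches. This sketch covers the `re` module only and says nothing about how JavaScript or PCRE behave; `compiles` is a hypothetical helper.

```python
import re

def compiles(pattern: str) -> bool:
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# \1 after a group is a backreference.
backref = re.fullmatch(r"(a)\1", "aa")

# A leading zero makes it an octal escape for U+0001, even though group 1 exists.
octal = re.fullmatch(r"(a)\01", "a\x01")
not_backref = re.fullmatch(r"(a)\01", "aa")
class_octal = re.fullmatch(r"[\01]", "\x01")

# \1 without any group is rejected at compile time.
missing_group_ok = compiles(r"\1")

# A named backreference leaves no room for doubt.
named = re.fullmatch(r"(?P<ch>a)(?P=ch)", "aa")
```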
📝 Catastrophic Backtracking Patterns regex
Regex patterns that can cause exponential backtracking leading to Denial of Service (ReDoS)
# Catastrophic Backtracking (ReDoS) Samples
# These patterns can cause exponential time complexity on non-matching input
# Risk Level: CRITICAL - Can cause DoS attacks
# --- Nested Quantifiers ---
# Pattern: (a+)+
# Problem: Nested quantifiers create exponential backtracking once the rest of the pattern can fail
# Dangerous Input: 20+ 'a's followed by a non-matching char, e.g. aaaaaaaaaaaaaaaaaaaaX, when anchored as ^(a+)+$
(a+)+
# Pattern: (.*?)+
# Problem: Lazy quantifier nested in greedy quantifier
# Dangerous Input: Any input that doesn't fully match
(.*?)+
# Pattern: (.*)+
# Problem: Nested greedy quantifiers
# Dangerous Input: Input with partial match at end
(.*)+
# Pattern: ^(a+)+$
# Problem: Anchored nested quantifiers
# Dangerous Input: aaaaaaaaaaaaaaaaaaaaX
^(a+)+$
# Pattern: ^((a+)+)+$
# Problem: Triple nested quantifiers - extremely dangerous
# Dangerous Input: aaaaaaaaaaaaaaaaaaaaX
^((a+)+)+$
# --- Overlapping Alternatives ---
# Pattern: (a|a)+
# Problem: Identical alternatives in quantifier
# Dangerous Input: aaaaaaaaaaaaaaaaX
(a|a)+
# Pattern: (ab|abc)+
# Problem: Overlapping alternatives
# Dangerous Input: ababababababababX
(ab|abc)+
# Pattern: (\d|\d\d)+
# Problem: Prefix relationship between alternatives
# Dangerous Input: 12345678901234567890X
(\d|\d\d)+
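To see why overlapping alternatives such as `\d` and `\d\d` are costly, count how many ways a run of digits can be cut into one- and two-digit pieces. A backtracking engine may have to try each of them before it can report failure. The sketch below does only the counting; it models the search space under that assumption rather than measuring any particular engine, and `split_count` is a hypothetical helper.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def split_count(n: int) -> int:
    """Number of ways to cut n digits into pieces of length 1 or 2."""
    if n <= 1:
        return 1
    # The first piece is either one digit or two digits long.
    return split_count(n - 1) + split_count(n - 2)

ways_for_4 = split_count(4)
ways_for_20 = split_count(20)
```

The count follows the Fibonacci sequence, so every extra digit multiplies the work by roughly 1.6.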
# --- Exponential Patterns ---
# Pattern: (a|b|c)*x
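# Problem: Repeated group before a required literal; an unanchored search retries from every start position (quadratic rather than exponential)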
# Dangerous Input: aaaaaaaaaaaaaaaaaaaa (without x)
(a|b|c)*x
# Pattern: ^(a*)*$
# Problem: Nested star quantifier
# Dangerous Input: aaaaaaaaaaaaaaaaaaaaX
^(a*)*$
# Pattern: .*=(.*).*=(.*).*
# Problem: Multiple backtracking points
# Dangerous Input: a=bbbbbbbbbbbbbbbbbbbb=c=dddddddddddddddddddd
.*=(.*).*=(.*).*
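One common mitigation is to remove the nested quantifier, since `^(a+)+$` accepts the same strings as `^a+$`. The sketch below checks that equivalence with Python's `re` module on deliberately short inputs, because long non-matching inputs are exactly what makes the risky form slow. The sample strings are invented for illustration.

```python
import re

RISKY = re.compile(r"^(a+)+$")
SAFE = re.compile(r"^a+$")

# Short samples only: the risky pattern's cost roughly doubles per extra 'a'
# on inputs that end in a non-matching character.
samples = ["", "a", "aaaa", "aaaaaaaaaaaa", "aaaaaaaaaaaaX", "Xaaa", "aab"]

results = {s: (bool(RISKY.search(s)), bool(SAFE.search(s))) for s in samples}
agree = all(risky == safe for risky, safe in results.values())
```

Limiting input length before matching, or using an engine without backtracking, are other options to weigh.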