Greedy
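In patternPart(.*), the (.*) part is greedy: it consumes as much of the input as possible while still allowing the whole pattern to match.
In the example below, group 1 therefore takes everything up to the last dot, and group 2 only what follows it.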
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String methodFullName = "foo.bar.foobar.Foox.doFoo";
// Group 1: everything before the last dot (greedy); group 2: what follows it
Pattern pattern = Pattern.compile("(.*)\\.(.*)$");
Matcher matcher = pattern.matcher(methodFullName);
if (matcher.matches()) {
    String className = matcher.group(1);
    String methodName = matcher.group(2);
    System.out.println(className + "!" + methodName);
}
```
output:
foo.bar.foobar.Foox!doFoo
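To see what greedy means in practice, the sketch below runs the same split twice: once with the greedy (.*) used above, and once with the reluctant form (.*?) described in the next section. The class and method names (GreedyVsReluctant, firstGroup) are only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns group(1) if the whole input matches the regex, otherwise null
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String methodFullName = "foo.bar.foobar.Foox.doFoo";
        // Greedy: (.*) grabs as much as possible, then backtracks to the LAST dot
        System.out.println(firstGroup("(.*)\\.(.*)", methodFullName));  // foo.bar.foobar.Foox
        // Reluctant: (.*?) grabs as little as possible, so it stops at the FIRST dot
        System.out.println(firstGroup("(.*?)\\.(.*)", methodFullName)); // foo
    }
}
```

With greedy matching, (.*) first swallows the whole string and then gives characters back one at a time until \\. can match, which happens at the last dot.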
Reluctant
A reluctant quantifier consumes as little of the input as possible.
In patternPart.*?otherPatternPart, the engine first matches patternPart, then lets .*? consume as few characters as it can, extending it one character at a time only until otherPatternPart can match.
Simple example: list the processes whose command line contains the term "java" at least three times:
```shell
ps aux | grep -P ".*java.*?java.*?java"
```
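A rough Java counterpart of the grep filter above, assuming the process command lines are already available as strings (the two sample lines are made up). Matcher.find() searches anywhere in the line, unlike matches(), so the leading .* from the grep pattern is not needed. For a yes/no filter like this one, java.*?java.*?java accepts exactly the same lines as java.*java.*java; the reluctant form only changes which span is reported as the match. The class name ThreeJavas is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ThreeJavas {
    // Reluctant .*? keeps each gap between successive "java" occurrences as short as possible
    private static final Pattern THREE_JAVAS = Pattern.compile("java.*?java.*?java");

    // Keeps only the lines in which "java" occurs at least three times
    static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(line -> THREE_JAVAS.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical command lines, for illustration only
        List<String> lines = List.of(
                "/usr/bin/java -cp app.jar com.example.Main",
                "/opt/java/bin/java -Djava.io.tmpdir=/tmp -jar server.jar");
        filter(lines).forEach(System.out::println);
    }
}
```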
Here we capture only the content of the first td cell. The reluctant part is enclosed in a group so that the result is easy to retrieve with matcher.group(1):
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String html = "<tr><td>foo</td><td>bar</td></tr>";
// (.*?) stops at the first "</td>"; the greedy .* then absorbs the rest of the row
Pattern pattern = Pattern.compile("<tr><td>(.*?)</td>.*</tr>");
Matcher matcher = pattern.matcher(html);
if (matcher.matches()) {
    String firstTd = matcher.group(1);
    System.out.println(firstTd);
}
```
output:
foo
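Because (.*?) stops at the nearest </td>, the same idea can collect every cell by calling Matcher.find() in a loop. This is a sketch that assumes simple, flat markup like the row above; a regex is not a general HTML parser. The class name AllTds is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllTds {
    // Reluctant (.*?) stops at the nearest </td>, so each find() yields one cell
    private static final Pattern TD = Pattern.compile("<td>(.*?)</td>");

    static List<String> cells(String html) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TD.matcher(html);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<tr><td>foo</td><td>bar</td></tr>";
        System.out.println(cells(html)); // [foo, bar]
        // Greedy variant: (.*) runs to the LAST </td> of the row
        Matcher greedy = Pattern.compile("<td>(.*)</td>").matcher(html);
        if (greedy.find()) {
            System.out.println(greedy.group(1)); // foo</td><td>bar
        }
    }
}
```

The greedy variant at the end shows the failure mode that the reluctant quantifier avoids: a single match spanning both cells.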
Ignore case: (?i)patternPart
```java
String myString = "Hello world my dear";
// (?i) at the start makes the whole regex case-insensitive
boolean matches = myString.matches("(?i).*HeLLo WorlD.*");
System.out.println(matches);
```
output:
true
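The same case-insensitive search can also be requested with the Pattern.CASE_INSENSITIVE flag of Pattern.compile(String regex, int flags) instead of the inline (?i). The sketch also shows the scoped form (?i:...), which applies the flag to one group only. By default CASE_INSENSITIVE folds only US-ASCII letters; combine it with Pattern.UNICODE_CASE for accented letters. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class IgnoreCase {
    // find() looks for the pattern anywhere, so no surrounding .* is needed
    static boolean findsIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        String myString = "Hello world my dear";
        System.out.println(findsIgnoringCase("HeLLo WorlD", myString)); // true
        // Scoped inline flag: only "hello" ignores case, " WORLD" must match exactly
        System.out.println(Pattern.compile("(?i:hello) WORLD").matcher(myString).find()); // false
    }
}
```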
DOTALL flag: let . match any character, including line terminators
In DOTALL mode, the expression . matches any character, including a line terminator. By default, . does not match line terminators.
The mode can be enabled with Pattern.compile(String regex, int flags) by passing Pattern.DOTALL, or directly in the regex with the embedded flag (?s):
```java
String myString = "Hey\n I love you\t Hello world\n";
boolean matches = myString.matches("(?s).*Hello world.*");
System.out.println(matches);
```
output:
true
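The sketch below enables the same mode through Pattern.compile(String regex, int flags) with Pattern.DOTALL, and also shows what happens without it: String.matches must consume the whole input and . stops at '\n', so the plain regex fails. Class and method names are illustrative.

```java
import java.util.regex.Pattern;

class DotAll {
    // Compiles the regex with the DOTALL flag, the equivalent of a leading (?s)
    static boolean matchesWithDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String myString = "Hey\n I love you\t Hello world\n";
        // Without DOTALL, . cannot cross '\n', so matching the whole input fails
        System.out.println(myString.matches(".*Hello world.*")); // false
        System.out.println(matchesWithDotAll(".*Hello world.*", myString)); // true
    }
}
```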
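Exclude one or more characters
The syntax is [^characters]: the class matches any single character that is not listed after the ^.
Example:
```python
import re
from typing import Optional

# [^a-z]* matches any run of characters that are NOT lowercase ASCII letters
regex_with_character_exclusion = r'house [^a-z]*love'

match: Optional[re.Match] = re.match(regex_with_character_exclusion, 'house love')
print(f'match={match}')  # match
match = re.match(regex_with_character_exclusion, 'house B-123love')
print(f'match={match}')  # match: "B-123" contains no lowercase letter
match = re.match(regex_with_character_exclusion, 'house b-123love')
print(f'match={match}')  # no match (None): "b" is a lowercase letter
```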
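Excluding several characters at once is a common way to pull out fields between separators. A Java sketch, assuming comma and semicolon as the (hypothetical) separators: [^,;]+ matches a run of characters that contains neither. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExcludeChars {
    // [^,;]+ : one or more characters that are neither ',' nor ';'
    private static final Pattern FIELD = Pattern.compile("[^,;]+");

    static List<String> fields(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("red,green;blue")); // [red, green, blue]
    }
}
```

Empty fields are skipped, so "a,,b" yields [a, b]; String.split("[,;]") is the better fit when empty fields must be kept.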
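Specify a set of characters that includes accented letters
The range à-ÿ spans U+00E0 to U+00FF, which holds the lowercase accented letters of Latin-1. Because the class starts with ^, the pattern below matches any character that is NOT an ASCII letter and NOT in à-ÿ:
```python
import re

pattern: re.Pattern = re.compile(r'[^à-ÿa-zA-Z]')
```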
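The same character class works in Java. The sketch writes à-ÿ as Unicode escapes (\u00e0-\u00ff) so the source does not depend on file encoding, and uses replaceAll to drop every character the negated class matches: for example, keepLetters("Crème brûlée, 2024!") returns "Crèmebrûlée". Two side effects of the range are worth knowing: ÷ (U+00F7) lies inside à-ÿ and is kept, while uppercase accented letters such as É (U+00C9) are outside the set and are removed. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class AccentedLetters {
    // Same class as the Python pattern, with a-y-accent range written as escapes:
    // \u00e0-\u00ff is the range à-ÿ. Anything NOT in the set is matched.
    private static final Pattern NOT_A_LETTER = Pattern.compile("[^\u00e0-\u00ffa-zA-Z]");

    // Removes every character that the negated class matches
    static String keepLetters(String input) {
        return NOT_A_LETTER.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepLetters("Cr\u00e8me br\u00fbl\u00e9e, 2024!"));
    }
}
```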