Regex (Java, Bash…)

Greedy
In patternPart(.*), the quantifier .* is greedy.
A greedy quantifier consumes as much of the input as it can while still letting the whole pattern match.

String methodFullName = "foo.bar.foobar.Foox.doFoo";
Pattern pattern = Pattern.compile("(.*)\\.(.*)$");
Matcher matcher = pattern.matcher(methodFullName);
if (matcher.matches()) {
    String className = matcher.group(1);
    String methodName = matcher.group(2);
    System.out.println(className + "!" + methodName);
}

output:

foo.bar.foobar.Foox!doFoo

Reluctant
A reluctant quantifier consumes as little of the input as it can while still letting the whole pattern match.
In patternPart.*?otherPatternPart, the .*? is reluctant: once patternPart has matched, .*? consumes as few characters as possible before otherPatternPart can match.

Simple example:
We want to find the commands in which the term "java" appears three times:
ps aux | grep -P ".*java.*?java.*?java"
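For comparison, here is a minimal Java sketch of the same check. The class and method names are hypothetical, and the sample command lines are made up for illustration. It uses find() rather than matches(), because grep looks for the pattern anywhere in the line, so the leading .* is not needed.

```java
import java.util.regex.Pattern;

class ThreeOccurrences {

    // Three occurrences of "java", with as few characters as possible between them.
    private static final Pattern THREE_JAVA = Pattern.compile("java.*?java.*?java");

    // Hypothetical helper: true when the line contains "java" at least three times.
    static boolean mentionsJavaThreeTimes(String line) {
        // find() searches anywhere in the input, like grep does on a line.
        return THREE_JAVA.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsJavaThreeTimes("java -jar java-app.jar -Djava.home=/opt")); // true
        System.out.println(mentionsJavaThreeTimes("java -jar app.jar"));                       // false
    }
}
```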

Here we capture only the content of the first td tag. The reluctant part is enclosed in a group so that the result is easy to retrieve:

String html = "<tr><td>foo</td><td>bar</td></tr>";
Pattern pattern = Pattern.compile("<tr><td>(.*?)</td>.*</tr>");
Matcher matcher = pattern.matcher(html);
if (matcher.matches()) {
    String firstTd = matcher.group(1);
    System.out.println(firstTd);
}

output:

foo
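To see the two modes side by side, the sketch below applies a greedy and a reluctant first group to the same method name used in the greedy example. The class name and the split helper are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Returns "group1!group2" when the whole input matches, or null otherwise.
    static String split(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return matcher.group(1) + "!" + matcher.group(2);
    }

    public static void main(String[] args) {
        String input = "foo.bar.foobar.Foox.doFoo";
        // Greedy: the first group extends up to the last dot.
        System.out.println(split("(.*)\\.(.*)$", input));  // foo.bar.foobar.Foox!doFoo
        // Reluctant: the first group stops at the first dot.
        System.out.println(split("(.*?)\\.(.*)$", input)); // foo!bar.foobar.Foox.doFoo
    }
}
```

Only the ? after the first .* changes, yet the split point moves from the last dot to the first one.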

Ignore case: (?i)patternPart

String myString="Hello world my dear";
boolean matches = myString.matches("(?i).*HeLLo WorlD.*");
System.out.println(matches);
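The same behavior can also be requested with a compile flag instead of the embedded (?i). The sketch below is only an illustration with hypothetical class and method names. It also shows that, in Java, case-insensitive matching covers ASCII letters only unless Pattern.UNICODE_CASE is added, which matters for accented text.

```java
import java.util.regex.Pattern;

class IgnoreCaseDemo {

    // ASCII-only case-insensitive search for a literal inside a text.
    static boolean containsIgnoreCase(String text, String literal) {
        Pattern pattern = Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE);
        return pattern.matcher(text).find();
    }

    // Same search, but case folding also applies to non-ASCII letters.
    static boolean containsIgnoreCaseUnicode(String text, String literal) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(Pattern.quote(literal), flags).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Hello world my dear", "HeLLo WorlD")); // true
        // The escapes below spell an uppercase and a lowercase accented word.
        String upper = "\u00c9T\u00c9";
        String lower = "\u00e9t\u00e9";
        System.out.println(containsIgnoreCase(upper, lower));        // false
        System.out.println(containsIgnoreCaseUnicode(upper, lower)); // true
    }
}
```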

DOTALL flag: make . match any character, including line terminators
In dotall mode, the expression . matches any character, including a line terminator.
By default, this expression does not match line terminators.
The mode can be enabled with Pattern.compile(String regex, int flags) and the Pattern.DOTALL flag, or directly in the regex with the embedded flag (?s).

String myString="Hey\n I love you\t Hello world\n";
boolean matches = myString.matches("(?s).*Hello world.*");
System.out.println(matches);

output:

true
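As a complement, this sketch runs the same pattern with and without dotall mode, this time using the Pattern.DOTALL compile flag instead of (?s). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DotallDemo {

    private static final String REGEX = ".*Hello world.*";

    // Default mode: '.' stops at line terminators.
    static boolean matchesDefault(String text) {
        return Pattern.compile(REGEX).matcher(text).matches();
    }

    // Dotall mode: '.' also matches line terminators.
    static boolean matchesDotall(String text) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "Hey\n I love you\t Hello world\n";
        System.out.println(matchesDefault(text)); // false
        System.out.println(matchesDotall(text));  // true
    }
}
```

Without the flag, matches() fails because .* cannot cross the line breaks and the whole input has to be matched.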

Exclude one or more characters
The syntax is: [^characters]
Example (Python):

import re
from typing import Optional

regex_with_character_exclusion = r'house [^a-z]*love'
# [^a-z]* accepts any run of characters that are not lowercase ASCII letters
match: Optional[re.Match] = re.match(regex_with_character_exclusion, 'house love')
print(f'match={match}')  # match
match = re.match(regex_with_character_exclusion, 'house B-123love')
print(f'match={match}')  # match
match = re.match(regex_with_character_exclusion, 'house b-123love')
print(f'match={match}')  # no match (None)
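The same exclusion syntax works in Java. The sketch below is a rough equivalent of the Python example above, with hypothetical class and method names. It uses lookingAt(), which anchors the match at the start of the input without requiring it to cover the whole input, which is the closest counterpart of Python's re.match.

```java
import java.util.regex.Pattern;

class CharacterExclusionDemo {

    // [^a-z]* accepts any run of characters that are not lowercase ASCII letters.
    private static final Pattern HOUSE_LOVE = Pattern.compile("house [^a-z]*love");

    // True when the input starts with a match of the pattern.
    static boolean startsWithHouseLove(String input) {
        return HOUSE_LOVE.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithHouseLove("house love"));       // true
        System.out.println(startsWithHouseLove("house B-123love"));  // true
        System.out.println(startsWithHouseLove("house b-123love"));  // false
    }
}
```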

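Specify a set of characters that includes accented letters
Here the set is negated with ^, so the pattern matches any character that is neither an ASCII letter nor a character in the range à-ÿ. That range covers the lowercase accented Latin-1 letters (and the ÷ sign); uppercase accented letters such as É are outside it.

import re

pattern: re.Pattern = re.compile(r'[^à-ÿa-zA-Z]')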
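One possible use of such a negated set is stripping everything that is not a letter. The Java sketch below is an assumption about how the pattern might be applied, not something the original snippet shows, and its class and method names are hypothetical. It also includes a variant based on the Unicode letter property \p{L}, which additionally keeps uppercase accented letters.

```java
class AccentedLettersDemo {

    // Removes every character outside a-z, A-Z and the range U+00E0..U+00FF.
    static String keepLatinLetters(String input) {
        return input.replaceAll("[^\u00e0-\u00ffa-zA-Z]", "");
    }

    // Removes every character that is not a Unicode letter.
    static String keepUnicodeLetters(String input) {
        return input.replaceAll("[^\\p{L}]", "");
    }

    public static void main(String[] args) {
        // A sample word with an uppercase and a lowercase accented e, a space, a degree sign and digits.
        String sample = "\u00c9l\u00e9phant n\u00b042";
        // The uppercase accented letter is dropped because it lies outside the range.
        System.out.println(keepLatinLetters(sample));
        // The uppercase accented letter is kept because it is a Unicode letter.
        System.out.println(keepUnicodeLetters(sample));
    }
}
```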