Pattern Matching - Java compared to Perl - codecentric AG Blog


“Perl is born to pattern match.” I truly believe that this statement is no exaggeration: Perl solves this problem in an extremely efficient and elegant way. The following short script shows some examples of regular expressions in Perl. Hopefully I will not be struck by lightning for posting Perl code on our blog, which is otherwise dominated by Java ;-).

#!/usr/bin/perl -w
 
$sampleText = <<END;
Here is some text that will be used for pattern matching in this example.
Of course we need some nice outstanding words to match and some \\special
character and here some 1234 Number that will just do fine. And more ...
END
 
print "Complete Text:\n";
print $sampleText;
print "\n";
 
#
# Let's match something easy like the word "outstanding"
#
if ($sampleText =~ /(outstanding)/) {
    print "Pattern found: " . $1 . "\n\n";
}
 
#
# Let's match two expressions, one of them a number
#
if ($sampleText =~ /(\d+)\s+(Number)/) {
    print "Pattern found: " . $1 . $2 . "\n\n";
}
 
#
# Let's match something a bit more complicated like \\special
#
if ($sampleText =~ /(\\special)/) {
    print "Pattern found: " . $1 . "\n\n";
}
 
#
# Let's match something ignoring the case and that is the first word of
# the input string.
#
if ($sampleText =~ /^(here)/i) {
    print "Pattern found: " . $1 . "\n\n";
}
 
#
# Let's replace all occurrences of the word "and" with "NOAND"
# (without the \s+ we would also change the "and" in outst-and-ing)
#
if ($sampleText =~ s/(\s+)(and)(\s+)/$1NOAND$3/gi) {
    print "Changed Text:\n" . $sampleText . "\n\n";
}


Doing the same thing in Java is a bit trickier, as Java's strictly object-oriented approach makes it somewhat bulky: one has to use the classes Pattern and Matcher. Still, it is easy to see that the Pattern class was designed with Perl's regular expressions in mind. Perl's modifiers have direct counterparts, for example the flag Pattern.CASE_INSENSITIVE for the i modifier and Pattern.MULTILINE for the m modifier. The g modifier of a substitution corresponds to the method replaceAll(…) of the Matcher class.
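A minimal sketch of this mapping, using nothing beyond java.util.regex (the class RegexFlags and its methods are made up for illustration): flags are passed as the second argument of Pattern.compile(…) and combined with |, or embedded in the pattern itself as (?i), (?m) or (?im). A Perl substitution without g corresponds to replaceFirst(…). Perl's s and x modifiers map to Pattern.DOTALL and Pattern.COMMENTS in the same way.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: how Perl modifiers map to java.util.regex flags.
public class RegexFlags {

    // Two lines, so that ^ can match either at the start of the input
    // or (with MULTILINE) right after the line break.
    static final String TEXT = "Here is some text\ncharacter and here some 1234 Number";

    // Perl: $TEXT =~ /^(character)/ (flags = 0) or /^(character)/m (MULTILINE)
    public static boolean characterAtLineStart(int flags) {
        return Pattern.compile("^(character)", flags).matcher(TEXT).find();
    }

    // Perl: $TEXT =~ /^(HERE)/i, passed as a compile flag ...
    public static boolean ignoreCaseFlag() {
        return Pattern.compile("^(HERE)", Pattern.CASE_INSENSITIVE).matcher(TEXT).find();
    }

    // ... or embedded inline at the start of the pattern
    public static boolean ignoreCaseInline() {
        return Pattern.compile("(?i)^(HERE)").matcher(TEXT).find();
    }

    // Perl: /^(CHARACTER)/im, flags combined with a bitwise OR
    public static boolean ignoreCaseMultiline() {
        return Pattern.compile("^(CHARACTER)",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE).matcher(TEXT).find();
    }

    // Perl: s/and/AND/ (first occurrence only)
    public static String substituteFirst(String s) {
        return Pattern.compile("and").matcher(s).replaceFirst("AND");
    }

    // Perl: s/and/AND/g (every occurrence)
    public static String substituteGlobal(String s) {
        return Pattern.compile("and").matcher(s).replaceAll("AND");
    }

    public static void main(String[] args) {
        System.out.println(characterAtLineStart(0));                    // false
        System.out.println(characterAtLineStart(Pattern.MULTILINE));    // true
        System.out.println(ignoreCaseFlag() + " " + ignoreCaseInline()); // true true
        System.out.println(ignoreCaseMultiline());                      // true
        System.out.println(substituteFirst("this and that and more"));  // this AND that and more
        System.out.println(substituteGlobal("this and that and more")); // this AND that AND more
    }
}
```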

The following code shows the Java equivalent of the Perl script above:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
 
public class PMatch {
 
	public String sampleText = "Here is some text that will be used for"
			+ " pattern matching in this example.\n"
			+ "Of course we need some nice outstanding words to match"
			+ " and some \\special\n"
			+ "character and here some 1234 Number that will just do"
			+ " fine. And more ...";
 
	public void printText() {
		System.out.println("Complete Text:\n" + sampleText + "\n");
	}
 
	// Perl: $sampleText =~ /(outstanding)/
	public void matchStandardText() {
		Pattern p = Pattern.compile("(outstanding)");
		Matcher m = p.matcher(sampleText);
		if (m.find()) {
			System.out.println("Pattern found: " + m.group(1) + "\n");
		}
	}
 
	// Perl: $sampleText =~ /(\d+)\s+(Number)/
	public void matchTwoExpressions() {
		Pattern p = Pattern.compile("(\\d+)\\s+(Number)");
		Matcher m = p.matcher(sampleText);
		if (m.find()) {
			System.out.println("Pattern found: " + m.group(1) + m.group(2)
					+ "\n");
		}
	}
 
	// Perl: $sampleText =~ /(\\special)/
	// (the Java string literal doubles every backslash once more)
	public void matchSpecialChar() {
		Pattern p = Pattern.compile("(\\\\special)");
		Matcher m = p.matcher(sampleText);
		if (m.find()) {
			System.out.println("Pattern found: " + m.group(1) + "\n");
		}
	}
 
	// Perl: $sampleText =~ /^(here)/i
	public void matchIgnoreCase() {
		Pattern p = Pattern.compile("^(here)", Pattern.CASE_INSENSITIVE);
		Matcher m = p.matcher(sampleText);
		if (m.find()) {
			System.out.println("Pattern found: " + m.group(1) + "\n");
		}
	}
 
	// Perl: $sampleText =~ s/(\s+)(and)(\s+)/$1NOAND$3/gi
	public void replace() {
		Pattern p = Pattern.compile("(\\s+)(and)(\\s+)",
				Pattern.CASE_INSENSITIVE);
		Matcher m = p.matcher(sampleText);
		if (m.find()) {
			// "$1" and "$3" refer to the groups of each individual match;
			// replaceAll() resets the matcher and replaces every occurrence
			sampleText = m.replaceAll("$1NOAND$3");
			System.out.println("Changed Text:\n" + sampleText);
		}
	}
 
	public static void main(String[] args) {
		PMatch pMatch = new PMatch();
 
		pMatch.printText();
		pMatch.matchStandardText();
		pMatch.matchTwoExpressions();
		pMatch.matchSpecialChar();
		pMatch.matchIgnoreCase();
		pMatch.replace();
	}
}
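The replacement step deserves a closer look. The argument of replaceAll(…) is a template that is evaluated for every match, so $1 and $3 refer to the groups of the current match, just like in Perl's s///g; concatenating m.group(1) and m.group(3) after a single find() instead reuses the whitespace of the first match everywhere (with the sample text this goes unnoticed, since every "and" is surrounded by single spaces). And, as the comment in the Perl script notes, the surrounding \s+ is only there to keep "outstanding" intact; the word boundary \b, which Java supports with the same meaning as Perl, is a common alternative that consumes no characters. A sketch comparing these variants (the class ReplaceVariants and its methods are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing ways to replace the word "and".
public class ReplaceVariants {

    static final Pattern AND_WS =
            Pattern.compile("(\\s+)(and)(\\s+)", Pattern.CASE_INSENSITIVE);

    // Perl: s/(\s+)(and)(\s+)/$1NOAND$3/gi
    // $1 and $3 are resolved separately for every match.
    public static String groupReferences(String text) {
        return AND_WS.matcher(text).replaceAll("$1NOAND$3");
    }

    // Groups are read once, after the first find(); replaceAll() then
    // inserts this same fixed string (first match's whitespace) everywhere.
    public static String firstMatchGroups(String text) {
        Matcher m = AND_WS.matcher(text);
        if (!m.find()) {
            return text;
        }
        return m.replaceAll(m.group(1) + "NOAND" + m.group(3));
    }

    // Perl: s/and/NOAND/gi, no delimiters, so outst-and-ing is hit as well
    public static String bare(String text) {
        return Pattern.compile("and", Pattern.CASE_INSENSITIVE)
                .matcher(text).replaceAll("NOAND");
    }

    // Perl: s/\band\b/NOAND/gi, "and" only as a whole word; unlike \s+ this
    // also matches at the start or end of the text and next to punctuation
    public static String wordBoundary(String text) {
        return Pattern.compile("\\band\\b", Pattern.CASE_INSENSITIVE)
                .matcher(text).replaceAll("NOAND");
    }

    public static void main(String[] args) {
        String tabs = "black and white\tand\tgrey";
        System.out.println(groupReferences(tabs));  // keeps the tabs
        System.out.println(firstMatchGroups(tabs)); // tabs become spaces

        String words = "And so: outstanding words, black and white, rock and.";
        System.out.println(bare(words));
        System.out.println(groupReferences(words));
        System.out.println(wordBoundary(words));
    }
}
```

Which variant fits best depends on the input: \b treats "And" at the very beginning and "and." before punctuation as words, while the \s+ version requires whitespace on both sides.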


The similarities are quite obvious. The one thing to keep in mind is that Java has no regex literals: a pattern is an ordinary String, so every backslash in the regular expression has to be escaped with another backslash. The Perl pattern /(\\special)/ therefore becomes Pattern.compile("(\\\\special)"), which looks odd but is no real problem in the end. The output of both programs is identical (apart from two extra blank lines that the Perl script prints at the very end):

Complete Text:
Here is some text that will be used for pattern matching in this example.
Of course we need some nice outstanding words to match and some \special
character and here some 1234 Number that will just do fine. And more ...
 
Pattern found: outstanding
 
Pattern found: 1234Number
 
Pattern found: \special
 
Pattern found: Here
 
Changed Text:
Here is some text that will be used for pattern matching in this example.
Of course we need some nice outstanding words to match NOAND some \special
character NOAND here some 1234 Number that will just do fine. NOAND more ...
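Where the doubled backslashes become hard to read, java.util.regex can take over part of the escaping: Pattern.quote(…) turns any string into a pattern that matches it literally, and Matcher.quoteReplacement(…) does the same for replacement strings, in which $ and \ are otherwise special. A small sketch (the class EscapingDemo and its methods are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: ways to handle a literal backslash.
public class EscapingDemo {

    // The Java literal "\\special" is the 8-character text \special.
    static final String TEXT = "some \\special\ncharacter";

    // Hand-escaped: the literal "(\\\\special)" is the regex (\\special),
    // which matches one backslash followed by "special".
    public static String findHandEscaped() {
        Matcher m = Pattern.compile("(\\\\special)").matcher(TEXT);
        return m.find() ? m.group(1) : null;
    }

    // Pattern.quote() makes every character of its argument literal,
    // so only the Java string literal itself still needs escaping.
    public static String findQuoted() {
        Matcher m = Pattern.compile("(" + Pattern.quote("\\special") + ")").matcher(TEXT);
        return m.find() ? m.group(1) : null;
    }

    // In a replacement string, $ and \ are special; quoteReplacement()
    // escapes them so the replacement is inserted verbatim.
    public static String replaceLiterally(String replacement) {
        return Pattern.compile(Pattern.quote("\\special"))
                .matcher(TEXT)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(findHandEscaped());
        System.out.println(findQuoted());
        System.out.println(replaceLiterally("$1 \\ok"));
    }
}
```

Without quoteReplacement(…), the "$1" in the last call would be read as a reference to a capturing group that this pattern does not have.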


Does this mean it is better to use Perl for applications that rely heavily on pattern matching? No, luckily there is also an alternative for fans of Java: Groovy. Groovy supports a syntax that comes really close to Perl. The examples shown here give an idea of what this might look like. On the other hand, a small Perl script every now and then is not to be scoffed at either :-).