Java - i18n equivalent for \w in regular expression (i18n word characters matching)

Created by:
Emrys-Li

In this short article, we would like to show how to extend the \w rule so that it matches i18n word characters in Java.

By default, \w is equal to [a-zA-Z_0-9], so it matches only ASCII letters, digits and the underscore.

To match i18n word characters we should use:

[\p{L}_\p{N}]
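
To see the difference between the two classes above, here is a minimal sketch that runs both on the same input. The class name WordClassComparison and the helper findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordClassComparison {

    // Collects every match of the given regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "żółty red_1";

        // Default \w is ASCII-only, so the Polish letters break the word.
        System.out.println(findAll("\\w+", text));              // [ty, red_1]

        // Unicode letters (\p{L}) and numbers (\p{N}) keep the word whole.
        System.out.println(findAll("[\\p{L}_\\p{N}]+", text));  // [żółty, red_1]
    }
}
```

The ASCII-only rule does not skip the non-English word entirely; it returns the fragment "ty", which is easy to miss in real data.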

 

Practical example

In this section, the program below iterates through the text and finds i18n characters grouped into words.

Each line printed in the output represents a single matched word.

package com.example;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {

    public static void main(String[] args) {

        Pattern pattern = Pattern.compile("[\\p{L}_\\p{N}]+");  // i18n equivalent for \w

        String text = "日本 żółty Россия red";
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

Output:

日本
żółty
Россия
red

Note: the above rule can have problems with some scripts/alphabets, e.g. Hebrew.
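
One likely cause of the Hebrew problem (an assumption, the note does not name it) is that vowel points are combining marks in category \p{M}, not letters, so a pointed word gets split at every mark. The sketch below counts matches for one pointed Hebrew word using the article's rule and two possible alternatives: adding \p{M} to the class, or using \w with the Pattern.UNICODE_CHARACTER_CLASS flag. The class name CombiningMarksDemo and the helper countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombiningMarksDemo {

    // Counts how many separate matches the pattern finds in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One Hebrew word (shalom) written with vowel points; the points are
        // nonspacing marks (category Mn), not letters.
        String word = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD";

        Pattern lettersOnly = Pattern.compile("[\\p{L}_\\p{N}]+");
        Pattern withMarks = Pattern.compile("[\\p{L}\\p{M}_\\p{N}]+");
        Pattern unicodeWord = Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

        System.out.println(countMatches(lettersOnly, word));  // 3, the word is split at the marks
        System.out.println(countMatches(withMarks, word));    // 1
        System.out.println(countMatches(unicodeWord, word));  // 1
    }
}
```

Which alternative fits depends on the data: adding \p{M} is the smallest change to the rule above, while the flag changes the meaning of \w, \d, \s and \b for the whole pattern.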

References

  1. Predefined Character Classes - Oracle Docs 