Java - split string by new line character
In this article, we're going to look at how to split a string into separate lines in Java.
Quick solution:
- It works on Linux and Windows:
String text = "1\r\n2\r\n3";
String[] lines = text.split("\\r?\\n"); // \r\n or \n
- It works on all operating systems:
String text = "1\r\n2\r\n3";
String[] lines = text.split("\r\n|\n\r|\n|\r"); // \r\n , \n\r , \n or \r
or:
// import java.util.regex.Pattern;
Pattern PATTERN = Pattern.compile("\\r\\n|\\n\\r|\\n|\\r"); // \r\n , \n\r , \n or \r
String text = "1\r\n2\r\n3";
String[] lines = PATTERN.split(text);
// see the last section for a practical example
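The \\r?\\n expression from the first quick solution treats an optional \r followed by \n as one separator. That means it handles \n and \r\n but not a lone \r. A minimal sketch (the class and method names are only illustrative) shows the difference on text that mixes all three:

```java
import java.util.Arrays;

public class OptionalCrSplit {

    // Splits on "\r\n" or "\n"; a lone "\r" is NOT treated as a separator.
    public static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String text = "a\r\nb\nc\rd";
        String[] lines = splitLines(text);
        System.out.println(lines.length); // 3
        // Escape '\r' so the leftover carriage return is visible when printed.
        System.out.println(Arrays.toString(lines).replace("\r", "\\r")); // [a, b, c\rd]
    }
}
```

Use the longer expression from the second quick solution when a lone \r may also appear.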
See the problem description and examples below to find out how it works in practice.
Different operating systems use different newline sequences. The most commonly used ones are:
| Newline | Used by |
| --- | --- |
| \n | Multics, Unix and Unix-like systems (Linux, macOS, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS, and others. |
| \r\n | Atari TOS, Microsoft Windows, DOS (MS-DOS, PC DOS, etc.), DEC TOPS-10, RT-11, CP/M, MP/M, OS/2, Symbian OS, Palm OS, Amstrad CPC, and most other early non-Unix and non-IBM operating systems. |
| \r | Commodore 8-bit machines (C64, C128), Acorn BBC, ZX Spectrum, TRS-80, Apple II series, Oberon, the classic Mac OS, MIT Lisp Machine and OS-9. |
| \n\r | Acorn BBC and RISC OS spooled text output. |
Source: https://en.wikipedia.org/wiki/Newline
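When it is not clear which of the sequences from the table a text actually contains, it can help to print the text with the newline characters made visible. A minimal sketch; escapeNewlines is a hypothetical helper, not part of any library:

```java
public class NewlineInspector {

    // Replaces CR (\r, code 13) and LF (\n, code 10) with visible "\r" and "\n" text.
    public static String escapeNewlines(String text) {
        return text.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(escapeNewlines("1\r\n2\n3\r4")); // 1\r\n2\n3\r4
        System.out.println((int) '\r' + " " + (int) '\n');  // 13 10
    }
}
```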
This approach works with all operating systems. The example below splits text that mixes different newline sequences into separate lines.
package com.example;
public class Program {
    public static void main(String[] args) {
        String text = "line 1\n" +
                "line 2\r" +
                "line 3\r\n" +
                "line 4\n\r" +
                "line 5";
        // the order of the alternatives matters: two-character sequences must come first
        String[] lines = text.split("\\r\\n|\\n\\r|\\n|\\r");
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
Output:
line 1
line 2
line 3
line 4
line 5
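For comparison, Java also offers two built-in options that this article does not use: the \R linebreak matcher in regular expressions (Java 8 and later) and the String.lines() method (Java 11 and later). A minimal sketch, with illustrative class and method names, runs both on the same mixed text. Both treat \r\n as one line break but read \n\r as two, so they return an extra empty element between "line 4" and "line 5". The custom expression above is still needed when \n\r must count as a single newline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class BuiltInAlternatives {

    // \R (Java 8+) matches "\r\n" as one unit, or a single \n, \r,
    // U+000B, U+000C, U+0085, U+2028 or U+2029 character.
    public static String[] splitWithLinebreakMatcher(String text) {
        return text.split("\\R");
    }

    // String.lines() (Java 11+) recognizes \n, \r and \r\n as line terminators.
    public static List<String> splitWithLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "line 1\n" +
                "line 2\r" +
                "line 3\r\n" +
                "line 4\n\r" +
                "line 5";
        // Both print 6 elements: "\n\r" is read as two line breaks,
        // so an empty element appears between "line 4" and "line 5".
        System.out.println(Arrays.toString(splitWithLinebreakMatcher(text)));
        System.out.println(splitWithLines(text));
    }
}
```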
Note: the order of the alternatives in the \\r\\n|\\n\\r|\\n|\\r expression is very important. The expression should try to:
- split the text by two-character newline sequences first (\r\n or \n\r),
- and only then by single-character newline symbols (\n or \r).
For example, let's suppose we have the following HTTP response:
HTTP/1.1 200 OK\r\nContent-Length: 25\r\nContent-Type: text/html\r\n\r\nHello world!\nSome text...
If \n or \r were at the beginning of the expression, we could get a different number of lines after splitting: each \r\n would be treated as two separate newlines, which adds extra empty lines. After splitting, we should get:
HTTP/1.1 200 OK
Content-Length: 25
Content-Type: text/html

Hello world!
Some text...
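The effect of the order can be checked with a minimal sketch that splits the same HTTP response twice: once with the expression from this article and once with the same alternatives in the wrong order (single characters first). The class, field and method names are only illustrative:

```java
import java.util.regex.Pattern;

public class AlternativeOrderDemo {

    // Two-character sequences first: "\r\n" and "\n\r" are consumed as one separator.
    public static final Pattern CORRECT = Pattern.compile("\\r\\n|\\n\\r|\\n|\\r");
    // Single characters first: "\r\n" is split as "\r" and then "\n".
    public static final Pattern WRONG = Pattern.compile("\\n|\\r|\\r\\n|\\n\\r");

    public static final String RESPONSE = "HTTP/1.1 200 OK\r\n" +
            "Content-Length: 25\r\n" +
            "Content-Type: text/html\r\n" +
            "\r\n" +
            "Hello world!\n" +
            "Some text...";

    public static int countLines(Pattern pattern, String text) {
        return pattern.split(text).length;
    }

    public static void main(String[] args) {
        System.out.println(countLines(CORRECT, RESPONSE)); // 6
        System.out.println(countLines(WRONG, RESPONSE));   // 10
    }
}
```

With the wrong order, every \r\n produces an additional empty element, so the 6 expected lines turn into 10.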
The second important thing is newline unification: when the whole text uses the single newline sequence of one operating system, it is possible to use a simpler expression, as in the examples below.
package com.example;
public class Program {
    public static void main(String[] args) {
        String text = "line 1\r\n" +
                "line 2\r\n" +
                "line 3\r\n" +
                "line 4\r\n" +
                "line 5";
        String[] lines = text.split("\\r\\n"); // only \r\n is treated as a newline
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
Output:
line 1
line 2
line 3
line 4
line 5
Systems using \r\n: Atari TOS, Microsoft Windows, DOS (MS-DOS, PC DOS, etc.), DEC TOPS-10, RT-11, CP/M, MP/M, OS/2, Symbian OS, Palm OS, Amstrad CPC, and most other early non-Unix and non-IBM operating systems.
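When the text is known to come from the machine the program runs on, a minimal sketch can split by that platform's separator. This assumes the text was built with System.lineSeparator(), which returns \r\n on Windows and \n on Linux and macOS. Pattern.quote() makes split() treat the separator as literal text. Text received from other systems may still use a different sequence.

```java
import java.util.regex.Pattern;

public class PlatformSeparatorSplit {

    // Splits by the current platform's line separator, treated as a literal string.
    public static String[] splitByPlatformSeparator(String text) {
        return text.split(Pattern.quote(System.lineSeparator()));
    }

    public static void main(String[] args) {
        // Build the text with the same separator, so the result is platform independent.
        String text = String.join(System.lineSeparator(), "line 1", "line 2", "line 3");
        for (String line : splitByPlatformSeparator(text)) {
            System.out.println(line);
        }
    }
}
```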
package com.example;
public class Program {
    public static void main(String[] args) {
        String text = "line 1\n" +
                "line 2\n" +
                "line 3\n" +
                "line 4\n" +
                "line 5";
        String[] lines = text.split("\\n");
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
Output:
line 1
line 2
line 3
line 4
line 5
Systems using \n: Multics, Unix and Unix-like systems (Linux, macOS, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS, and others.
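Splitting by \\n alone is only safe when the text really uses \n alone. A minimal sketch with illustrative names shows what happens to Windows-style text: every line except the last keeps a trailing \r, so comparisons such as equals() fail.

```java
public class LfOnlySplitPitfall {

    // Splits only on "\n"; any "\r" before it stays at the end of the line.
    public static String[] splitByLf(String text) {
        return text.split("\\n");
    }

    public static void main(String[] args) {
        String[] lines = splitByLf("line 1\r\nline 2");
        System.out.println(lines[0].length());         // 7: "line 1" plus '\r'
        System.out.println(lines[0].equals("line 1")); // false
        System.out.println(lines[1].equals("line 2")); // true
    }
}
```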
package com.example;
public class Program {
    public static void main(String[] args) {
        String text = "line 1\r" +
                "line 2\r" +
                "line 3\r" +
                "line 4\r" +
                "line 5";
        String[] lines = text.split("\\r");
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
Output:
line 1
line 2
line 3
line 4
line 5
Systems using \r: Commodore 8-bit machines (C64, C128), Acorn BBC, ZX Spectrum, TRS-80, Apple II series, Oberon, the classic Mac OS, MIT Lisp Machine and OS-9.
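When the text may come from any of the systems listed above, one way to get the unification described earlier is to replace every newline sequence with \n first and then split by \\n. A minimal sketch; normalize is a hypothetical helper, and it keeps the two-character sequences first for the same reason as the universal expression:

```java
public class NormalizeThenSplit {

    // Replace \r\n, \n\r and a lone \r with \n (two-character sequences first).
    public static String normalize(String text) {
        return text.replaceAll("\\r\\n|\\n\\r|\\r", "\n");
    }

    public static String[] splitLines(String text) {
        return normalize(text).split("\\n");
    }

    public static void main(String[] args) {
        String text = "line 1\n" +
                "line 2\r" +
                "line 3\r\n" +
                "line 4\n\r" +
                "line 5";
        for (String line : splitLines(text)) {
            System.out.println(line);
        }
    }
}
```

Normalizing once can also be handy when the text is later written back out with a single, consistent newline sequence.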
Some split operations are executed many times in the code. In such cases, it makes sense not to compile the pattern inside the String split() method on every call: for regular expressions like the ones used in this article, split() calls Pattern.compile() each time (check the split() method body in the JDK sources). The improvement is to use the Pattern class and create the pattern object only once.
Example:
package com.example;
import java.util.regex.Pattern;
public class Program {
    // compiled once and reused for every split
    private static final Pattern PATTERN = Pattern.compile("\\r\\n|\\n\\r|\\n|\\r");
    public static void main(String[] args) {
        String text = "HTTP/1.1 200 OK\r\n" +
                "Content-Length: 25\r\n" +
                "Content-Type: text/html\r\n" +
                "\r\n" +
                "Hello world!\n" +
                "Some text...";
        String[] lines = PATTERN.split(text);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
Output:
HTTP/1.1 200 OK
Content-Length: 25
Content-Type: text/html

Hello world!
Some text...
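Like String.split(), Pattern.split(text) keeps empty strings in the middle (the blank line above) but removes trailing empty strings, so newlines at the very end of the text do not produce empty lines. A minimal sketch using the same pattern shows that passing a negative limit to Pattern.split(CharSequence, int) keeps them. The class and method names are only illustrative:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class TrailingEmptyLines {

    private static final Pattern PATTERN = Pattern.compile("\\r\\n|\\n\\r|\\n|\\r");

    // limit == 0 (the default): trailing empty strings are dropped.
    public static String[] splitDefault(String text) {
        return PATTERN.split(text);
    }

    // A negative limit keeps trailing empty strings.
    public static String[] splitKeepTrailing(String text) {
        return PATTERN.split(text, -1);
    }

    public static void main(String[] args) {
        String text = "a\r\nb\r\n\r\n";
        System.out.println(Arrays.toString(splitDefault(text)));      // [a, b]
        System.out.println(Arrays.toString(splitKeepTrailing(text))); // [a, b, , ]
    }
}
```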