Java - get HTML text with Jsoup library
In this short article, we would like to show how to use the Jsoup library to get plain text from HTML. The result is similar to the innerText or textContent property of a DOM element in JavaScript.
Quick solution:
// import org.jsoup.Jsoup;
// import org.jsoup.nodes.Document;
String html = "<html><body>Text here...</body></html>";
Document document = Jsoup.parse(html);
String text = document.text();
System.out.print(text); // Text here...
Practical example
Note: this example requires adding the Jsoup Maven dependency to the pom.xml file (see the dependency snippet at the end of this article).
package example;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Program {

    public static void main(String[] args) {

        String html = "" +
                "<html>" +
                "  <body>" +
                "    <p>First line...</p>" +
                "    <p>Second line...</p>" +
                "  </body>" +
                "</html>";

        Document document = Jsoup.parse(html);
        String text = document.text();

        System.out.print(text); // First line... Second line...
    }
}
Output:
First line... Second line...
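As the output shows, document.text() joins the text of both paragraphs into a single line separated by a space. When each paragraph is needed separately, one possible approach is to select the p elements and read the text of each one. The sketch below is only an illustration: the class name ParagraphText and the helper paragraphTexts are hypothetical names introduced here, and it assumes the same Jsoup dependency as the example above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ParagraphText {

    // Hypothetical helper (not part of Jsoup): returns the plain text
    // of every <p> element found in the given HTML, one list item per paragraph.
    static List<String> paragraphTexts(String html) {
        Document document = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // select("p") uses a CSS selector to find all paragraph elements
        for (Element paragraph : document.select("p")) {
            lines.add(paragraph.text());
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<html><body>" +
                "<p>First line...</p>" +
                "<p>Second line...</p>" +
                "</body></html>";

        // Prints each paragraph on its own line
        for (String line : paragraphTexts(html)) {
            System.out.println(line);
        }
    }
}
```

The same idea should work with other CSS selectors (for example "div" or "li"), depending on which elements hold the text that should stay separated.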
Dependency required in pom.xml:
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
Note: check https://mvnrepository.com/artifact/org.jsoup/jsoup for the latest version and use it in the version element.