
Java - get html text with Jsoup library

Created by: JustMike

In this short article, we would like to show how to use the Jsoup library to get plain text from HTML. The result is similar to what the innerText or textContent property returns in JavaScript.

Quick solution:

// import org.jsoup.Jsoup;
// import org.jsoup.nodes.Document;

String html = "<html><body>Text here...</body></html>";
Document document = Jsoup.parse(html);
String text = document.text();

System.out.print(text); // Text here...
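
One difference from textContent is worth a quick sketch: text() normalizes whitespace, so runs of spaces and line breaks inside the markup collapse to single spaces. If the original whitespace matters, wholeText() (available on elements since Jsoup 1.12.1) may be closer to what textContent gives. The class and helper names below are hypothetical and only serve the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class WhitespaceExample {

    // Hypothetical helper: text with whitespace normalized by Jsoup.
    static String normalizedText(String html) {
        Document document = Jsoup.parse(html);
        return document.text();
    }

    // Hypothetical helper: text with the original whitespace kept.
    static String rawText(String html) {
        Document document = Jsoup.parse(html);
        return document.wholeText();
    }

    public static void main(String[] args) {
        String html = "<p>Hello   \n   world</p>";

        System.out.println(normalizedText(html)); // Hello world
        System.out.println(rawText(html));        // spaces and the line break are preserved
    }
}
```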

Practical example

Note: this example requires the Jsoup Maven dependency in the pom.xml file; see the dependency snippet below the output.

package example;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Program {

    public static void main(String[] args) {

        String html = "" +
                "<html>" +
                "  <body>" +
                "    <p>First line...</p>" +
                "    <p>Second line...</p>" +
                "  </body>" +
                "</html>";

        Document document = Jsoup.parse(html);
        String text = document.text();

        System.out.print(text); // First line... Second line...
    }
}

Output:

First line... Second line...
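
As the output shows, document.text() joins both paragraphs into a single line. When each paragraph is needed separately, one possible approach is to select the p elements and read their text one by one. The following is a minimal sketch under that assumption; the class name and the paragraphTexts helper are hypothetical, while select() and eachText() are regular Jsoup methods.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.List;

public class ParagraphTextExample {

    // Hypothetical helper: returns the text of each <p> element as a separate item.
    static List<String> paragraphTexts(String html) {
        Document document = Jsoup.parse(html);
        return document.select("p").eachText();
    }

    public static void main(String[] args) {
        String html = "" +
                "<html>" +
                "  <body>" +
                "    <p>First line...</p>" +
                "    <p>Second line...</p>" +
                "  </body>" +
                "</html>";

        // Prints each paragraph on its own line.
        for (String line : paragraphTexts(html)) {
            System.out.println(line);
        }
    }
}
```

Note that eachText() skips elements that have no text, so empty paragraphs do not appear in the result.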

Dependency required in pom.xml:

<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

Note: go to https://mvnrepository.com/artifact/org.jsoup/jsoup to check the latest version and update the version value above if needed.
