Node.js - get text content from HTML with htmlparser2 library
In this short article, we would like to show how to get the text content of an HTML string using the htmlparser2 library under Node.js.
Quick solution (example index.js file):
const htmlparser2 = require('htmlparser2');
const getText = html => {
    const handler = new htmlparser2.DomHandler();
    const parser = new htmlparser2.Parser(handler);
    parser.write(html);
    parser.end();
    return htmlparser2.DomUtils.textContent(handler.root.childNodes); // or from handler.dom
};
// Example usage:
const html = '<div><p>This is example text 1</p><br /><p>This is example text 2</p></div>';
const text = getText(html);
console.log(text);
Running with:
node ./index.js
Output:
This is example text 1
This is example text 2
htmlparser2 installation
Run the following command in the Node.js project directory:
npm install --save htmlparser2