Node.js - supported text encodings

Created by: Laylah-Walsh

In this short article, we would like to show which text encodings are supported by Node.js, for example when converting between strings and `Buffer` objects.
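
To check whether a given encoding name is accepted by the Node.js version you are running, you can use `Buffer.isEncoding()`. The sketch below checks every name covered in this article. It also includes `utf-32`, added here only as an example of a name that `Buffer` does not support, and it assumes a recent Node.js release.

```javascript
// Every encoding name listed in this article, plus 'utf-32',
// which is added here only as an example of an unsupported name.
const names = [
  'utf8', 'utf-8', 'utf16le', 'utf-16le', 'latin1', // text encodings
  'base64', 'base64url', 'hex',                     // binary-to-text encodings
  'ascii', 'binary', 'ucs2', 'ucs-2',               // legacy names and aliases
  'utf-32',                                         // not supported by Buffer
];

for (const name of names) {
  // Buffer.isEncoding() returns true if Buffer methods accept this name.
  console.log(name.padEnd(10), Buffer.isEncoding(name));
}
```

On releases that predate `base64url` (before v15.7.0, or before v14.18.0 on the v14 line), that entry reports `false` as well.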

Typical encodings

These encodings convert text (JavaScript strings) to bytes and back.

Name	Description
`utf8`, `utf-8`	Multi-byte encoded Unicode characters. Many web pages and other document formats use UTF-8. This is the default encoding in Node.js and the recommended choice for text.
`utf16le`, `utf-16le`	Multi-byte encoded Unicode characters. Unlike `utf8`, each character in the string will be encoded using either 2 or 4 bytes.
`latin1`	Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from `U+0000` to `U+00FF`. Each character is encoded using a single byte. Characters that do not fit into that range are truncated and will be mapped to characters in that range.

Source: Buffers and character encodings - Node.js docs

Example:

const buffer1 = Buffer.from('Text...', 'utf-8');     // [84, 101, 120, 116, 46, 46, 46]
const buffer2 = Buffer.from('Text...', 'utf-16le');  // [84, 0, 101, 0, 120, 0, 116, 0, 46, 0, 46, 0, 46, 0]
const buffer3 = Buffer.from('ÄäÖö...', 'latin1');    // [196, 228, 214, 246, 46, 46, 46]

const text1 = buffer1.toString('utf-8');     // Text...
const text2 = buffer2.toString('utf-16le');  // Text...
const text3 = buffer3.toString('latin1');    // ÄäÖö...
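
The byte counts described in the table can be observed with `Buffer.byteLength()`. The following sketch uses a few sample characters chosen only for illustration: ASCII `A`, `é` (U+00E9), `€` (U+20AC, outside the Latin-1 range) and an emoji (U+1F600, which needs a surrogate pair in JavaScript). It also shows the `latin1` truncation: only the low byte of `€` survives.

```javascript
// Sample characters, written as escapes so the source file encoding does not matter.
const samples = ['A', '\u00e9', '\u20ac', '\u{1F600}']; // A, é, €, 😀

for (const s of samples) {
  // Number of bytes the string needs in each encoding.
  const utf8 = Buffer.byteLength(s, 'utf8');
  const utf16 = Buffer.byteLength(s, 'utf16le');
  console.log(s, `utf8: ${utf8} bytes, utf16le: ${utf16} bytes`);
}
// Expected (utf8 / utf16le): A 1 / 2, é 2 / 2, € 3 / 2, 😀 4 / 4

// latin1 keeps only the low byte of each UTF-16 code unit, so U+20AC becomes 0xAC.
const truncated = Buffer.from('\u20ac', 'latin1');
console.log(truncated);                    // <Buffer ac>
console.log(truncated.toString('latin1')); // ¬ (U+00AC), not €
```

This is why `latin1` is only safe for text known to stay within `U+0000` to `U+00FF`.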

Binary-to-text encodings

These encodings represent arbitrary binary data as plain text. Use them to store or transfer bytes where only text is allowed (for example, in JSON or URLs). They are not encryption and do not make the data secure.

Name	Description
`base64`	Base64 encoding. When creating a `Buffer` from a string, this encoding will also correctly accept "URL and Filename Safe Alphabet" as specified in RFC 4648, Section 5. Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored.
`base64url`	base64url encoding as specified in RFC 4648, Section 5. When creating a `Buffer` from a string, this encoding will also correctly accept regular base64-encoded strings. When encoding a `Buffer` to a string, this encoding will omit padding. Added in Node.js v15.7.0 and v14.18.0.
`hex`	Encode each byte as two hexadecimal characters. Data truncation may occur when decoding strings that do not exclusively consist of an even number of hexadecimal characters.

Source: Buffers and character encodings - Node.js docs

Example:

const buffer = Buffer.from('Text...', 'utf-8');  // [84, 101, 120, 116, 46, 46, 46]

const text1 = buffer.toString('base64');     // VGV4dC4uLg==
const text2 = buffer.toString('base64url');  // VGV4dC4uLg
const text3 = buffer.toString('hex');        // 546578742e2e2e
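
The decoding rules from the table can be checked with a few short calls. The sketch below needs a Node.js version with `base64url` support. The three bytes `0xfb 0xff 0xfe` are chosen only because their encoded form uses exactly the characters that differ between the standard and URL-safe alphabets. The `hex` inputs are the ones the Node.js documentation uses to illustrate truncation.

```javascript
// Three bytes whose base64 form uses the characters that differ between alphabets.
const original = Buffer.from([0xfb, 0xff, 0xfe]);
console.log(original.toString('base64'));    // +//+
console.log(original.toString('base64url')); // -__-

// Decoding is lenient: each encoding also accepts the other alphabet.
const fromUrlSafe = Buffer.from('-__-', 'base64');
const fromStandard = Buffer.from('+//+', 'base64url');
console.log(fromUrlSafe.equals(original), fromStandard.equals(original)); // true true

// Whitespace inside base64 input is ignored.
const withSpaces = Buffer.from('VGV4\ndC4u Lg==', 'base64');
console.log(withSpaces.toString('utf8')); // Text...

// hex decoding stops at the first non-hex character or at a trailing odd digit.
console.log(Buffer.from('1ag123', 'hex')); // <Buffer 1a>
console.log(Buffer.from('1a7', 'hex'));    // <Buffer 1a>
console.log(Buffer.from('1634', 'hex'));   // <Buffer 16 34>
```

Because `hex` decoding truncates silently instead of throwing, it can be worth validating untrusted input (for example, checking for an even length) before decoding it.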

Legacy encodings

Note: avoid these encodings unless you need them for compatibility with legacy code or data.

Name	Description
`ascii`	For 7-bit ASCII data only. When encoding a string into a `Buffer`, this is equivalent to using `latin1`. When decoding a `Buffer` into a string, using this encoding will additionally unset the highest bit of each byte before decoding as `latin1`. Generally, there should be no reason to use this encoding, as `utf8` (or, if the data is known to always be ASCII-only, `latin1`) will be a better choice when encoding or decoding ASCII-only text. It is only provided for legacy compatibility.
`binary`	Alias for `latin1`. See "Binary strings" in the Node.js documentation for more background on this topic. The name of this encoding can be very misleading, as all of the encodings listed here convert between strings and binary data. For converting between strings and `Buffer`s, typically `utf8` is the right choice.
`ucs2`, `ucs-2`	Aliases of `utf16le`. UCS-2 used to refer to a variant of UTF-16 that did not support characters that had code points larger than `U+FFFF`. In Node.js, these code points are always supported.
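
The following sketch uses the sample word "café", chosen only for illustration. It shows that `binary` and `ucs2` are plain aliases that produce exactly the same bytes as `latin1` and `utf16le`. It also shows how `ascii` decoding differs from `latin1`: the byte `0xe9` (`é`) comes back as `i` (`0x69`) because the highest bit is cleared.

```javascript
const text = 'caf\u00e9'; // "café", with the é written as an escape

// Aliases produce exactly the same bytes as the encodings they stand for.
console.log(Buffer.from(text, 'binary').equals(Buffer.from(text, 'latin1'))); // true
console.log(Buffer.from(text, 'ucs2').equals(Buffer.from(text, 'utf16le')));  // true

// 'ascii' encodes exactly like 'latin1' ...
const bytes = Buffer.from(text, 'ascii');
console.log(bytes);                    // <Buffer 63 61 66 e9>
console.log(bytes.toString('latin1')); // café

// ... but decoding clears the highest bit of each byte: 0xe9 & 0x7f = 0x69 ('i').
console.log(bytes.toString('ascii'));  // cafi
```

Silently turning `é` into `i` is exactly the kind of data corruption that makes `utf8` the safer default.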