Node.js - supported text encoding
This short article lists the text encodings supported by Node.js and shows how to use them with Buffer.
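To check whether the running Node.js version recognizes a given encoding name, `Buffer.isEncoding()` can be used. This is a minimal sketch; the list of names is an illustrative assumption, and `utf-32` is included only as an example of a name that Node.js does not support:

```javascript
// Names to probe; 'utf-32' is here only as an example of an unsupported name.
const candidates = ['utf8', 'utf16le', 'latin1', 'base64', 'hex', 'ascii', 'utf-32'];

// Map each name to true/false depending on whether Buffer accepts it.
const support = {};
for (const name of candidates) {
    support[name] = Buffer.isEncoding(name);
}

console.log(support);
```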
Typical encodings
Use these encodings to convert text to bytes and back.
Name | Description |
utf8, utf-8 | Multi-byte encoded Unicode characters. Many web pages and other document formats use UTF-8. It is recommended to use this encoding. |
utf16le, utf-16le | Multi-byte encoded Unicode characters. Unlike utf8, each character in the string will be encoded using either 2 or 4 bytes. |
latin1 | Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF. Each character is encoded using a single byte. Characters that do not fit into that range are truncated and will be mapped to characters in that range. |
Source: Buffers and character encodings - node.js Docs
Example:
const buffer1 = Buffer.from('Text...', 'utf-8'); // [84, 101, 120, 116, 46, 46, 46]
const buffer2 = Buffer.from('Text...', 'utf-16le'); // [84, 0, 101, 0, 120, 0, 116, 0, 46, 0, 46, 0, 46, 0]
const buffer3 = Buffer.from('ÄäÖö...', 'latin1'); // [196, 228, 214, 246, 46, 46, 46]
const text1 = buffer1.toString('utf-8'); // Text...
const text2 = buffer2.toString('utf-16le'); // Text...
const text3 = buffer3.toString('latin1'); // ÄäÖö...
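The size differences described in the table can be observed with `Buffer.byteLength()`. The sketch below uses sample characters chosen purely for illustration. It also shows how `latin1` keeps only the low byte of a character outside its range:

```javascript
// Sample characters chosen for illustration only.
const euro = '\u20AC';      // '€', outside the latin1 range
const emoji = '\u{1F600}';  // '😀', above U+FFFF

const sizes = {
    euroUtf8: Buffer.byteLength(euro, 'utf8'),        // 3 bytes
    euroUtf16: Buffer.byteLength(euro, 'utf16le'),    // 2 bytes
    emojiUtf8: Buffer.byteLength(emoji, 'utf8'),      // 4 bytes
    emojiUtf16: Buffer.byteLength(emoji, 'utf16le'),  // 4 bytes (surrogate pair)
};

// latin1 cannot represent U+20AC, so only the low byte (0xAC) is stored.
const lossy = Buffer.from(euro, 'latin1');   // [172]
const roundTrip = lossy.toString('latin1');  // '¬' (U+00AC), not '€'
```

Because the `latin1` round trip silently changes the character, `utf8` is the safer default for arbitrary text.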
Binary-To-Text encodings
Use these encodings to store or transfer binary data safely through channels that only accept text. They do not encrypt the data.
Name | Description |
base64 | Base64 encoding. When creating a Buffer from a string, this encoding will also correctly accept "URL and Filename Safe Alphabet" as specified in RFC 4648, Section 5. Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored. |
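base64url | base64url encoding as specified in RFC 4648, Section 5. When creating a Buffer from a string, this encoding will also correctly accept regular base64-encoded strings. When encoding a Buffer to a string, this encoding omits padding. Introduced in Node.js v14.18.0 and v15.7.0. |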
hex | Encode each byte as two hexadecimal characters. Data truncation may occur when decoding strings that do not exclusively consist of an even number of hexadecimal characters. For instance, decoding '1a7' yields only the single byte 0x1a. |
Source: Buffers and character encodings - node.js Docs
Example:
const buffer = Buffer.from('Text...', 'utf-8'); // [84, 101, 120, 116, 46, 46, 46]
const text1 = buffer.toString('base64'); // VGV4dC4uLg==
const text2 = buffer.toString('base64url'); // VGV4dC4uLg
const text3 = buffer.toString('hex'); // 546578742e2e2e
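Decoding goes the other way with `Buffer.from(text, encoding)`. The following sketch uses inputs made up for illustration. It shows the lenient base64 parsing and the hex truncation described in the table:

```javascript
// base64: padding is optional and whitespace is ignored when decoding.
const a = Buffer.from('VGV4dC4uLg==', 'base64').toString('utf8');   // Text...
const b = Buffer.from('VGV4 dC4u\nLg', 'base64').toString('utf8');  // Text...

// Bytes 0xFB 0xFF produce the characters that differ between the two alphabets.
const bytes = Buffer.from([0xfb, 0xff]);
const std = bytes.toString('base64');              // +/8=
const fromUrlSafe = Buffer.from('-_8', 'base64');  // URL-safe input is accepted: [251, 255]

// hex: decoding stops at the first invalid or unpaired character.
const h1 = Buffer.from('546578742e2e2e', 'hex').toString('utf8');  // Text...
const h2 = Buffer.from('1ag123', 'hex');  // [26], stops at 'g'
const h3 = Buffer.from('1a7', 'hex');     // [26], trailing '7' has no pair
```

Since truncated hex input does not raise an error, it may be worth validating the string length and characters before decoding untrusted data.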
Legacy encodings
Note: avoid these encodings unless you need them for legacy compatibility!
Name | Description |
ascii | For 7-bit ASCII data only. When encoding a string into a Buffer, this is equivalent to using latin1. When decoding a Buffer into a string, using this encoding will additionally unset the highest bit of each byte before decoding as latin1. Generally, there should be no reason to use this encoding, as utf8 (or, if the data is known to always be ASCII-only, latin1) will be a better choice when encoding or decoding ASCII-only text. It is only provided for legacy compatibility. |
binary | Alias for latin1. See binary strings for more background on this topic. The name of this encoding can be very misleading, as all of the encodings listed here convert between strings and binary data. For converting between strings and Buffers, typically utf8 is the right choice. |
ucs2, ucs-2 | Aliases of utf16le. UCS-2 used to refer to a variant of UTF-16 that did not support characters that had code points larger than U+FFFF. In Node.js, these code points are always supported. |
Source: Buffers and character encodings - node.js Docs
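Example:

The behaviour of the legacy names can be checked directly. This is a small sketch with sample bytes chosen for illustration:

```javascript
// 0xE9 is 'é' in latin1; its highest bit is set.
const byte = Buffer.from([0xe9]);

const asLatin1 = byte.toString('latin1');  // 'é'
const asAscii = byte.toString('ascii');    // 'i' (0xE9 & 0x7F === 0x69)
const asBinary = byte.toString('binary');  // 'é', same as latin1

// Encoding with 'ascii' behaves like 'latin1', so the high bit is kept here.
const encoded = Buffer.from('\u00e9', 'ascii');  // [233]

// 'ucs2' is an alias of 'utf16le', including characters above U+FFFF.
const viaUcs2 = Buffer.from('\u{1F600}', 'ucs2');
const viaUtf16 = Buffer.from('\u{1F600}', 'utf16le');
const sameBytes = viaUcs2.equals(viaUtf16);  // true
```

The asymmetry of `ascii` (lossless when encoding, lossy when decoding) is one reason to prefer `utf8` or `latin1` in new code.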