Without further ado, the following was found:

Issue: ISO → ISO/IEC

"ASCII (American Standard Code For Information Interchange) is the original 7-"
"bit character set, originally designed for American English. Also known as "
"US-ASCII. It is currently described by the ISO 646:1991 IRV (International "
"Reference Version) standard."

"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice. This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1). Several ISO 2022-based character encodings have been defined, "
"especially for Japanese."

"A 94-character set is designated as GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets. For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."

"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2, and G3 can be invoked only for codes with "
"the high order bit set. In particular, B<\\(haN> and B<\\(haO> are not used "
"anymore, ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, ESC + "
"xx are equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."

"Unicode (ISO 10646) is a standard which aims to unambiguously represent "
"every character in every human language. Unicode's structure permits 20.1 "
"bits to encode every character. Since most computers don't include 20.1-bit "
"integers, Unicode is usually encoded as 32-bit integers internally and "
"either a series of 16-bit integers (UTF-16) (needing two 16-bit integers "
"only when encoding certain rare characters) or a series of 8-bit bytes "
"(UTF-8)."

"A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy is "
"assembled into 00000xxx xxyyyyyy. A byte 1110xxxx is the start of a 3-byte "
"code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled into xxxxyyyy yyzzzzzz. "
"(When UTF-8 is used to code the 31-bit ISO 10646 then this progression "
"continues up to 6-byte codes.)"

"For most texts in ISO 8859 character sets, this means that the characters "
"outside of ASCII are now coded with two bytes. This tends to expand "
"ordinary text files by only one or two percent. For Russian or Greek texts, "
"this expands ordinary text files by 100%, since text in those languages is "
"mostly outside of ASCII. For Japanese users this means that the 16-bit "
"codes now in common use will take three bytes. While there are algorithmic "
"conversions from some character sets (especially ISO 8859-1) to Unicode, "
"general conversion requires carrying around conversion tables, which can be "
"quite large for 16-bit codes."
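
For anyone checking the quoted strings, the 2-byte and 3-byte assembly rules in the UTF-8 paragraph can be tried out with a small Python sketch. The function name assemble_utf8 is made up for this illustration and is not part of the man page or of any library. It covers only the 1-, 2- and 3-byte forms described in the text and does not reject overlong encodings or surrogates, so it should not be taken as a complete decoder.

```python
def assemble_utf8(seq: bytes) -> int:
    """Assemble one 1-, 2- or 3-byte UTF-8 sequence into a code point.

    Hypothetical helper for illustration only; longer forms are not handled.
    """
    first = seq[0]
    if first < 0x80:                    # 0xxxxxxx: plain ASCII
        needed, value = 1, first
    elif (first & 0xE0) == 0xC0:        # 110xxxxx: start of a 2-byte code
        needed, value = 2, first & 0x1F
    elif (first & 0xF0) == 0xE0:        # 1110xxxx: start of a 3-byte code
        needed, value = 3, first & 0x0F
    else:
        raise ValueError("start byte not covered by this sketch")
    if len(seq) != needed:
        raise ValueError("wrong sequence length")
    for cont in seq[1:]:
        if (cont & 0xC0) != 0x80:       # continuation bytes look like 10yyyyyy
            raise ValueError("bad continuation byte")
        value = (value << 6) | (cont & 0x3F)
    return value


if __name__ == "__main__":
    # ASCII letter, Latin-1 letter, Cyrillic letter, CJK ideograph.
    for ch in ("A", "\u00e9", "\u044f", "\u65e5"):
        encoded = ch.encode("utf-8")
        # Cross-check the hand-rolled assembly against Python's own codec.
        assert assemble_utf8(encoded) == ord(ch)
        print(hex(ord(ch)), len(encoded), encoded.hex())
```

The byte counts printed by the loop match the last quoted paragraph: the ISO 8859-1 and Cyrillic letters take two bytes each, and the Japanese character takes three.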