Re: [PATCH] charsets.7: update to reflect past developments

"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> · Thu, 05 Jun 2014 06:54:20 +0200

On 06/04/2014 07:51 AM, Marko Myllynen wrote:
> Hi,
> 
> While working with locale-related pages, I found charsets(7) and
> charmap(5) to be pretty out of date, and a page for repertoiremap(5)
> was missing altogether. Although charsets(7) could still be improved,
> it no longer looks so outdated.

Hi Marko,

Something is broken in this patch. It doesn't apply to HEAD.
Also, all of your instances of "\-" should be just "-". Could
you please fix these?

Cheers.

Michael

> 
> From 9710bee5517d5869ed019875e6bbf7a6a488a000 Mon Sep 17 00:00:00 2001
> From: Marko Myllynen <myllynen@xxxxxxxxxx>
> Date: Mon, 2 Jun 2014 16:36:06 +0300
> Subject: [PATCH] charsets.7: update to reflect past developments
> 
> Rewrite the introduction to make Unicode's prominence more obvious.
> Reformulate parts of the text to reflect the current Unicode world.
> Minor clarifications for the ASCII/ISO sections, some minor syntax fixes.
> ---
>  man7/charsets.7 |  335 +++++++++++++++++++++++++------------------------------
>  1 files changed, 152 insertions(+), 183 deletions(-)
> 
> diff --git a/man7/charsets.7 b/man7/charsets.7
> index de04d06..73b412c 100644
> --- a/man7/charsets.7
> +++ b/man7/charsets.7
> @@ -11,162 +11,142 @@
>  .\" This is combined from many sources, including notes by aeb and
>  .\" research by esr.  Portions derive from a writeup by Roman Czyborra.
>  .\"
> -.\" Last changed by David Starner <dstarner98@xxxxxxxxxxxxx>.
> +.\" Changes also by David Starner <dstarner98@xxxxxxxxxxxxx>.
>  .\"
> -.\" FIXME This page was written long ago, and various pieces are probably
> -.\"    no longer quite current. A reworking by someone knowledgeable
> -.\"    on charsets is needed. Among other things, the page needs to
> -.\"    give more prominence to Unicode. mtk, May 2014
> -.\"
> -.TH CHARSETS 7 2014-05-28 "Linux" "Linux Programmer's Manual"
> +.TH CHARSETS 7 2014-06-02 "Linux" "Linux Programmer's Manual"
>  .SH NAME
> -charsets \- programmer's view of character sets and internationalization
> +charsets \- character set standards and internationalization
>  .SH DESCRIPTION
> -Linux is an international operating system.
> -Various of its utilities
> -and device drivers (including the console driver) support multilingual
> -character sets including Latin-alphabet letters with diacritical
> -marks, accents, ligatures, and entire non-Latin alphabets including
> -Greek, Cyrillic, Arabic, and Hebrew.
> +This manual page gives an overview of different character set standards
> +and how they were used on Linux before Unicode became ubiquitous.
> +Some of this information is still helpful for people working with legacy
> +systems and documents.
> +.LP
> +Standards discussed include
> +ASCII, GB 2312, ISO 8859, JIS, KOI8\-R, KS, and Unicode.
>  .LP
> -This manual page presents a programmer's-eye view of different
> -character-set standards and how they fit together on Linux.
> -Standards
> -discussed include ASCII, ISO 8859, KOI8-R, Unicode, ISO 2022 and
> -ISO 4873.
> -The primary emphasis is on character sets actually used as
> -locale character sets, not the myriad others that can be found in data
> +The primary emphasis is on character sets that were actually used as
> +locale character sets, not the myriad others that could be found in data
>  from other systems.
>  .SS ASCII
>  ASCII (American Standard Code For Information Interchange) is the original
> -7-bit character set, originally designed for American English.
> -It is currently described by the ECMA-6 standard.
> +7\-bit character set, originally designed for American English.
> +It is also known as US\-ASCII.
> +It is currently described by the ISO 646:1991 IRV
> +(International Reference Version) standard.
>  .LP
>  Various ASCII variants replacing the dollar sign with other currency
> -symbols and replacing punctuation with non-English alphabetic characters
> -to cover German, French, Spanish, and others in 7 bits exist.
> -All are
> -deprecated; glibc doesn't support locales whose character sets aren't
> -true supersets of ASCII.
> -(These sets are also known as ISO-646, a close
> -relative of ASCII that permitted replacing these characters.)
> +symbols and replacing punctuation with non\-English alphabetic
> +characters to cover German, French, Spanish, and others in 7 bits
> +emerged.
> +All are deprecated;
> +glibc does not support locales whose character sets are not true
> +supersets of ASCII.
>  .LP
> -As Linux was written for hardware designed in the US, it natively
> -supports ASCII.
> +Because UTF\-8 is ASCII\-compatible, plain ASCII text still renders
> +properly on modern systems that use UTF\-8.
>  .SS ISO 8859
> -ISO 8859 is a series of 15 8-bit character sets all of which have US
> -ASCII in their low (7-bit) half, invisible control characters in
> -positions 128 to 159, and 96 fixed-width graphics in positions 160-255.
> +ISO 8859 is a series of 15 8\-bit character sets all of which have ASCII
> +in their low (7\-bit) half, invisible control characters in positions
> +128 to 159, and 96 fixed-width graphics in positions 160\-255.
>  .LP
> -Of these, the most important is ISO 8859-1 (Latin-1).
> -It is natively
> -supported in the Linux console driver, fairly well supported in X11R6,
> -and is the base character set of HTML.
> +Of these, the most important is ISO 8859\-1
> +("Latin Alphabet No. 1" / Latin\-1).
> +It was widely adopted and supported by different systems,
> +and is gradually being replaced with Unicode.
> +The ISO 8859-1 characters are also the first 256 characters of Unicode.
>  .LP
>  Console support for the other 8859 character sets is available under
> -Linux through user-mode utilities (such as
> +Linux through user\-mode utilities (such as
>  .BR setfont (8))
> -.\" // some distributions still have the deprecated consolechars
>  that modify keyboard bindings and the EGA graphics
>  table and employ the "user mapping" font table in the console
>  driver.
>  .LP
>  Here are brief descriptions of each set:
>  .TP
> -8859-1 (Latin-1)
> -Latin-1 covers most Western European languages such as Albanian, Catalan,
> -Danish, Dutch, English, Faroese, Finnish, French, German, Galician,
> -Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish, and
> -Swedish.
> -The lack of the ligatures Dutch ij, French oe and old-style
> -,,German`` quotation marks is considered tolerable.
> +8859\-1 (Latin\-1)
> +Latin\-1 covers many Western European languages such as Albanian, Basque,
> +Danish, English, Faroese, Galician, German, Icelandic, Irish, Italian,
> +Norwegian, Portuguese, Spanish, and Swedish.
> +The lack of the ligatures Dutch Ĳ/ĳ, French œ, and old-style „German“
> +quotation marks was considered tolerable.
>  .TP
> -8859-2 (Latin-2)
> -Latin-2 supports most Latin-written Slavic and Central European
> -languages: Croatian, Czech, German, Hungarian, Polish, Romanian,
> +8859\-2 (Latin\-2)
> +Latin\-2 supports many Latin\-written Central and East European
> +languages such as Bosnian, Croatian, Czech, German, Hungarian, Polish,
>  Slovak, and Slovene.
> +Replacing Romanian ș/ț with ş/ţ was considered tolerable.
>  .TP
> -8859-3 (Latin-3)
> -Latin-3 is popular with authors of Esperanto, Galician, and Maltese.
> -(Turkish is now written with 8859-9 instead.)
> +8859\-3 (Latin\-3)
> +Latin\-3 was designed to cover Esperanto, Maltese, and Turkish, but
> +8859\-9 later superseded it for Turkish.
>  .TP
> -8859-4 (Latin-4)
> -Latin-4 introduced letters for Estonian, Latvian, and Lithuanian.
> -It is essentially obsolete; see 8859-10 (Latin-6) and 8859-13 (Latin-7).
> +8859\-4 (Latin\-4)
> +Latin\-4 introduced letters for North European languages such as
> +Estonian, Latvian, and Lithuanian, but was superseded by 8859\-10 and
> +8859\-13.
>  .TP
> -8859-5
> +8859\-5
>  Cyrillic letters supporting Bulgarian, Byelorussian, Macedonian,
> -Russian, Serbian, and Ukrainian.
> -Ukrainians read the letter "ghe"
> -with downstroke as "heh" and would need a ghe with upstroke to write a
> -correct ghe.
> -See the discussion of KOI8-R below.
> +Russian, Serbian, and (almost completely) Ukrainian.
> +It was never widely used; see the discussion of KOI8\-R/KOI8\-U below.
>  .TP
> -8859-6
> -Supports Arabic.
> -The 8859-6 glyph table is a fixed font of separate
> +8859\-6
> +Was created for Arabic.
> +The 8859\-6 glyph table is a fixed font of separate
>  letter forms, but a proper display engine should combine these
>  using the proper initial, medial, and final forms.
>  .TP
> -8859-7
> -Supports Modern Greek.
> +8859\-7
> +Was created for modern Greek in 1987 and updated in 2003.
>  .TP
> -8859-8
> +8859\-8
>  Supports modern Hebrew without niqud (punctuation signs).
> -Niqud and full-fledged Biblical Hebrew are outside the scope of this
> -character set; under Linux, UTF-8 is the preferred encoding for
> -these.
> +Niqud and full\-fledged Biblical Hebrew were outside the scope of this
> +character set.
>  .TP
> -8859-9 (Latin-5)
> -This is a variant of Latin-1 that replaces Icelandic letters with
> +8859\-9 (Latin\-5)
> +This is a variant of Latin\-1 that replaces Icelandic letters with
>  Turkish ones.
>  .TP
> -8859-10 (Latin-6)
> -Latin 6 adds the last Inuit (Greenlandic) and Sami (Lappish) letters
> -that were missing in Latin 4 to cover the entire Nordic area.
> -RFC 1345 listed a preliminary and different "latin6".
> -Skolt Sami still
> -needs a few more accents than these.
> +8859\-10 (Latin\-6)
> +Latin\-6 added Inuit (Greenlandic) and Sami (Lappish) letters that were
> +missing in Latin\-4 to cover the entire Nordic area.
>  .TP
> -8859-11
> -This exists only as a rejected draft standard.
> -The draft standard
> -was identical to TIS-620, which is used under Linux for Thai.
> +8859\-11
> +Supports the Thai alphabet and is nearly identical to the TIS\-620
> +standard.
>  .TP
> -8859-12
> +8859\-12
>  This set does not exist.
> -While Vietnamese has been suggested for this
> -space, it does not fit within the 96 (noncombining) characters ISO
> -8859 offers.
> -UTF-8 is the preferred character set for Vietnamese use
> -under Linux.
>  .TP
> -8859-13 (Latin-7)
> +8859\-13 (Latin\-7)
>  Supports the Baltic Rim languages; in particular, it includes Latvian
> -characters not found in Latin-4.
> +characters not found in Latin\-4.
>  .TP
> -8859-14 (Latin-8)
> -This is the Celtic character set, covering Gaelic and Welsh.
> -This charset also contains the dotted characters needed for Old Irish.
> +8859\-14 (Latin\-8)
> +This is the Celtic character set, covering Old Irish, Manx, Gaelic,
> +Welsh, Cornish, and Breton.
>  .TP
> -8859-15 (Latin-9)
> -This adds the Euro sign and French and Finnish letters that were missing in
> -Latin-1.
> +8859\-15 (Latin\-9)
> +Latin\-9 is similar to the widely used Latin\-1 but replaces some less
> +common symbols with the Euro sign and French and Finnish letters that
> +were missing in Latin\-1.
>  .TP
> -8859-16 (Latin-10)
> -This set covers many of the languages covered by 8859-2, and supports
> -Romanian more completely than that set does.
> -.SS KOI8-R
> -KOI8-R is a non-ISO character set popular in Russia.
> -The lower half
> -is US ASCII; the upper is a Cyrillic character set somewhat better
> -designed than ISO 8859-5.
> -KOI8-U is a common character set, based off
> -KOI8-R, that has better support for Ukrainian.
> -Neither of these sets
> -are ISO-2022 compatible, unlike the ISO-8859 series.
> +8859\-16 (Latin\-10)
> +This set covers many Southeast European languages, and most
> +importantly supports Romanian more completely than Latin\-2.
> +.SS KOI8\-R / KOI8\-U
> +KOI8\-R is a non\-ISO character set popular in Russia before Unicode.
> +The lower half is ASCII;
> +the upper is a Cyrillic character set somewhat better designed than
> +ISO 8859\-5.
> +KOI8\-U, based on KOI8\-R, has better support for Ukrainian.
> +Neither of these sets is ISO\-2022 compatible,
> +unlike the ISO\-8859 series.
>  .LP
> -Console support for KOI8-R is available under Linux through user-mode
> +Console support for KOI8\-R is available under Linux through user\-mode
>  utilities that modify keyboard bindings and the EGA graphics table,
>  and employ the "user mapping" font table in the console driver.
>  .\" Thanks to Tomohiro KUBOTA for the following sections about
> @@ -175,69 +155,63 @@ and employ the "user mapping" font table in the console driver.
>  JIS X 0208 is a Japanese national standard character set.
>  Though there are some more Japanese national standard character sets (like
>  JIS X 0201, JIS X 0212, and JIS X 0213), this is the most important one.
> -Characters are mapped into a 94x94 two-byte matrix,
> -whose each byte is in the range 0x21-0x7e.
> +Characters are mapped into a 94x94 two\-byte matrix,
> +in which each byte is in the range 0x21\-0x7e.
>  Note that JIS X 0208 is a character set, not an encoding.
>  This means that JIS X 0208
>  itself is not used for expressing text data.
>  JIS X 0208 is used
> -as a component to construct encodings such as EUC-JP, Shift_JIS,
> -and ISO-2022-JP.
> -EUC-JP is the most important encoding for Linux
> -and includes US ASCII and JIS X 0208.
> -In EUC-JP, JIS X 0208
> +as a component to construct encodings such as EUC\-JP, Shift_JIS,
> +and ISO\-2022\-JP.
> +EUC\-JP is the most important encoding for Linux
> +and includes ASCII and JIS X 0208.
> +In EUC\-JP, JIS X 0208
>  characters are expressed in two bytes, each of which is the
>  JIS X 0208 code plus 0x80.
>  .SS KS X 1001
>  KS X 1001 is a Korean national standard character set.
>  Just as
> -JIS X 0208, characters are mapped into a 94x94 two-byte matrix.
> +JIS X 0208, characters are mapped into a 94x94 two\-byte matrix.
>  KS X 1001 is used like JIS X 0208, as a component
> -to construct encodings such as EUC-KR, Johab, and ISO-2022-KR.
> -EUC-KR is the most important encoding for Linux and includes
> -US ASCII and KS X 1001.
> +to construct encodings such as EUC\-KR, Johab, and ISO\-2022\-KR.
> +EUC\-KR is the most important encoding for Linux and includes
> +ASCII and KS X 1001.
>  KS C 5601 is an older name for KS X 1001.
>  .SS GB 2312
>  GB 2312 is a mainland Chinese national standard character set used
>  to express simplified Chinese.
>  Just like JIS X 0208, characters are
> -mapped into a 94x94 two-byte matrix used to construct EUC-CN.
> -EUC-CN
> -is the most important encoding for Linux and includes US ASCII and
> +mapped into a 94x94 two\-byte matrix used to construct EUC\-CN.
> +EUC\-CN
> +is the most important encoding for Linux and includes ASCII and
>  GB 2312.
> -Note that EUC-CN is often called as GB, GB 2312, or CN-GB.
> +Note that EUC\-CN is often called GB, GB 2312, or CN\-GB.
>  .SS Big5
> -Big5 is a popular character set in Taiwan to express traditional
> +Big5 was a popular character set in Taiwan to express traditional
>  Chinese.
>  (Big5 is both a character set and an encoding.)
> -It is a superset of US ASCII.
> -Non-ASCII characters are expressed in two bytes.
> -Bytes 0xa1-0xfe are used as leading bytes for two-byte characters.
> -Big5 and its extension is widely used in Taiwan and Hong Kong.
> -It is not ISO 2022-compliant.
> -.SS TIS 620
> -TIS 620 is a Thai national standard character set and a superset
> -of US ASCII.
> -Like ISO 8859 series, Thai characters are mapped into
> -0xa1-0xfe.
> -TIS 620 is the only commonly used character set under
> -Linux besides UTF-8 to have combining characters.
> -.SS UNICODE
> -Unicode (ISO 10646) is a standard which aims to unambiguously represent every
> -character in every human language.
> +It is a superset of ASCII.
> +Non\-ASCII characters are expressed in two bytes.
> +Bytes 0xa1\-0xfe are used as leading bytes for two\-byte characters.
> +Big5 and its extension were widely used in Taiwan and Hong Kong.
> +It is not ISO 2022 compliant.
> +.SS TIS\-620
> +TIS\-620 is a Thai national standard character set and a superset
> +of ASCII.
> +As in the ISO 8859 series, Thai characters are mapped into
> +0xa1\-0xfe.
> +.SS Unicode
> +Unicode (ISO 10646) is a standard which aims to unambiguously represent
> +every character in every human language.
>  Unicode's structure permits 20.1 bits to encode every character.
> -Since most computers don't include 20.1-bit
> -integers, Unicode is usually encoded as 32-bit integers internally and
> -either a series of 16-bit integers (UTF-16) (needing two 16-bit integers
> -only when encoding certain rare characters) or a series of 8-bit bytes
> -(UTF-8).
> -Information on Unicode is available at
> -.UR http://www.unicode.org
> -.UE .
> +Since most computers don't include 20.1\-bit integers, Unicode is
> +usually encoded as 32\-bit integers internally and either a series of
> +16\-bit integers (UTF\-16, needing two 16\-bit integers only when
> +encoding certain rare characters) or a series of 8-bit bytes (UTF\-8).
>  .LP
> -Linux represents Unicode using the 8-bit Unicode Transformation Format
> -(UTF-8).
> -UTF-8 is a variable length encoding of Unicode.
> +Linux represents Unicode using the 8\-bit Unicode Transformation Format
> +(UTF\-8).
> +UTF\-8 is a variable length encoding of Unicode.
>  It uses 1
>  byte to code 7 bits, 2 bytes for 11 bits, 3 bytes for 16 bits, 4 bytes
>  for 21 bits, 5 bytes for 26 bits, 6 bytes for 31 bits.
> @@ -246,41 +220,41 @@ Let 0,1,x stand for a zero, one, or arbitrary bit.
>  A byte 0xxxxxxx
>  stands for the Unicode 00000000 0xxxxxxx which codes the same symbol
>  as the ASCII 0xxxxxxx.
> -Thus, ASCII goes unchanged into UTF-8, and
> +Thus, ASCII goes unchanged into UTF\-8, and
>  people using only ASCII do not notice any change: not in code, and not
>  in file size.
>  .LP
> -A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy
> +A byte 110xxxxx is the start of a 2\-byte code, and 110xxxxx 10yyyyyy
>  is assembled into 00000xxx xxyyyyyy.
>  A byte 1110xxxx is the start
> -of a 3-byte code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled
> +of a 3\-byte code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled
>  into xxxxyyyy yyzzzzzz.
> -(When UTF-8 is used to code the 31-bit ISO 10646
> -then this progression continues up to 6-byte codes.)
> +(When UTF\-8 is used to code the 31\-bit ISO 10646
> +then this progression continues up to 6\-byte codes.)
>  .LP
> -For most people who use ISO-8859 character sets, this means that the
> +For most texts in ISO\-8859 character sets, this means that the
>  characters outside of ASCII are now coded with two bytes.
>  This tends
>  to expand ordinary text files by only one or two percent.
>  For Russian
> -or Greek users, this expands ordinary text files by 100%, since text in
> +or Greek texts, this expands ordinary text files by 100%, since text in
>  those languages is mostly outside of ASCII.
>  For Japanese users this means
> -that the 16-bit codes now in common use will take three bytes.
> -While there
> -are algorithmic conversions from some character sets (especially ISO-8859-1) to
> -Unicode, general conversion requires carrying around conversion tables,
> -which can be quite large for 16-bit codes.
> +that the 16\-bit codes now in common use will take three bytes.
> +While there are algorithmic conversions from some character sets
> +(especially ISO 8859\-1) to Unicode, general conversion requires
> +carrying around conversion tables, which can be quite large for 16\-bit
> +codes.
>  .LP
> -Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other
> +Note that UTF\-8 is self\-synchronizing: 10xxxxxx is a tail, any other
>  byte is the head of a code.
>  Note that the only way ASCII bytes occur
> -in a UTF-8 stream, is as themselves.
> +in a UTF\-8 stream, is as themselves.
>  In particular, there are no
>  embedded NULs (\(aq\\0\(aq) or \(aq/\(aqs that form part of some larger code.
>  .LP
>  Since ASCII, and, in particular, NUL and \(aq/\(aq, are unchanged, the
> -kernel does not notice that UTF-8 is being used.
> +kernel does not notice that UTF\-8 is being used.
>  It does not care at
>  all what the bytes it is handling stand for.
>  .LP
> @@ -288,32 +262,28 @@ Rendering of Unicode data streams is typically handled through
>  "subfont" tables which map a subset of Unicode to glyphs.
>  Internally
>  the kernel uses Unicode to describe the subfont loaded in video RAM.
> -This means that in UTF-8 mode one can use a character set with 512
> -different symbols.
> +This means that in UTF\-8 mode the Linux console can use a character
> +set with 512 different symbols.
>  This is not enough for Japanese, Chinese and
>  Korean, but it is enough for most other purposes.
>  .LP
> -At the current time, the console driver does not handle combining
> -characters.
> -So Thai, Sioux and any other script needing combining
> -characters can't be handled on the console.
>  .SS ISO 2022 and ISO 4873
> -The ISO 2022 and 4873 standards describe a font-control model
> +The ISO 2022 and 4873 standards describe a font\-control model
>  based on VT100 practice.
>  This model is (partially) supported
>  by the Linux kernel and by
>  .BR xterm (1).
> -It is popular in Japan and Korea.
> +It used to be popular in Japan and Korea.
>  .LP
>  There are 4 graphic character sets, called G0, G1, G2, and G3,
>  and one of them is the current character set for codes with
>  high bit zero (initially G0), and one of them is the current
>  character set for codes with high bit one (initially G1).
>  Each graphic character set has 94 or 96 characters, and is
> -essentially a 7-bit character set.
> +essentially a 7\-bit character set.
>  It uses codes either
> -040-0177 (041-0176) or 0240-0377 (0241-0376).
> -G0 always has size 94 and uses codes 041-0176.
> +040\-0177 (041\-0176) or 0240\-0377 (0241\-0376).
> +G0 always has size 94 and uses codes 041\-0176.
>  .LP
>  Switching between character sets is done using the shift functions
>  \fB^N\fP (SO or LS1), \fB^O\fP (SI or LS0), ESC n (LS2), ESC o (LS3),
> @@ -326,7 +296,7 @@ The function SS\fIn\fP makes character set G\fIn\fP (\fIn\fP=2 or 3)
>  the current one for the next character only (regardless of the value
>  of its high order bit).
>  .LP
> -A 94-character set is designated as G\fIn\fP character set
> +A 94\-character set is designated as G\fIn\fP character set
>  by an escape sequence ESC ( xx (for G0), ESC ) xx (for G1),
>  ESC * xx (for G2), ESC + xx (for G3), where xx is a symbol
>  or a pair of symbols found in the ISO 2375 International
> @@ -338,7 +308,7 @@ instead of currency sign), ESC ( M selects a character set
>  for African languages, ESC ( ! A selects the Cuban character
>  set, and so on.
>  .LP
> -A 96-character set is designated as G\fIn\fP character set
> +A 96\-character set is designated as G\fIn\fP character set
>  by an escape sequence ESC \- xx (for G1), ESC . xx (for G2)
>  or ESC / xx (for G3).
>  For example, ESC \- G selects the Hebrew alphabet as G1.
> @@ -357,9 +327,8 @@ In particular, \fB^N\fP and \fB^O\fP are not used anymore, ESC ( xx
>  can be used only with xx=B, and ESC ) xx, ESC * xx, ESC + xx
>  are equivalent to ESC \- xx, ESC . xx, ESC / xx, respectively.
>  .SH SEE ALSO
> +.BR iconv (1),
>  .BR console (4),
> -.BR console_codes (4),
> -.BR console_ioctl (4),
>  .BR ascii (7),
>  .BR iso_8859-1 (7),
>  .BR unicode (7),
> --
> 1.7.1
> 
> Thanks,
> 

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/