[PATCH] man7: improve example in utf-8(7) page

The first example in the utf-8(7) page (the encoding of character
"0xa9") is a poor choice.  The man page reports:

   The Unicode character 0xa9 = 1010 1001 (the copyright sign) is
   encoded in UTF-8 as

      11000010 10101001 = 0xc2 0xa9

This might lead the reader to believe that the UTF-8 encoding of any
ISO-8859-1 character is "0xc2 <char>".  That holds here only by
coincidence (it is true only for 0x80-0xbf): the top two bits of
"0xa9" are "10", which is also the marker that starts the second
UTF-8 byte, as described in the "Encoding" section for symbols in
the 0x80-0x7ff range.

Instead, use an example character whose top two bits are not "10".
Emphasize the bits of the encoded character in the examples to make
it clear which bits hold the character and which belong to the UTF-8
format.

Signed-off-by: Andreas Dilger <adilger@xxxxxxxxx>
---
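Not part of the patch: a minimal Python sketch of the bit layout
discussed above, in case it helps review.  The helper names
encode_utf8_bmp() and show() are hypothetical, and the function only
covers the two- and three-byte forms (0x80-0xffff) without rejecting
surrogates.  It suggests why 0xa9 appears to pass through unchanged
while 0xd7 does not.

```python
def encode_utf8_bmp(cp: int) -> bytes:
    """Hypothetical helper: encode a code point in 0x80-0xffff by hand."""
    if 0x80 <= cp <= 0x7FF:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),        # top five payload bits
                      0x80 | (cp & 0x3F)])     # low six payload bits
    if 0x800 <= cp <= 0xFFFF:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (surrogates not checked)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point outside the range handled by this sketch")


def show(cp: int) -> str:
    """Format the encoding as binary and hex, as in the man page example."""
    enc = encode_utf8_bmp(cp)
    bits = " ".join(f"{b:08b}" for b in enc)
    hexes = " ".join(f"0x{b:02x}" for b in enc)
    return f"0x{cp:02x} -> {bits} = {hexes}"


if __name__ == "__main__":
    for cp in (0xA9, 0xD7, 0x2260):
        # Cross-check against Python's own encoder.
        assert encode_utf8_bmp(cp) == chr(cp).encode("utf-8")
        print(show(cp))
```

For 0xa9 the second byte equals the input only because the low six
bits are prefixed with "10", which matches the top two bits of 0xa9.
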
 man7/utf-8.7 | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/man7/utf-8.7 b/man7/utf-8.7
index 015d4b746..6e96c2f27 100644
--- a/man7/utf-8.7
+++ b/man7/utf-8.7
@@ -123,18 +123,29 @@ The UCS code values 0xd800\[en]0xdfff (UTF-16 surrogates) as well as 0xfffe and
 According to RFC 3629 no point above U+10FFFF should be used,
 which limits characters to four bytes.
 .SS Example
-The Unicode character 0xa9 = 1010 1001 (the copyright sign) is encoded
-in UTF-8 as
+The Unicode character 0x0d7 =
+.I 00 1101 0111
+(the multiplication sign) is encoded in UTF-8 with two bytes (high bits
+.IR 110 )
+as:
 .PP
 .RS
-11000010 10101001 = 0xc2 0xa9
+.RI 110 00011
+.RI 10 010111
+= 0xc3 0x97
 .RE
 .PP
-and character 0x2260 = 0010 0010 0110 0000 (the "not equal" symbol) is
-encoded as:
+and character 0x2260 =
+.I 0010 0010 0110 0000
+(the "not equal" symbol) is encoded in UTF-8 with three bytes (high bits
+.IR 1110 )
+as:
 .PP
 .RS
-11100010 10001001 10100000 = 0xe2 0x89 0xa0
+.RI 1110 0010
+.RI 10 001001
+.RI 10 100000
+= 0xe2 0x89 0xa0
 .RE
 .SS Application notes
 Users have to select a UTF-8 locale, for example with
--
2.31.1


