[PATCH] utf-8: include RFC 3629 and clarify endianness which is left ambiguous

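The endianness is suggested by the order in which the bytes are displayed,
but the text does not state it explicitly. Also mention the RFC 3629 upper
limit of U+10FFFF, which restricts sequences to four bytes.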
---
 man7/utf-8.7 | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
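
For reviewers, the rules this hunk spells out (bits filled most significant
first, shortest sequence only, no surrogates, nothing above U+10FFFF) can be
sketched as a small encoder. This is only an illustration and not part of the
patch: the function name utf8_encode is hypothetical, and the sketch does not
reject the noncharacters 0xfffe and 0xffff, which the page says should not
appear in conforming streams.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (hypothetical helper, not from the patch)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF (RFC 3629 limit)")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("UTF-16 surrogates must not appear in UTF-8")

    # Pick the shortest sequence that can hold the value.
    if cp < 0x80:
        return bytes([cp])          # 0xxxxxxx
    if cp < 0x800:
        n_cont, lead = 1, 0xC0      # 110xxxxx
    elif cp < 0x10000:
        n_cont, lead = 2, 0xE0      # 1110xxxx
    else:
        n_cont, lead = 3, 0xF0      # 11110xxx

    # The most significant bits go into the lead byte; each continuation
    # byte (10xxxxxx) then takes the next six bits, highest first.
    out = [lead | (cp >> (6 * n_cont))]
    for i in range(n_cont - 1, -1, -1):
        out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
    return bytes(out)
```

With this sketch, utf8_encode(0xA9) yields the two bytes 0xc2 0xa9 for the
copyright sign used in the page's Example section, and U+10FFFF needs exactly
four bytes.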

diff --git a/man7/utf-8.7 b/man7/utf-8.7
index 597fad4..bbb016c 100644
--- a/man7/utf-8.7
+++ b/man7/utf-8.7
@@ -133,12 +133,14 @@ The sequence to be used depends on the UCS code number of the character:
 The
 .I xxx
 bit positions are filled with the bits of the character code number in
-binary representation.
+binary representation, most significant bit first (big-endian).
 Only the shortest possible multibyte sequence
 which can represent the code number of the character can be used.
 .PP
 The UCS code values 0xd800\(en0xdfff (UTF-16 surrogates) as well as 0xfffe and
-0xffff (UCS noncharacters) should not appear in conforming UTF-8 streams.
+0xffff (UCS noncharacters) should not appear in conforming UTF-8 streams. According
+to RFC 3629, no code point above U+10FFFF should be used, which limits each character
+to at most four bytes.
 .SS Example
 The Unicode character 0xa9 = 1010 1001 (the copyright sign) is encoded
 in UTF-8 as
-- 
2.2.1.209.g41e5f3a
