On 05/26/2015 08:53 AM, Shawn Landden wrote:
> The endianness is suggested by the order the bytes are displayed, but the
> text is ambiguous.

Thanks, Shawn. Applied.

Cheers,

Michael

> ---
>  man7/utf-8.7 | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/man7/utf-8.7 b/man7/utf-8.7
> index 597fad4..bbb016c 100644
> --- a/man7/utf-8.7
> +++ b/man7/utf-8.7
> @@ -133,12 +133,14 @@ The sequence to be used depends on the UCS code number of the character:
>  The
>  .I xxx
>  bit positions are filled with the bits of the character code number in
> -binary representation.
> +binary representation, most significant bit first (big-endian).
>  Only the shortest possible multibyte sequence
>  which can represent the code number of the character can be used.
>  .PP
>  The UCS code values 0xd800\(en0xdfff (UTF-16 surrogates) as well as 0xfffe and
> -0xffff (UCS noncharacters) should not appear in conforming UTF-8 streams.
> +0xffff (UCS noncharacters) should not appear in conforming UTF-8 streams. According
> +to RFC 3629 no point above U+10FFFF should be used, which limits characters to four
> +bytes.
>  .SS Example
>  The Unicode character 0xa9 = 1010 1001 (the copyright sign) is encoded
>  in UTF-8 as
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
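
For anyone reading this in the archive, the rule the patch spells out can be sketched in a few lines of Python. This is only an illustration and not part of the patch: utf8_encode is a hypothetical helper name, and the sketch assumes a single integer code point as input. It fills the payload bits most significant bit first, always picks the shortest sequence, and rejects surrogates and anything above U+10FFFF. It does not filter the noncharacters 0xfffe and 0xffff.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 by hand (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("UTF-16 surrogates must not appear in UTF-8")

    # Pick the shortest sequence that can hold the code point.
    if cp < 0x80:
        return bytes([cp])          # 0xxxxxxx
    if cp < 0x800:
        n, lead = 1, 0xC0           # 110xxxxx + 1 continuation byte
    elif cp < 0x10000:
        n, lead = 2, 0xE0           # 1110xxxx + 2 continuation bytes
    else:
        n, lead = 3, 0xF0           # 11110xxx + 3 continuation bytes

    # The lead byte carries the most significant payload bits.
    out = [lead | (cp >> (6 * n))]
    # Each continuation byte (10xxxxxx) carries the next six bits,
    # again from most significant to least significant.
    for i in range(n - 1, -1, -1):
        out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    # The copyright sign from the man page example.
    print(utf8_encode(0xA9).hex(" "))   # prints "c2 a9"
```

The upper bound of 0x10FFFF is what keeps the longest sequence at four bytes, as the added man page text says.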