Re: [PATCH] tty/vt: UTF-8 parsing update according to RFC 3629, modern Unicode

Jiri Slaby <jirislaby@xxxxxxxxxx> · Tue, 12 Dec 2023 10:20:18 +0100

On 12. 12. 23, 8:40, Roman Zilka wrote:
> vc_translate_unicode(), vc_sanitize_unicode():
> 1. Limit codepoint space to 0x10FFFF. The old algorithm followed an ancient
>     version of Unicode.
> 2. Corrected vc_translate_unicode() doc (@rescan).
> 3. "Noncharacters", such as U+FFFE, U+FFFF, are no longer invalid in Unicode:
>     accept them. Another option was to complete the set of noncharacters (used
>     to be those two, now there's more) and preserve the substitution. This is
>     indeed what Unicode suggests (v15.1, chap. 23.7) (not requires), but most
>     codepoints are !iswprint(), so substituting just the noncharacters seemed
>     futile. Also, I've never seen noncharacters treated in a special way.
> 4. Moved what remained of vc_sanitize_unicode() into vc_translate_unicode().
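
For illustration only, the range rules from points 1 and 3 could be sketched
roughly as below. This is not the code from the patch: utf8_check_codepoint()
and UTF8_REPLACEMENT are hypothetical names, and substituting U+FFFD for
rejected values is an assumption. The surrogate and overlong checks come from
RFC 3629 rather than from the description above.

```c
#include <stdint.h>

#define UTF8_REPLACEMENT 0xFFFDu	/* assumed substitute for rejected input */

/*
 * Hypothetical helper, not the kernel's vc_translate_unicode().
 * cp is an already decoded code point, nbytes is the length of the
 * UTF-8 sequence it was decoded from (1 to 4).
 */
static uint32_t utf8_check_codepoint(uint32_t cp, int nbytes)
{
	/* Smallest code point that legitimately needs nbytes bytes. */
	static const uint32_t min_for_len[] = { 0, 0x0, 0x80, 0x800, 0x10000 };

	if (nbytes < 1 || nbytes > 4)
		return UTF8_REPLACEMENT;
	if (cp < min_for_len[nbytes])		/* overlong encoding (RFC 3629) */
		return UTF8_REPLACEMENT;
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* UTF-16 surrogates (RFC 3629) */
		return UTF8_REPLACEMENT;
	if (cp > 0x10FFFF)			/* point 1: end of the code space */
		return UTF8_REPLACEMENT;

	/* Point 3: noncharacters such as U+FFFE and U+FFFF pass through. */
	return cp;
}
```

With these rules, U+FFFE decoded from a three-byte sequence is returned
unchanged, while 0x110000 is replaced.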

Whatever the patch contains (a _packed_ attachment, really?), you should
spell out the "why" part in here.

thanks,
--
js