On 12. 12. 23, 8:40, Roman Zilka wrote:
vc_translate_unicode(), vc_sanitize_unicode():

1. Limit codepoint space to 0x10FFFF. The old algorithm followed an ancient version of Unicode.
2. Corrected vc_translate_unicode() doc (@rescan).
3. "Noncharacters", such as U+FFFE, U+FFFF, are no longer invalid in Unicode - accept them. Another option was to complete the set of noncharacters (used to be those two, now there's more) and preserve the substitution. This is indeed what Unicode suggests (v15.1, chap. 23.7) (not requires), but most codepoints are !iswprint(), so substituting just the noncharacters seemed futile. Also, I've never seen noncharacters treated in a special way.
4. Moved what remained of vc_sanitize_unicode() into vc_translate_unicode().
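
For illustration only, the range check described in points 1 and 3 might look roughly like the sketch below. This is not the code from the patch: the helper name sketch_sanitize_unicode() and both macros are hypothetical, and continuing to substitute surrogates (U+D800..U+DFFF) is an assumption the description above does not state.

```c
#include <stdint.h>

#define SKETCH_UNI_MAX         0x10FFFFu /* last codepoint in the Unicode codespace */
#define SKETCH_UNI_REPLACEMENT 0xFFFDu   /* U+FFFD REPLACEMENT CHARACTER */

/*
 * Hypothetical helper, not taken from the kernel sources.
 * Returns the codepoint unchanged if it is acceptable, U+FFFD otherwise.
 */
static uint32_t sketch_sanitize_unicode(uint32_t c)
{
	/* Point 1: nothing above 0x10FFFF is a codepoint. */
	if (c > SKETCH_UNI_MAX)
		return SKETCH_UNI_REPLACEMENT;

	/* Assumption: lone surrogates are still substituted. */
	if (c >= 0xD800u && c <= 0xDFFFu)
		return SKETCH_UNI_REPLACEMENT;

	/* Point 3: noncharacters such as U+FFFE and U+FFFF pass through. */
	return c;
}
```

With a check of this shape, 0x110000 would be substituted while U+FFFE and U+FFFF would be passed on unchanged.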
Whatever the patch contains (a _packed_ attachment, really?), you should spell out the "why" part in here.
thanks,
--
js