From: Darrick J. Wong <djwong@xxxxxxxxxx>

I missed a few non-rendering code points in the "zero width"
classification code.  Add them now, and sort the list.

Finding them is an annoyingly manual process because there are various
code points that are not supposed to affect the rendering of a string of
text but are not explicitly named as such.  There are other code points
that, when surrounded by code points from the same chart, actually /do/
affect the rendering.

IOWs, the only way to figure this out is to grep the likely code points
and then go figure out how each of them render by reading the Unicode
spec or trying it.

$ wget https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
$ grep -E '(separator|zero width|invisible|joiner|application)' -i UnicodeData.txt

Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
---
 scrub/unicrash.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 96e20114c..edc32d55c 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -351,15 +351,19 @@ name_entry_examine(
 	while ((uchr = uiter_next32(&uiter)) != U_SENTINEL) {
 		/* zero width character sequences */
 		switch (uchr) {
+		case 0x034F:	/* combining grapheme joiner */
 		case 0x200B:	/* zero width space */
 		case 0x200C:	/* zero width non-joiner */
 		case 0x200D:	/* zero width joiner */
-		case 0xFEFF:	/* zero width non breaking space */
+		case 0x2028:	/* line separator */
+		case 0x2029:	/* paragraph separator */
 		case 0x2060:	/* word joiner */
 		case 0x2061:	/* function application */
 		case 0x2062:	/* invisible times (multiply) */
 		case 0x2063:	/* invisible separator (comma) */
 		case 0x2064:	/* invisible plus (addition) */
+		case 0x2D7F:	/* tifinagh consonant joiner */
+		case 0xFEFF:	/* zero width non breaking space */
 		*badflags |= UNICRASH_ZERO_WIDTH;
 		break;
 		}
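
For anyone who wants to check the resulting list outside of xfs_scrub, the
classification after this patch can be sketched as a standalone predicate.
This is only an illustration: is_zero_width() is a hypothetical name that
does not exist in scrub/unicrash.c, and the sketch omits the ICU iterator
and the badflags bookkeeping that the real code uses.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of scrub/unicrash.c): return true if
 * @uchr is one of the code points that the patched switch statement
 * flags as zero width.
 */
static inline bool
is_zero_width(uint32_t uchr)
{
	switch (uchr) {
	case 0x034F:	/* combining grapheme joiner */
	case 0x200B:	/* zero width space */
	case 0x200C:	/* zero width non-joiner */
	case 0x200D:	/* zero width joiner */
	case 0x2028:	/* line separator */
	case 0x2029:	/* paragraph separator */
	case 0x2060:	/* word joiner */
	case 0x2061:	/* function application */
	case 0x2062:	/* invisible times (multiply) */
	case 0x2063:	/* invisible separator (comma) */
	case 0x2064:	/* invisible plus (addition) */
	case 0x2D7F:	/* tifinagh consonant joiner */
	case 0xFEFF:	/* zero width non breaking space */
		return true;
	default:
		return false;
	}
}
```

A caller would feed it one decoded UTF-32 code point at a time, e.g.
is_zero_width(0x200B), and set its own flag on a true result.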