Torsten Bögershausen:
Some of the code points which have "0 length on the display" are called
"combining", others are called "vowels" or "accents".
E.g. 5BF is not marked any of them, but if you look at the glyph, it should
be combining (please correct me if that is wrong).
All combining characters have a non-zero combining class in
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (fourth field,
called Canonical_Combining_Class in
http://www.unicode.org/reports/tr44/ ). For instance, the aforementioned
U+05BF is defined as follows:
05BF;HEBREW POINT RAFE;Mn;23;NSM;;;;;N;;;;;
The combining class is 23, so this is a combining character.
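To make the field layout concrete, here is a minimal sketch of reading the
General_Category and Canonical_Combining_Class out of one UnicodeData.txt
record. The helper name parse_record and the dictionary keys are made up for
illustration; only the semicolon-separated layout and the field positions
come from the file format described above.

```python
# Hypothetical helper (not from any library): split one UnicodeData.txt
# record and pick out the fields discussed above.
def parse_record(line):
    fields = line.strip().split(";")
    return {
        "code_point": int(fields[0], 16),       # first field, hexadecimal
        "name": fields[1],                      # second field
        "general_category": fields[2],          # third field, e.g. "Mn"
        "combining_class": int(fields[3]),      # fourth field, decimal
    }

record = parse_record("05BF;HEBREW POINT RAFE;Mn;23;NSM;;;;;N;;;;;")
is_combining = record["combining_class"] != 0
print(record["general_category"], record["combining_class"], is_combining)
```

For the U+05BF record this reports category "Mn" and combining class 23, so
the non-zero test marks it as combining.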
There is a difference between non-spacing combining marks ("Mn" in the
third column (General_Category)) and others ("Mc" for spacing marks
and "Me" for enclosing marks), so they might need special handling.
Additionally, you have the "zero-width" characters, such as U+200B
Zero Width Space. These have the "Cf" General_Category, although that
category also contains visible characters IIRC.
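If you just want to inspect these properties without parsing the file
yourself, Python's standard unicodedata module exposes both of them. The
sketch below is only an illustration: the helper names are made up, and
treating every "Mn", "Me" and "Cf" character as a zero-width candidate is an
assumption, which is likely too broad for "Cf" given the caveat above.

```python
import unicodedata

def describe(ch):
    """Return (General_Category, Canonical_Combining_Class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

def is_zero_width_candidate(ch):
    # Assumption for illustration only: non-spacing marks, enclosing marks
    # and format characters are candidates for zero display width.
    # "Mc" (spacing marks) is left out on purpose, since those take up space.
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")

# U+05BF HEBREW POINT RAFE, U+20DD COMBINING ENCLOSING CIRCLE,
# U+200B ZERO WIDTH SPACE
for ch in ("\u05bf", "\u20dd", "\u200b"):
    print("U+%04X" % ord(ch), describe(ch), is_zero_width_candidate(ch))
```

The results depend on the Unicode version bundled with the Python build, so
treat the output as a quick check rather than a reference.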
--
\\// Peter - http://www.softwolves.pp.se/