On 16/04/2014 22:58, Torsten Bögershausen wrote:
> Excellent, thanks for the pointers.
> Running the script below shows that
> "0X00AD SOFT HYPHEN" should have zero length (and some others too).
> I wonder if that is really the case, and which one of the last 2 lines
> in the script is the right one.
> What does this mean for us:
> "Cf Format a format control character"

It may be worth digging back through the Git logs to check the original
logic, but the comments suggest that "Cf" characters have been treated as
zero-width. That makes sense - they're usually markers indicating things
like bidirectional text flow, so they won't take up any space. (Although
they may cause even more extreme layout effects...)
Soft-hyphen is noted as an explicit exception to the rule in the utf8.c
comments. As of Unicode 4.0, it's supposed to be a character indicating
a point where a hyphen could be placed if a line-wrap occurs, and if
that wrap happens, then it can actually take up 1 space, otherwise not.
So its width could be either 0 or 1, depending on where the line breaks.
Or, quite likely, the terminal doesn't treat it specially, and it always
just looks like a hyphen... Thus we err on the safe side and give it
width 1.
See http://en.wikipedia.org/wiki/Soft_hyphen for background.
The comments suggest adding "-00AD +1160-11FF" to the uniset command
line for that tweak and for composing Hangul. (The +200B tweak isn't
necessary any more - Zero-Width Space U+200B became Cf officially in
Unicode 4.0.1:
http://en.wikipedia.org/wiki/Zero-width_space
http://www.unicode.org/review/resolved-pri.html#pri21
)
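
To make the soft-hyphen exception and the Hangul tweak concrete, here is a
minimal sketch of a table-driven width lookup. It is not the code in utf8.c:
the name sketch_wcwidth and the tiny table are assumptions for illustration
only, and the table holds just a handful of ranges rather than the full
generated list. Double-width characters are ignored entirely here.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
struct interval {
	uint32_t first;
	uint32_t last;
};

/*
 * Illustrative subset of zero-width ranges, sorted and non-overlapping.
 * U+00AD SOFT HYPHEN is deliberately absent, so it falls through to
 * width 1 even though its general category is Cf.
 */
static const struct interval zero_width[] = {
	{ 0x0300, 0x036F },	/* combining diacritical marks */
	{ 0x1160, 0x11FF },	/* Hangul Jamo medial vowels and finals */
	{ 0x200B, 0x200F },	/* zero-width space, joiners, LRM/RLM */
	{ 0x202A, 0x202E },	/* bidirectional embedding controls */
	{ 0x2060, 0x2063 },	/* word joiner, invisible operators */
	{ 0xFEFF, 0xFEFF },	/* zero-width no-break space */
};

/* Binary search over a sorted interval table; returns 1 on a hit. */
static int in_table(uint32_t ch, const struct interval *table, size_t n)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ch > table[mid].last)
			lo = mid + 1;
		else if (ch < table[mid].first)
			hi = mid;
		else
			return 1;
	}
	return 0;
}

/* Hypothetical helper: 0 for NUL, -1 for controls, else 0 or 1 columns. */
int sketch_wcwidth(uint32_t ch)
{
	if (ch == 0)
		return 0;
	if (ch < 0x20 || (ch >= 0x7F && ch < 0xA0))
		return -1;
	if (in_table(ch, zero_width,
		     sizeof(zero_width) / sizeof(zero_width[0])))
		return 0;
	return 1;
}
```

With this table, U+200B and U+1160 report width 0 while U+00AD reports
width 1, which is the "err on the safe side" choice described above.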
All of this is only really an approximation - a best-effort attempt to
figure out the width of a string without any actual communication with
the display device. So it'll never be perfect. The choice between double
and single width in particular will often be unpredictable unless you
have deeper knowledge of the locale.
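
As a rough illustration of why the single/double choice depends on the
locale, the sketch below sums column widths over an array of code points
and takes a flag saying whether "ambiguous" characters should count as
wide. The names sketch_strwidth, is_wide and is_ambiguous are hypothetical,
both range lists are small subsets chosen for the example, and zero-width
and control characters are not handled at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of ranges that are normally rendered two columns wide. */
static int is_wide(uint32_t ch)
{
	return (ch >= 0x1100 && ch <= 0x115F) ||	/* Hangul Jamo initials */
	       (ch >= 0x2E80 && ch <= 0xA4CF && ch != 0x303F) || /* CJK */
	       (ch >= 0xAC00 && ch <= 0xD7A3) ||	/* Hangul syllables */
	       (ch >= 0xFF00 && ch <= 0xFF60);		/* fullwidth forms */
}

/* Subset of ranges whose width differs between CJK and other setups. */
static int is_ambiguous(uint32_t ch)
{
	return (ch >= 0x03B1 && ch <= 0x03C1) ||	/* Greek alpha..rho */
	       (ch >= 0x0410 && ch <= 0x044F) ||	/* basic Cyrillic */
	       (ch >= 0x2500 && ch <= 0x254B);		/* box drawing */
}

/*
 * Approximate display width of n code points. The caller has to decide
 * ambiguous_is_wide; nothing in the string itself can tell us.
 */
size_t sketch_strwidth(const uint32_t *s, size_t n, int ambiguous_is_wide)
{
	size_t width = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (is_wide(s[i]) ||
		    (ambiguous_is_wide && is_ambiguous(s[i])))
			width += 2;
		else
			width += 1;
	}
	return width;
}
```

For the three code points U+0041, U+03B1 and U+4E2D this gives 4 columns
with the flag off and 5 with it on, so the same string can legitimately
have two different widths.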
Actually, while doing this, I've realised that this was originally
Markus Kuhn's implementation, and that is acknowledged at the top of the
file:
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
Good, because he knows what he's doing.
Kevin