Re: [PATCH v2] fetch: align new ref summary printout in UTF-8 locales

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 06 Sep 2012 11:08:12 -0700

Torsten Bögershausen <tboegi@xxxxxx> writes:

> I try to re-phrase my question:
>
> Do installations still exist which use e.g. BIG5 or any other
> multi byte encoding which is not UTF-8?

Yes.

> Do we want to support other encodings than ASCII or UTF-8?
> (Because then the screen width needs to be calculate different, I think)

That depends on who "we" are and what timeframe you have in mind.

Do our developers care about these encodings so much that we would
reject "ASCCI/UTF-8 only" patch and wait until we enhance it to do
the right thing for other encodings that we do not use ourselves?
No.  That does not make any sense, especially when we know we will
not have a good test coverage on the additional parts that we will
not be using ourselves.

"This change only helps people with ASCII or UTF-8 and does not help
others" alone is never a valid reason to reject a change, but we
still try to be nicer to "others" that may come after we leave this
topic behind by doing a few things:

 - If the change will make things worse than it currently is for
   "others", we would try to minimize the regression for them.

 - If the change will make the code harder to update later to
   enhance with additional change to support "others", we would try
   to anticipate what kind of changes are needed on top, and
   structure the code in such a way that future changes can be made
   cleanly.

For the first point, for multi-byte encodings (e.g. ISO-2022), the
display columns and byte length do not match and in general byte
length is longer than the display columns in the current code.  With
the current code that measures the required columns across elements
by taking the maximum of byte length, they will see wrong number of
filler, so they are already getting a wrong alignment.  With the
"UTF-8 only" change, the required columns and the filler will be
calculated by assuming that the string is in UTF-8, which may make
the computation off in a different way, and if we underestimate the
display columns for a string, they may see the strings truncated,
which is bad.

So as long as gettext_width() punts and returns strlen() for a
malformed UTF-8, it would be OK [*1*].

For the second point, I think the API "here is a string, give me the
number of display columns it will occupy, as I am interested in
aligning them" is a good abstraction that can be later enhanced to
other encodings fairly easily, so I do not see a problem in the
patch that goes in that direction.

[Footnote]

*1* If the patch used utf_strwidth() (which I didn't bother to go
back and check, though), it should be OK.  The underlying
utf8_width() will reject a malformed UTF-8 sequence and the code
falls back to strlen().
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html