Re: Odd encoding issue with UTF-8 + gettext yields ? on non-ASCII

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 30 Aug 2010 14:13:21 +0000

On Mon, Aug 30, 2010 at 14:00, Marcin Cieslak <saper@xxxxxxxxxx> wrote:
> On Mon, 30 Aug 2010, Ævar Arnfjörð Bjarmason wrote:
>> On Sun, Aug 29, 2010 at 20:45, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>> A would be preferred for correctness, and with a fallback BSD printf()
>> we can avoid the GNU libc bug; however, that'll mean using LC_CTYPE,
>> which'll have some small side effects for the rest of the code.
>
> The real problem is that you are probably using the same functions
> (locale-enabled) for the user-facing side as well as for the backend (talking
> to the repository). Some projects decided to use
> some special encoding internally (like UCS-2 in the case of Java
> and Python 2.x, Unicode ordinals in Python 3.x). Otherwise
> you may end up with incompatibilities in the on-disk or on-network
> format. I don't think you want to keep telling all bug reporters for a few
> years: "Can you try that again with env LANG=C,
> please?" :)

Yeah, those programs can probably get away with it too because they
either implement their own string functions, or don't use setlocale()
at all for their localizations.
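For what it's worth, here's a minimal sketch (plain C, assuming a POSIX <locale.h> that defines LC_MESSAGES) of the narrower setup: take LC_MESSAGES from the environment but leave LC_CTYPE at "C", so ctype and multibyte behaviour elsewhere doesn't change. The function name enable_messages_only() is hypothetical. The catch is that GNU gettext converts translations to the LC_CTYPE codeset, which in the "C" locale is plausibly where the "?" in the subject comes from; bind_textdomain_codeset() is the usual way to ask for a specific output codeset instead.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: enable translations (LC_MESSAGES) from the
 * environment while leaving LC_CTYPE at the startup "C" locale.
 * Returns nonzero if LC_CTYPE is still "C" afterwards. */
static int enable_messages_only(void)
{
    /* May return NULL if LANG/LC_ALL/LC_MESSAGES names a locale that
     * isn't installed; LC_MESSAGES then simply stays "C". */
    setlocale(LC_MESSAGES, "");

    /* Query (don't change) the LC_CTYPE category. */
    const char *ctype = setlocale(LC_CTYPE, NULL);
    assert(ctype != NULL);
    return strcmp(ctype, "C") == 0;
}
```

Since every C program starts in the "C" locale and setlocale(LC_MESSAGES, "") only touches that one category, enable_messages_only() should return nonzero.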

> Bringing Unicode on board means that simple strlen() no longer
> does what you normally think it does.

I'm pretty sure strlen() always gives you the number of bytes before
the terminating null byte, regardless of locale settings. wcslen() is
the wide-character equivalent.
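To make the strlen() point concrete, a small sketch: strlen() counts bytes, wcslen() counts wchar_t units, and counting UTF-8 code points means skipping continuation bytes (10xxxxxx), none of which consults the locale. utf8_codepoints() is a hypothetical helper and assumes valid UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count UTF-8 code points by counting every byte
 * that is not a continuation byte (10xxxxxx). Assumes valid UTF-8;
 * never looks at the locale. */
static size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the UTF-8 bytes "\xc3\x86" "var" (i.e. "Ævar"), strlen() returns 5 and utf8_codepoints() returns 4, and wcslen(L"\u00c6var") is also 4, whatever LC_CTYPE is set to. (On platforms with a 16-bit wchar_t, characters outside the BMP take two units, so wcslen() isn't a code point count there either.)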