Re: Odd encoding issue with UTF-8 + gettext yields ? on non-ASCII

On Mon, 30 Aug 2010, Ævar Arnfjörð Bjarmason wrote:

On Sun, Aug 29, 2010 at 20:45, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
A would be preferred for correctness, and with a fallback BSD printf()
we can avoid the GNU libc bug, however that'll mean using LC_CTYPE,
which'll have some small side-effects for the rest of the code.

The real problem is that you are probably using the same
(locale-enabled) functions for the user-facing side as well as for
the backend (talking to the repository). Some projects decided to use
a special encoding internally (like UCS-2 in the case of Java
and Python 2.x, Unicode ordinals in Python 3.x). Otherwise
you may end up with incompatibilities in the on-disk or on-network
format. I don't think you want to keep telling all bug reporters for
a few years: "Can you try that again with env LANG=C, please?" :)

Bringing Unicode on board means that a simple strlen() no longer
does what you normally think it does: it counts bytes, not characters.
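
To illustrate the strlen() point, here is a minimal sketch in C. The
helper names utf8_codepoints and utf8_extra_bytes are made up for this
example, and the code assumes its input is valid UTF-8. It counts code
points, which is still not the same thing as on-screen width.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count code points in a UTF-8 string by skipping continuation bytes
 * (those of the form 10xxxxxx). Assumes the input is valid UTF-8;
 * hypothetical helper, for illustration only.
 */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Number of bytes strlen() reports beyond the code point count. */
size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_codepoints(s);
}
```

With "\xC3\x86var" (the UTF-8 bytes of "Ævar"), strlen() reports 5
while utf8_codepoints() reports 4, so any buffer or column arithmetic
based on strlen() is off by one for that name.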

On Mon, 30 Aug 2010, Jonathan Nieder wrote:

Ævar Arnfjörð Bjarmason wrote:

We can even keep the "Content-Type: text/plain; charset=UTF-8\n" and
*not* use LC_CTYPE if we add a bind_textdomain_codeset("git", "UTF-8")
call to gettext.

Oh!  I'd personally prefer to do that for now. :)  (Not because of the
known printf problem but because I like to reduce possible unknowns.)

Well, in this case everybody will be forced to have UTF-8 in their
on-screen output, which is not useful for people using ISO8859-*,
KOI8-R and similar things...
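
For reference, the bind_textdomain_codeset() approach discussed above
could look roughly like the sketch below. The helper name
init_gettext_utf8 and the locale directory passed to it are assumptions
for illustration, not git's actual code. Only LC_MESSAGES is set, so
LC_CTYPE stays in the "C" locale, and translations come back as UTF-8
whatever the terminal expects.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: set up the "git" text domain so that gettext()
 * always returns UTF-8, without touching LC_CTYPE.
 * Returns 0 on success, -1 if any of the gettext calls fails.
 */
int init_gettext_utf8(const char *localedir)
{
    /* Pick the message language from the environment only. */
    setlocale(LC_MESSAGES, "");

    if (bindtextdomain("git", localedir) == NULL)
        return -1;
    /* Force the output encoding instead of deriving it from LC_CTYPE. */
    if (bind_textdomain_codeset("git", "UTF-8") == NULL)
        return -1;
    if (textdomain("git") == NULL)
        return -1;
    return 0;
}
```

A caller would run something like init_gettext_utf8("/usr/share/locale")
once at startup, with the path being whatever the build installs its
.mo files under.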

--Marcin
