Odd encoding issue with UTF-8 + gettext yields ? on non-ASCII

I'm having an odd encoding issue with gettext on my
[gettextize-git-mainporcelain] branch. It hadn't turned up before
because none of the existing messages had non-ASCII translations.

With this in is.po (full version at [is.po]):

    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"

I do:

    $ msgfmt -o /opt/git/next-gettext/share/locale/is/LC_MESSAGES/git.mo is.po

Which, under an Icelandic locale, gives me:

    $ rm -rf /tmp/meh; LANGUAGE= LC_ALL= LANG=is_IS.UTF-8 git init /tmp/meh
    Bj? til t?ma Git lind ? /tmp/meh/.git/

Those "?" characters are actual ASCII question marks.

But if I don't specify an encoding in the header, msgfmt complains:

    $ msgfmt -o /opt/git/next-gettext/share/locale/is/LC_MESSAGES/git.mo is.po
    is.po: warning: Charset missing in header.
                    Message conversion to user's charset will not work.

Git will then emit the non-ASCII characters from its message
catalogue, probably because some component no longer tries to be smart
about encoding:

    $ rm -rf /tmp/meh; LANGUAGE= LC_ALL= LANG=is_IS.UTF-8 git init /tmp/meh
    Bjó til tóma Git lind í /tmp/meh/.git/

That would probably break under a non-UTF-8 locale, such as an
ISO-8859-1 one, though.
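
One way to read the two outputs, as a guess rather than a diagnosis: with
the charset header present, something converts the UTF-8 message to a
codeset that can't represent "ó" or "í"; without it, the raw bytes pass
through untouched. The Python sketch below only mimics that idea. The
choice of ASCII as the target codeset is my assumption, not something
I've confirmed in libintl.

```python
# The translated message as stored in the catalog: valid UTF-8.
MESSAGE = "Bjó til tóma Git lind í /tmp/meh/.git/"
utf8_bytes = MESSAGE.encode("utf-8")

# Assumption: a conversion to ASCII that substitutes "?" for every
# character the target codeset cannot represent.
as_ascii = utf8_bytes.decode("utf-8").encode("ascii", errors="replace")
print(as_ascii.decode("ascii"))

# No conversion at all: the same UTF-8 bytes, but read by a terminal
# that expects ISO-8859-1, come out as mojibake instead of "ó" and "í".
as_latin1 = utf8_bytes.decode("iso-8859-1")
print(as_latin1)
```

The first line printed matches the "?" output above character for
character, which is why a conversion to an ASCII-like codeset looks like
a plausible suspect.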

The `hexdump -C` output of the two `.mo` files is identical apart from
the charset header. That is, both contain valid UTF-8 sequences, so the
issue is somewhere between `libintl` reading the `*.mo` file and the
`gettext` function returning the message.
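
For anyone who would rather not compare hexdumps by eye, here is a rough
sketch that parses two GNU `.mo` files and lists the msgids whose
translations differ. The function names (`read_mo`, `changed_msgids`,
`header_charset`) are made up for this example, and it only reads the
basic string tables described in the gettext manual, ignoring the hash
table.

```python
import struct


def read_mo(data: bytes) -> dict:
    """Parse a GNU .mo catalog into {msgid: msgstr}, keeping raw bytes."""
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0x950412DE:
        order = "<"  # little-endian file
    elif magic == 0xDE120495:
        order = ">"  # big-endian file
    else:
        raise ValueError("not a GNU .mo file")
    # Revision, number of strings, offsets of the two string tables.
    _rev, count, orig_off, trans_off = struct.unpack(order + "4I", data[4:20])
    catalog = {}
    for i in range(count):
        olen, ooff = struct.unpack_from(order + "2I", data, orig_off + 8 * i)
        tlen, toff = struct.unpack_from(order + "2I", data, trans_off + 8 * i)
        catalog[data[ooff:ooff + olen]] = data[toff:toff + tlen]
    return catalog


def changed_msgids(a: dict, b: dict) -> list:
    """Return the msgids whose translation differs between two catalogs."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def header_charset(catalog: dict):
    """Return the charset named in the header entry (msgid ""), or None."""
    header = catalog.get(b"", b"").decode("ascii", errors="replace")
    for line in header.splitlines():
        if line.lower().startswith("content-type:") and "charset=" in line:
            return line.split("charset=", 1)[1].strip()
    return None
```

Feeding it the bytes of both files (for example via
`open(path, "rb").read()`) should report only the empty msgid, i.e. the
header entry, if the two catalogs really differ in nothing but the
charset line.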

As far as I can see, we're not doing anything odd in our [gettext.c]
that could explain this.

To reproduce it, do:

    git clone --reference ~/g/git git://github.com/avar/git.git next-gettext
    cd next-gettext
    git checkout -t origin/gettextize-git-mainporcelain
    make -j 4 prefix=/tmp/git all install
    rm -rf /tmp/meh; LANGUAGE= LANG=is_IS.utf8 /tmp/git/bin/git init /tmp/meh

Which'll give (as mentioned above):

    Bj? til t?ma Git lind ? /tmp/meh/.git/

But editing the Content-Type line out of is.po and rebuilding gives:

    Bjó til tóma Git lind í /tmp/meh/.git/

[gettextize-git-mainporcelain]: http://github.com/avar/git/tree/gettextize-git-mainporcelain
[is.po]: http://github.com/avar/git/blob/gettextize-git-mainporcelain/po/is.po
[gettext.c]: http://github.com/avar/git/blob/gettextize-git-mainporcelain/gettext.c