Re: [PATCH/RFC 0/5] Add internationalization support to Git

On Sun, May 30, 2010 at 01:46, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Hi Ævar,

Hi, and thanks for taking the time to review this.

> Ævar Arnfjörð Bjarmason wrote:
>
>>     I made three strings in git-pull.sh translatable as a proof of
>>     concept. One problem that I ran into is that xgettext(1) seems
>>     very particular when picking up translation strings. It accepts
>>     this:
>>
>>         gettext "hello world"; echo
>
> Does ‘gettext -s "hello world"’ work, too?  (Just curious.)

No, that just makes "-s" the translatable string. Other options that
gettext accepts don't work either, so you have to use eval_gettext
"\$foo" instead of gettext -e "\$foo". The xgettext program is quite
naïve like that.

>>     but not this:
> [...]
>>
>>         gettext <<"END";
>> hello world
>> END
>>
>>     Maybe there's a way to make it play nice. But I just used a large
>>     multiline string as a workaround.
>
> Not so nice, but it seems that gettext expects a message id as
> an argument (i.e., it will only replace echo and not cat).

Yes. I mailed the maintainer about this. For that to work, gettext
would need to accept the text on STDIN, and xgettext would need to
find the messages there.

In the meantime we could just use multiline strings. It works for the
test suite.

>>     I don't know what to do about
>>     'die gettext' other than define a 'die_gettext' wrapper function
>>     and use `xgettext --keyword=die_gettext'.
>
> Sounds sensible.
>
>> One thing I haven't done is to try to go ahead and make massive
>> changes to the Git source code to make everything translatable.
>
> I am vaguely worried about performance.  Suppose a function does
>
>        for (i = 0; i < 1000000; i++)
>                printf(_("Some interesting label: %s\n"), foo(i));
>
> Will this compile to the equivalent of
>
>        const char *s = _("Some interesting label: %s\n");
>        for (i = 0; i < 1000000; i++)
>                printf(s, foo(i));
>
> Suppose someone decides to make that change by hand (maybe the
> loop is too large for the compiler to notice the potential
> winnings).  Then presumably gcc will not be able to type-check the
> format any more.  Is there some way around this that avoids
> both speed regressions and loss of type-safety?

Any level of indirection is of course going to be slower; there's no
way around that. I made two test programs to try this out:

test-in-loop.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <locale.h>
    #include <libintl.h>

    #define _(s) gettext(s)

    int foo(long int x) {  /* returns int on purpose: the %ld below is a deliberate mismatch */
        return x * x;
    }

    int main(void) {
        const char *podir = "/usr/local/share/locale";
        if(!podir) puts("zomg error");
        char *ret = bindtextdomain("git", podir);
        ret = setlocale(LC_MESSAGES, "");
        ret = setlocale(LC_CTYPE, "");
        ret = textdomain("git");

        for (long int i = 0; i < 10000000; i++) {
            printf(_("Some interesting label: %ld\n"), foo(i));
        }

        return 0;
    }

test-outside-loop.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <locale.h>
    #include <libintl.h>

    #define _(s) gettext(s)

    int foo(long int x) {  /* same deliberate int vs. %ld mismatch as above */
        return x * x;
    }

    int main(void) {
        const char *podir = "/usr/local/share/locale";
        if(!podir) puts("zomg error");
        char *ret = bindtextdomain("git", podir);
        ret = setlocale(LC_MESSAGES, "");
        ret = setlocale(LC_CTYPE, "");
        ret = textdomain("git");

        const char *s = _("Some interesting label: %ld\n");
        for (long int i = 0; i < 10000000; i++)
            printf(s, foo(i));

        return 0;
    }

Note that I use 10 million iterations, not 1 million like in your
example.

Here's how they compile:

    $ gcc -std=c99 -o test-in-loop test-in-loop.c ; gcc -std=c99 -o test-outside-loop test-outside-loop.c
    test-in-loop.c: In function ‘main’:
    test-in-loop.c:21: warning: format ‘%ld’ expects type ‘long int’, but argument 2 has type ‘int’

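I.e. your concerns are valid. GCC checks the format in test-in-loop.c
because glibc declares gettext() with the format_arg attribute, but
once the translated string is stored in a variable, as in
test-outside-loop.c, it can't catch the invalid format specifier.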

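And even though gettext tries to make repeated lookups like these fast
(http://www.gnu.org/software/hello/manual/gettext/Optimized-gettext.html),
doing the lookup on every iteration is still a lot slower than doing
it once outside the loop, which costs about the same as hardcoded
English: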

    perl -MBenchmark=:all -MData::Dump=dump -E 'cmpthese(10, {
         outside => sub { system "./test-outside-loop >/dev/null" },
         inside =>  sub { system "./test-in-loop >/dev/null" },
    });'

            s/iter  inside outside
    inside    13.4      --    -83%
    outside   2.26    495%      --
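If a lookup like this ever did show up in a hot loop, one possible
mitigation (a hypothetical sketch, not something this series does) is
a small wrapper that caches the last result and is declared with GCC's
format_arg attribute. The attribute tells GCC that the returned string
is a translation of its argument, so the format is still checked at
the call site, while repeated calls cost a pointer comparison instead
of a catalog lookup. The names cached_gettext() and
sum_label_lengths() are made up for illustration, and the one-entry
cache assumes the locale and text domain don't change while it's in
use:

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper, not a gettext or Git API. format_arg(1) tells GCC
 * the result is a translation of argument 1, so a call such as
 * printf(cached_gettext("...%ld..."), x) is still format-checked. */
static char *cached_gettext(const char *msgid) __attribute__((format_arg(1)));

static char *cached_gettext(const char *msgid)
{
    /* One-entry cache keyed on the msgid pointer: a string literal has
     * static storage, so a given call site passes the same address on
     * every iteration. Assumes the locale and text domain stay fixed. */
    static const char *last_id;
    static char *last_str;

    if (msgid != last_id) {
        last_str = gettext(msgid);   /* real lookup only on a cache miss */
        last_id = msgid;
    }
    return last_str;
}

/* Hypothetical stand-in for the loop from the thread: formats each label
 * into a buffer and returns the total number of characters produced. */
static long sum_label_lengths(long iterations)
{
    char buf[64];
    long total = 0;

    for (long i = 0; i < iterations; i++)
        total += snprintf(buf, sizeof buf,
                          cached_gettext("Some interesting label: %ld\n"),
                          i * i);   /* long argument, matches %ld */
    return total;
}
```

With a single cached entry, two messages alternating in the same loop
would defeat the cache, so this only helps in the tight-loop case
discussed above.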

> Apologies if this was already answered in the earlier discussion.

What you can do (and this was covered) is use msgfmt to check that no
translation uses different format specifiers from the original
message. Beyond that, I hope cases where a message is printed in a
tight loop and the lookup itself is a significant part of the
program's run time will be rare enough not to be an issue.
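To illustrate what that check guards against, here is a simplified
sketch that compares the conversion specifiers of a msgid and its
msgstr. It is my own illustration, not how msgfmt implements
--check-format (which msgfmt -c also enables); the helper names are
hypothetical, and it ignores positional arguments like %1$s and "*"
widths, which a real checker has to handle:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the length modifiers and conversion character
 * of each directive in fmt into out, space-separated ("%-5ld x %s" gives
 * "ld s "). "%%" is skipped. Returns 0 if out is too small. */
static int conversions(const char *fmt, char *out, size_t len)
{
    size_t n = 0;

    if (len == 0)
        return 0;
    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')                  /* "%%" is a literal percent sign */
            continue;
        /* skip flags, field width and precision */
        while (*p && (strchr("-+ #.", *p) || isdigit((unsigned char)*p)))
            p++;
        /* copy length modifiers such as "l" or "ll" */
        while (*p && strchr("hlLqjzt", *p)) {
            if (n + 1 >= len)
                return 0;
            out[n++] = *p++;
        }
        if (!*p)                        /* format ended mid-directive */
            break;
        if (n + 2 >= len)
            return 0;
        out[n++] = *p;                  /* conversion character, e.g. 'd' */
        out[n++] = ' ';
    }
    out[n] = '\0';
    return 1;
}

/* Returns 1 if msgstr uses the same conversions, in the same order, as msgid. */
static int same_conversions(const char *msgid, const char *msgstr)
{
    char a[128], b[128];

    return conversions(msgid, a, sizeof a) &&
           conversions(msgstr, b, sizeof b) &&
           strcmp(a, b) == 0;
}
```

A translation that turned %ld into %d would be rejected here, which is
exactly the mismatch GCC can no longer see once the string comes from
a variable.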