Re: [Gimp-developer] Re: GIMP Tip of the Day messages

Christian Rose <menthos@xxxxxxxxxxx> · Sun, 07 Oct 2001 15:32:34 +0200

Daniel Egger wrote:
> IF we need to add another dependency it has to be worth it. Solving
> problems by using XML for everything seems only clever to me. It does
> not make sense to use XML for tip files while plug-ins still keep
> being broken (in the localisation context).
> 
> > Someone mentioned how well Dia seemed to be doing in that respect.
> > Well, Dia puts the text strings for a sheet in a different file per
> > sheet. Even with only 8 supported languages, this already looks
> > totally cluttered to me.
> 
> Really? Everything is where it belongs, nothing is used in the
> wrong context and, last but not least, it's extensible, and even
> easily so.

Dia uses intltool/xml-i18n-tools for sheet files.
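For readers unfamiliar with that approach: intltool keeps the translatable strings of an XML file in ordinary po files and merges the translations back into the XML at build time. The Python sketch below imitates only that merge step and is not intltool-merge itself. It assumes translatable elements are marked with a leading underscore (as in intltool's `.xml.in` files) and that each catalog is a plain dict from original to translated text; `merge_translations` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the xml:lang attribute with its full namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def merge_translations(source_xml, catalogs):
    """Expand every <_tag>text</_tag> into <tag>text</tag>, followed by one
    <tag xml:lang="..."> element per language that has a translation.

    catalogs: hypothetical {lang: {original: translation}} mapping,
    standing in for the po files intltool would read.
    """
    root = ET.fromstring(source_xml)
    # Collect the marked elements first, since the loop below edits the tree.
    marked = [(parent, child) for parent in root.iter()
              for child in parent if child.tag.startswith("_")]
    for parent, child in marked:
        child.tag = child.tag[1:]  # drop the "translatable" marker
        position = list(parent).index(child)
        for lang in sorted(catalogs):
            translated = catalogs[lang].get(child.text)
            if not translated:
                continue  # untranslated: only the original is kept
            position += 1
            copy = ET.Element(child.tag, {XML_LANG: lang})
            copy.text = translated
            parent.insert(position, copy)
    return ET.tostring(root, encoding="unicode")


merged = merge_translations(
    "<sheet><_description>Objects</_description></sheet>",
    {"de": {}, "sv": {"Objects": "Objekt"}},
)
print(merged)
```

The translator never touches the XML file; only the po file for their own language.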

> > The tips file is 9 kB now. With 15 supported languages (how many on
> > the way?), that would become 135 kB.
> 
> Contrary to po files, untranslated messages are simply nonexistent.
> And you forget one thing: All .po files together are by definition
> bigger since the original text is repeated within every single file.

And that is for a good reason (see below)...

> > You cannot expect translators to wade through 30 lines of other
> > languages to be able to add his/her own translation (30 lines per
> > string to be translated, that is), so that translators do need to work
> > on separate files.
> 
> Why not?

Because one of the fundamentals of easy translation is simply to have
the original text handy. This is so you can easily compare the original
and the translation, and ensure that the translation is entirely
correct. I have to visually compare the strings many times during the
translation of a single message, and at least twice: first to interpret
the message I'm about to translate, and finally to compare what I wrote
with the original so nothing got lost or added or any meaning changed in
the translation.
This means that the original string and the translation should be as
close as possible to each other, and this is why po format has the
messages this way: First the original, and immediately below the space
for the translation.
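To make that layout concrete: each po entry is the original (`msgid`) immediately followed by the translation (`msgstr`), and an empty `msgstr` marks a message as untranslated. The Python sketch below, using made-up sample strings, pairs the two up. It deliberately handles only single-line entries without escapes, so it illustrates the layout rather than being a full po parser; `parse_po` is a hypothetical name.

```python
import re

# Made-up sample: one translated and one untranslated entry.
PO_SAMPLE = '''\
msgid "Open Image"
msgstr "Öppna bild"

msgid "Save Image"
msgstr ""
'''


def parse_po(text):
    """Return (msgid, msgstr) pairs from simple single-line po entries."""
    pairs = []
    msgid = None
    for line in text.splitlines():
        match = re.match(r'(msgid|msgstr) "(.*)"$', line)
        if not match:
            continue  # blank lines, comments, etc.
        keyword, value = match.groups()
        if keyword == "msgid":
            msgid = value
        elif msgid is not None:
            pairs.append((msgid, value))
            msgid = None
    return pairs


for original, translation in parse_po(PO_SAMPLE):
    print(repr(original), "->", repr(translation))
```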
If you add a large number of translations to a single file and expect me
to edit it, I have to skip a large amount of irrelevant "garbage" (since
I'm usually not at all interested in the other translations) just to
compare the original and my translation. This makes the process of
visually verifying translations harder.

Another more dangerous thing is encodings. Multiple encodings in a
single file don't mix well. I've been bitten too many times by other
translators accidentally saving the whole file in their encoding and
thus ruining my translations and many others'. Actually this was one of
the most important reasons why we moved away from editing .desktop files
directly in GNOME: with hundreds of translators, the danger of someone
accidentally doing this became very real (it happened quite frequently)
and it became a pain to ensure that translations weren't broken because
of simple "accidents" like this. Also, it became a mess to "clean up"
since effectively all translators had to be contacted to verify that
their translations were still correct after such an accident.
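One way such an accident plays out can be simulated in a few lines. The Python sketch below assumes a hypothetical file holding a Swedish line in ISO-8859-1 and a Russian line in KOI8-R, opened by an editor that reads everything as UTF-8, replaces undecodable bytes, and saves. The replacement character carries no trace of the original bytes, so both translations are lost for good; `careless_save` is a hypothetical name.

```python
# A hypothetical file holding two translations in two legacy encodings.
SWEDISH = "Öppna bild"   # saved by a Swedish translator as ISO-8859-1
RUSSIAN = "Открыть"      # saved by a Russian translator as KOI8-R
raw = SWEDISH.encode("iso-8859-1") + b"\n" + RUSSIAN.encode("koi8_r")


def careless_save(data, encoding):
    """Simulate an editor that reads the whole file in one encoding,
    substitutes U+FFFD for bytes it cannot decode, and writes it back."""
    return data.decode(encoding, errors="replace").encode(encoding)


damaged = careless_save(raw, "utf-8")
swedish_line, russian_line = damaged.split(b"\n")
print(swedish_line.decode("iso-8859-1"))  # no longer "Öppna bild"
print(russian_line.decode("koi8_r"))      # no longer "Открыть"
```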

While enforcing the use of UTF-8 solves the encodings problem, it is not
feasible for many other reasons. One is simply the lack of support in
many editors and many other text processing tools (and terminals and so
on). Effectively enforcing a particular editor hasn't worked yet, and
probably never will, and it will probably take more time until all
editors natively support UTF-8. Also, many translators use "translation
memories", that is, large po format databases of existing translations,
created and managed by special translation memory software. I use such a memory
with all my existing translations (it's 6.4 MB of text) to automatically
generate a skeleton for all new po format translations, with messages
similar to existing translations already translated. Aside from the fact
that this won't work if you don't use po format, this points out the
encodings problem again: If you force me to use UTF-8, I have to
maintain several translation memories instead of a single one, one for
each encoding.
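As an illustration of what such a memory does, here is a Python stand-in (not the software mentioned above, which is not named) that pre-fills a po skeleton. Exact matches are copied; close matches are copied but flagged `#, fuzzy`, the standard gettext flag that `msgfmt` skips unless `--use-fuzzy` is given. Similarity is measured with `difflib`, an assumption in place of whatever matching a real memory uses; `prefill`, the 0.8 cutoff, and the sample strings are hypothetical, and quotes inside strings are not escaped. The lookup only works when the memory and the new messages share one encoding, which is why one memory per encoding would be needed.

```python
import difflib


def prefill(new_msgids, memory, cutoff=0.8):
    """Build a po skeleton from a translation memory.

    memory maps previously translated msgids to their msgstrs (all in one
    encoding). Exact matches are reused as-is; close matches are reused
    but flagged fuzzy so the translator knows to review them.
    """
    entries = []
    for msgid in new_msgids:
        if msgid in memory:
            entries.append(f'msgid "{msgid}"\nmsgstr "{memory[msgid]}"\n')
            continue
        close = difflib.get_close_matches(msgid, list(memory), n=1, cutoff=cutoff)
        if close:
            entries.append(f'#, fuzzy\nmsgid "{msgid}"\nmsgstr "{memory[close[0]]}"\n')
        else:
            entries.append(f'msgid "{msgid}"\nmsgstr ""\n')
    return "\n".join(entries)


memory = {"Open Image": "Öppna bild", "Save Image": "Spara bild"}
skeleton = prefill(["Open Image", "Open Images", "Quit"], memory)
print(skeleton)
```

The fuzzy suggestion for "Open Images" is wrong (it lacks the plural), which is exactly why it is flagged for review rather than trusted.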
So while storing all translations in UTF-8 solves its share of
problems, it creates new ones for translators. This is why intltool lets
translators use their encoding when translating, and converts it to
UTF-8 when needed.
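The conversion step can be sketched as follows, assuming the po header declares its encoding in the usual `Content-Type: text/plain; charset=...` line. This illustrates the idea only and is not intltool's code; with the GNU gettext tools, `msgconv --to-code=UTF-8` does the job properly. The sketch simply takes the first `charset=` in the file, a simplification; `po_to_utf8` and the sample are hypothetical.

```python
import re

CHARSET = re.compile(r"charset=([A-Za-z0-9_.:-]+)")


def po_to_utf8(raw):
    """Decode po bytes with the charset named in the header and return
    the same catalog encoded as UTF-8, with the header updated to match."""
    # The charset name itself is ASCII, so latin-1 (which never fails)
    # is a safe way to locate it before the real encoding is known.
    match = CHARSET.search(raw.decode("latin-1"))
    charset = match.group(1) if match else "utf-8"
    text = raw.decode(charset)
    text = CHARSET.sub("charset=UTF-8", text, count=1)
    return text.encode("utf-8")


# A translator's file, written in their own legacy encoding.
sample = ('msgid ""\nmsgstr ""\n'
          '"Content-Type: text/plain; charset=ISO-8859-1\\n"\n\n'
          'msgid "Open Image"\nmsgstr "Öppna bild"\n').encode("iso-8859-1")
converted = po_to_utf8(sample)
print(converted.decode("utf-8"))
```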

> And where do you get the 30 from? If you have 15 languages then
> you'll have at maximum 15 times the original text to skip.

And that is still a problem, as explained above. 15 lines of irrelevant
text in between every single message and its translation into my language
makes verifying translations an unnecessarily difficult burden.
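To put a number on it: in a file that keeps every language together, as .desktop files do with `Name[xx]=` keys, one translator's line can sit behind all the other languages. The Python sketch below, with a hypothetical list of 15 language codes and placeholder texts, counts the unrelated lines between the original and a given translation. In a po file the same distance is zero, because `msgstr` directly follows `msgid`.

```python
# 15 hypothetical target languages, already in alphabetical order.
LANGS = ["ca", "cs", "da", "de", "es", "fi", "fr", "hu", "it", "ja",
         "ko", "nl", "no", "pl", "sv"]


def combined_entry(original, translations):
    """One key with all languages in one block, .desktop style."""
    lines = [f"Name={original}"]
    lines += [f"Name[{lang}]={text}" for lang, text in sorted(translations.items())]
    return lines


def lines_between(lines, lang):
    """Count the unrelated lines between the original and `lang`."""
    target = next(i for i, line in enumerate(lines)
                  if line.startswith(f"Name[{lang}]="))
    return target - 1  # the original itself is at index 0


entry = combined_entry("Open Image", {lang: f"({lang} text)" for lang in LANGS})
print(lines_between(entry, "sv"))
```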

> Being a translator myself (and in fact also one of the translators of the Dia sheets)
> I can tell that this is not as evil as it might look.

Dia uses intltool now, so it seems they have recognized the problems the
translators had.

> > so I expect you have got a tool for the translators in mind?
> 
> If necessary I can hack something up but it should not be necessary.
> I really don't see the big difference to hacking a .po file.

It is necessary. The po format and gettext have many important features
that translators depend upon, something that, in my experience, almost
every translator knows.

If you do an alternative "hack", it had better support most of the
features that gettext has and translators need. More on this in another
letter.

> > gettext is also a standard.
> 
> Great. Show me the specs... I'm not talking about de-facto or so
> called "industry-standards". gettext is such a crap that I really
> doubt there was a standardisation process which led to a proper
> specification.

gettext has evolved. It has many of the features that translators need.
And, as you admit, it's industry standard. If you want to replace it,
you'd better write a better and fully compatible alternative (since a
lot of tools across many platforms are designed to work with this
industry standard), while keeping all existing features. I believe this
is where people use the phrase "show me the code".

Christian