Re: [Gimp-developer] Re: GIMP Tip of the Day messages

Christian Rose <menthos@xxxxxxxxxxx> · Sun, 07 Oct 2001 22:13:11 +0200

Daniel Egger wrote:
> > Whatever the solution regarding GIMP tips turns out to be, translators
> > want to be able to translate them from within po files. I hope everyone
> > has agreed on that :)
> 
> not really.

Okay, but that really makes you an exception among translators. This
discussion isn't new; it has been repeated for ages, and it comes up every
time a developer wants his own "brilliant" hack that reinvents the wheel
instead of using the po format, without understanding why the po format
is essential to the majority of translators.
The short answer is "the tools". gettext is the industry standard, and
there is a huge number of tools for creating, maintaining and reusing
translations in this format. Even the few tools included in GNU gettext
itself have many important features.
As far as I know, no translator has ever objected to the need for the po
format for these reasons, and we have discussed this extensively. The
problem of people inventing more and more different formats to keep
their software messages in (.oaf, .sheet, various xml formats, .desktop,
.soundlist, .directory, etc.) was a major pain to translators in GNOME,
and that eventually resulted in the development of xml-i18n-tools as a
middle layer. It allows developers to use their formats (with the
advantages those give them) while at the same time allowing translators
to use their format (with the advantages it gives them).

Currently it's used for the majority of GNOME modules and the plan is to
use it for all of them. There's no disagreement about that, at least
none that I know of.

> > If you go for XML, I'd recommend using intltool. It's a set of tools
> > designed exactly for this purpose. Since gettext itself doesn't have a
> > clue about XML, intltool works as a middle layer that extracts strings
> > marked for translation in the XML and adds them to header files, so that
> > xgettext can extract them and put them in a pot. The reverse process is
> > usually done at build time, and all the translations merged back into
> > the original XML file.
> 
> > You can find intltool in the xml-i18n-tools module in CVS.
> 
> Okay, so why would one want this heavy conversion action? If the only
> purpose is to have only one editable catalog instead of several files
> and people really need that then okay...
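
The conversion itself is not that heavy. As a rough illustration of the extraction half, here is a minimal Python sketch. It assumes intltool's convention of marking translatable XML elements and attributes with a leading underscore. The tips layout (`<tips>`, `<tip>`, `_title`, `<_text>`) is purely hypothetical, and the real intltool-extract is a separate tool whose output details may differ. The point is only that the marked strings end up as N_() calls in a generated header that xgettext already understands.

```python
import xml.etree.ElementTree as ET


def extract_marked_strings(xml_text):
    """Collect strings from elements and attributes whose names start with '_'.

    The leading underscore is the intltool-style "translate me" marker.
    """
    found = []
    for elem in ET.fromstring(xml_text).iter():
        # Marked attributes, e.g. <tip _title="...">
        for name, value in elem.attrib.items():
            if name.startswith("_"):
                found.append(value)
        # Marked elements, e.g. <_text>...</_text>
        if elem.tag.startswith("_") and elem.text and elem.text.strip():
            found.append(elem.text.strip())
    return found


def c_escape(s):
    """Escape a string so it can sit inside a C string literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_c_header(strings):
    """Emit one N_() line per string so that xgettext can extract them."""
    return "\n".join('char *s = N_("%s");' % c_escape(s) for s in strings)


# Hypothetical tips file: only the underscore-prefixed parts are translatable.
tips = '<tips><tip id="1" _title="Welcome"><_text>Drag layers to reorder them.</_text></tip></tips>'
print(to_c_header(extract_marked_strings(tips)))
```

Running it prints one `char *s = N_("...");` line for "Welcome" and one for "Drag layers to reorder them.", and it ignores the untranslatable `id` attribute. The merge back into the XML at build time is the same walk in reverse.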

I have already mentioned the disadvantages of a single translation file
in my previous mail, but the po format has many more advantages than
that.

Basically it amounts to the fact that there's much more to translation
than just creating a translation. In many cases, creating the initial
translation is the easiest part time-wise. Maintaining the translation
as the software evolves (often for many years), and updating and adding
translations of individual messages as they are added to the source,
usually takes more effort over a much longer period of time.
This is the single largest weakness of your proposal: it says nothing
about how this is to be solved, while gettext already has features for
it.

For the initial creation of a translation, using a translation memory
is becoming more common. A translation memory is a single large
collection of all existing translations in po format, which a special
tool reuses for the new translation. My memory is currently more than
6 MB of text and gives up to 25% - 30% (depending on the pot file)
exact matches in a new translation. That means 25% to 30% less work for
me when creating the translation, which usually amounts to many hours
of saved work. Also, even when there are fewer exact matches, the number
of close ("fuzzy") matches is usually large. These close matches save
much time when translating, since I don't have to translate such a
message from scratch but usually only have to make smaller adjustments.
They also help improve consistency, so that identical terminology and
phrasing are translated the same way.
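
To make the exact/fuzzy distinction concrete, here is a minimal sketch of pre-filling a new translation from a memory. It is a toy built on assumptions, not a description of how any particular translation memory tool works. The memory is just a hypothetical dict of msgid to msgstr, and Python's difflib similarity ratio with an arbitrary 0.8 cutoff stands in for whatever matching a real tool uses.

```python
import difflib


def tm_coverage(memory, template, cutoff=0.8):
    """Pre-fill a new translation from a translation memory.

    memory:   dict mapping msgid -> msgstr, collected from existing po files
    template: list of msgids from the new pot file
    Returns (counts, suggestions); suggestions maps msgid -> (msgstr, fuzzy).
    """
    counts = {"exact": 0, "fuzzy": 0, "none": 0}
    suggestions = {}
    sources = list(memory)
    for msgid in template:
        if msgid in memory:
            counts["exact"] += 1
            suggestions[msgid] = (memory[msgid], False)
            continue
        # Closest previously translated msgid by difflib's similarity ratio.
        close = difflib.get_close_matches(msgid, sources, n=1, cutoff=cutoff)
        if close:
            counts["fuzzy"] += 1
            suggestions[msgid] = (memory[close[0]], True)
        else:
            counts["none"] += 1
    return counts, suggestions


memory = {"Open": "Öppna", "Save the image": "Spara bilden", "Close": "Stäng"}
counts, suggestions = tm_coverage(memory, ["Open", "Save the images", "Quit", "Close"])
print(counts)  # {'exact': 2, 'fuzzy': 1, 'none': 1}
```

The exact matches can be used as they are. The fuzzy suggestion for "Save the images" still needs the translator's review, which is the "smaller adjustments" case described above.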

Translation memories can also be used for maintaining translations: as
new messages are added, you can re-run the translation against the
translation memory and match them against existing translations that
way. I only use them for new translations myself, but I know people who
also use them this way.
Either way, these translation memory tools use the po format since
this is what is used across free software translations, and if you
decide upon another format, you have to deal with making existing
translation memory tools usable with it. Anything else is a step
backwards.

However, those were only the problems of creating translations, while
I mentioned earlier that maintenance is the main work. Among other
things, the gettext tools themselves help with the following issues
related to updating translations:

* Fuzzy marking of changed messages. This is a really important feature.
If and when an original message is changed, translators need to be
notified about it easily, so that they can update their translation
accordingly. gettext handles this automatically, and messages that
have changed are marked "fuzzy" until the translator updates the
translation.
* Fuzzy marking of new messages. In a similar manner, new messages that
are added to the sources are matched against existing messages with
translations, and if they are similar, the most closely matching
translation is automatically picked and marked fuzzy. The translator
then only has to make the appropriate changes instead of re-translating
the message from scratch. This feature matters most for longer
messages.
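
Both kinds of fuzzy marking come from the same merge step. When the pot file is regenerated, each new msgid either keeps its old translation, inherits the closest old translation flagged fuzzy, or stays untranslated. The sketch below approximates that idea in Python. The function name, the dict representation and the 0.6 cutoff are hypothetical. GNU msgmerge uses its own matching, and it keeps obsolete entries in the file as #~ comments rather than returning them as a list.

```python
import difflib


def merge_catalog(old, new_msgids, cutoff=0.6):
    """Bring an existing translation up to date with new source messages.

    old:        dict msgid -> msgstr from the previous po file
    new_msgids: msgids in the freshly regenerated pot file
    Returns (entries, obsolete): entries maps msgid -> (msgstr, fuzzy),
    obsolete lists old msgids that no longer occur in the sources.
    """
    # Only entries that actually have a translation are worth reusing.
    candidates = [msgid for msgid, msgstr in old.items() if msgstr]
    entries = {}
    for msgid in new_msgids:
        if old.get(msgid):
            entries[msgid] = (old[msgid], False)       # unchanged message
            continue
        close = difflib.get_close_matches(msgid, candidates, n=1, cutoff=cutoff)
        if close:
            entries[msgid] = (old[close[0]], True)     # changed or new: fuzzy
        else:
            entries[msgid] = ("", False)               # nothing similar found
    current = set(new_msgids)
    obsolete = [msgid for msgid in old if msgid not in current]
    return entries, obsolete


old = {"Open the file": "Öppna filen", "Quit": "Avsluta"}
entries, obsolete = merge_catalog(old, ["Open the file...", "Quit", "Redo"])
print(entries["Open the file..."])  # ('Öppna filen', True)
print(obsolete)                     # ['Open the file']
```

Here "Open the file..." is a changed message. It inherits "Öppna filen" flagged fuzzy, so the translator only has to review it, and the old msgid is reported as obsolete.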

There are more features, but the above are the essential ones in this
case. They are unfortunately not trivial to reimplement, and at the same
time essential to effective translation.
Even if all this were reimplemented and the wheel reinvented, the
issue of compatibility with all existing tools remains. I have already
mentioned translation memory tools and other translation tools, but
there's a lot more that depends on, and is designed for, the po format.
One simple example is translation statistics. Translation statistics
are important to translators because they are an essential tool when
deciding where to devote work at the moment (have a look at the
http://developer.gnome.org/projects/gtp/status/ pages). These statistics
are all based on the po format, where statistics for individual
translations are easily available by querying msgfmt. A change in
translation format would also require changing these statistics tools
before they could report translation status.
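
The statistics are cheap to compute because every po entry records whether it is translated, fuzzy or empty. In practice `msgfmt --statistics` reports the number of translated, fuzzy and untranslated messages. To show what a status page does with such a file, here is a simplified Python counter. It is a hypothetical stand-in rather than msgfmt's own logic, and it ignores corner cases of the full po syntax.

```python
import ast


def po_statistics(po_text):
    """Count translated, fuzzy and untranslated messages in a po file.

    Simplified parser: understands comments, the "fuzzy" flag, multi-line
    strings and plural msgstr[n] lines; skips the header and #~ entries.
    """
    stats = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for block in po_text.strip().split("\n\n"):
        fuzzy, msgid, msgstrs, current = False, None, [], None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                fuzzy = fuzzy or "fuzzy" in line
            elif line.startswith("#") or not line:
                continue  # translator, reference and obsolete (#~) comments
            elif line.startswith("msgid "):
                msgid, current = ast.literal_eval(line[6:]), "id"
            elif line.startswith("msgid_plural "):
                current = None  # the plural source text is not needed here
            elif line.startswith("msgstr"):
                msgstrs.append(ast.literal_eval(line.split(" ", 1)[1]))
                current = "str"
            elif line.startswith('"'):
                # Continuation of the previous msgid or msgstr.
                part = ast.literal_eval(line)
                if current == "id":
                    msgid += part
                elif current == "str":
                    msgstrs[-1] += part
        if not msgid:
            continue  # header entry (msgid "") or a block with no message
        if not msgstrs or not all(msgstrs):
            stats["untranslated"] += 1
        elif fuzzy:
            stats["fuzzy"] += 1
        else:
            stats["translated"] += 1
    return stats


sample = r'''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Open"
msgstr "Öppna"

#, fuzzy
msgid "Save the images"
msgstr "Spara bilden"

msgid "Quit"
msgstr ""

#~ msgid "Old"
#~ msgstr "Gammal"
'''
print(po_statistics(sample))  # {'translated': 1, 'fuzzy': 1, 'untranslated': 1}
```

A status page only has to run something like this over each team's file and divide by the total. With a different format, every such tool would need its own parser.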

> > I as a translator also prefer po format... I doubt there is any
> > translator that wouldn't.
> 
> I don't. I don't care which format the translations have to be in.
> XML is about as easy as .po...

Only if you disregard everything other than the method of entering
text, and even that has its problems with an XML file with all
translations thrown together. For all the reasons given in this thread,
I cannot see any reasonable alternative to the po format, at least not
one that isn't a step backwards for translators unless it is backed by
a significant amount of code.

I hope we can agree on the solution using intltool that Sven proposed,
and that we can finish this thread.

Christian