Re: [Gimp-developer] Re: GIMP Tip of the Day messages

Christian Rose <menthos@xxxxxxxxxxx> · Mon, 08 Oct 2001 03:39:53 +0200

Daniel Egger wrote:
> > Why should I have to use a special XML editor?
> 
> You don't have to, that's the trick.

Ok, I got the impression from your message that this was the case.

> > How does the editor know what language I want to edit,
> 
> Easy, you tell it.

So this is an extra step that I have to do whenever editing a
translation with your scheme?

> > and how does it automatically filter away
> > everything else except the original string and my language?
> 
> Black magic but easy to hack.

I'd like to see that hack first.

> > Do you believe everyone uses VIM or has the desire to be forced to do so?
> 
> No, but having the right tools will always have a big impact on the
> efficiency of your work.

Nothing to argue about that.

> I really don't want to know what you're using
> to edit textfiles but if you're not skilled with one of the major
> ones (No, MicroSoft Notepad and Wordpad are not major) then you're most
> likely wasting your time.

No, I'm using Emacs, pure and simple. By the reasoning in your previous
mail, you implied that it's an inferior tool just because it doesn't
natively support UTF-8 yet. I can tell you that this is not true: it
certainly is a capable editor, and in this respect it is in the same
state as many other popular and common editors.
Native support for UTF-8 is uncommon. Of course that is bad and should
get better, but it doesn't change the fact that forcing translators to
use UTF-8 today causes real and practical problems for them.
Editors aside, simply viewing UTF-8 files with non-ASCII content, or
running console text tools on them, usually has little if any support.

> > Surely all translators are bound to agree with you. All incompatible
> > 8bit encodings are a nightmare, and UTF-8 is the future. But that
> > doesn't change the fact that we live with the tools we have today. We
> > cannot stop translating or reduce the pace of translations until we live
> > in a UTF-8 clean world, because simply there may never be such a world,
> > and it is out of our control as translators, and for most individual
> > developers too, I'd imagine. It will be a long conversion process, and
> > we have to support multiple encodings during that time. We can already
> > store translations in UTF-8 today, but editing them is another matter.
> 
> This is free and open software, no one is used to walk the hard way just
> because there are only a few tools available; pick one of the available
> tools or improve others instead of living in pain.

We are not talking about some change that will give new functionality.
We are talking about a proposed forced change that for all intents and
purposes will give no benefit to translators (although you like to label
them as such) but rather the opposite - instead of helping translators
you want to make what they do more difficult, as has been already
extensively discussed in this thread, with questionable gains at best.

Note that I'm not at all against the use of XML; on the contrary, if it
helps developers, is more extensible, or has any other development
benefit, I'm all for it. What I'm against is forcing translators to use
this format for editing (as per your proposal) instead of using intltool
as a middle layer and letting translators use their own tools.

I'm not a developer, and I'd rather devote my time to translating than
to hacking code. Thus I have no interest in devoting time to making your
proposed system usable by translators and reinventing the wheel, rather
than using what's already available and working (intltool).

> > Also, the encoding problem remains to be only one of the problems with
> > your solution.
> 
> Now we're getting closer to the point I hope.
> 
> > So you're saying that Emacs and many other editors are bad? Please don't
> > turn this into an editor war.
> 
> Emacs can't do UTF-8?

No, it can't. It's a planned feature, but it has remained so for a long
time, at least for as long as I have been checking the roadmap and
feature announcements.

> (minutes later) Hm, okay I imagined a major editor like emacs would
> support UTF-8 natively but I was proven wrong.

I'm sure you'll find many other surprises when you check which text
tools in any major GNU/Linux distribution actually fully support UTF-8,
and how many of the common ones you have to leave aside. This is the
problem I'm talking about.

> Though there is a package here
> ftp://ftp.cs.ust.hk/pub/ipe/oc-unicode-0.72.2.tar.gz

I have previously tried to use both of these hacks (there are two of
them) but with little success. Also, they apparently have problems of
their own. If I remember correctly, at least one of them only supported
viewing of UTF-8 and not editing.
If they had been acceptable, I'm sure they would have been incorporated
into the main Emacs distribution long ago... :-(

> that adds the missing support to version 20.4 (20.6 preferred).
> Actually I really don't care what people are using and I'm not saying
> any editor is worse or better than any other (well, the most obvious
> ones excluded).

Ok, then I take it that you take back your previous statement about
tools not supporting UTF-8 automatically being inferior.

Sure, the tools need to get updated in the end, but it's a very slow
process that has already taken years with little happening, and surely
many years remain. In the meantime, forcing translators to use UTF-8 is
a big practical problem for them. Note that I'm not against the use of
UTF-8 for storing the translations in an application-accessible way. On
the contrary, using UTF-8 and having UTF-8-cleanliness in the
application itself usually solves many localization problems (as
witnessed in Evolution). However, forcing translators to *edit* the
translations in UTF-8 causes problems today, and it isn't necessary with
intltool, since the translations then get converted to UTF-8
automatically. That's one more reason why I advocate intltool.
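
As a rough illustration of that conversion step, here is a minimal
Python sketch. It only shows the general idea of editing in a legacy
encoding and converting afterwards; it is not intltool's actual code,
and the function name to_utf8 and the ISO-8859-1 default are
assumptions made for the example.

```python
def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode text saved in a legacy 8-bit encoding as UTF-8.

    Hypothetical helper: the translator keeps editing in the encoding
    their editor handles, and the conversion happens as a separate step.
    """
    return raw.decode(source_encoding).encode("utf-8")


# What a Latin-1 editor would write to disk for a Swedish string.
legacy = "Räksmörgås".encode("iso-8859-1")

# What the application would receive after conversion.
converted = to_utf8(legacy)
```

Each of the three non-ASCII letters takes one byte in ISO-8859-1 and two
bytes in UTF-8, which is why an editor unaware of UTF-8 displays the
converted file as garbage.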

> > Escaping isn't realistic at all. I suspect your experience with having
> > to write large amounts of text in your native language and escaping all
> > non-ASCII characters is limited.
> 
> I have to do that all the time, Umlauts in LaTeX are preferably escaped
> by an ", and in HTML or DocBook I also have to use the escaped versions,
> so what's the matter?

I suspect you should have read what I wrote to the end.

> > It plain sucks in more ways than is possible to count.
> 
> Welcome to reality, sucks eh?

Well, when I have to choose between a solution that is usable and one
that sucks, guess which one I would prefer.

> > Escaping everything reduces typing speed (and makes
> > your fingers hurt),
> 
> Well, some editors can do it for you but then there're people who
> prefer american keyboard layout and thus have no choice.

I have used editors that do it for me but that still doesn't solve the
problems I mentioned just below...

> > makes it hard to read, introduces a greater danger
> > of errors (by forgetting to escape, or incomplete escaping) or wrong
> > escaping (one different escape sequence is similar to all others, while
> > the result may be entirely different. In other words, a pain to verify
> > without unescaping).
> 
> Again, this is something you shouldn't have to care about; the
> ridiculous speed increases of computer architectures also have their
> good sides.

I'm not sure what you are suggesting here. Yes, I do actually care about
my translations being correct, so "not worrying" about reviewing is out
of the question.
And yes, although the text/translation could possibly be generated in
human-readable form, this is yet another (unnecessary) step I would
have to do, with no gains.
Also, when I send out translations for review, I usually prefer to send
out the source of the translation in a readable format (that usually
means po files), so that any feedback I get can easily be applied to
it. Any system where I have to convert between formats just to make the
translation readable, and then reverse the process to apply changes, is
a step back from the self-contained and readable po format.
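
To make the verification problem concrete, here is a small Python
sketch using HTML-style entities as the escaping scheme. That choice is
an assumption for illustration only; the thread does not say which
escape syntax the proposed format would use.

```python
import html

# Three escaped strings that look almost identical when skimmed.
escaped_variants = ["r&auml;ka", "r&aring;ka", "r&aacute;ka"]

# Only after unescaping does it become obvious that they are three
# different words (the last one being a typo).
readable = [html.unescape(s) for s in escaped_variants]
```

A reviewer reading the escaped source has to decode every sequence
mentally, or run a step like this one first.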

> > because of the tools. Translation memories and fuzzy matching are most
> > important parts of this.
> 
> Fuzzy matches often lead to wrong translations unfortunately.

Not that often; it all depends on how experienced the translator is,
I'd say. Fuzzy matching usually saves a lot of time, time that can be
better spent translating more software, and is as such a very important
feature. It also comes into play very often, since most software has
messages that are rather similar to each other.
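
For readers unfamiliar with the idea, a fuzzy match against a
translation memory can be sketched as below. This uses Python's difflib
and is not the algorithm gettext itself uses; the function name suggest,
the 0.8 cutoff and the sample Swedish strings are all assumptions made
for the example.

```python
import difflib
from typing import Optional

# Hypothetical translation memory: original message -> translation.
memory = {
    "Could not open file '%s'": "Kunde inte öppna filen \"%s\"",
    "Preferences": "Inställningar",
}


def suggest(msgid: str, memory: dict, cutoff: float = 0.8) -> Optional[str]:
    """Return the translation of the most similar known message, if any.

    The result is only a suggestion that the translator still has to
    review, which is what marking an entry as fuzzy is for.
    """
    matches = difflib.get_close_matches(msgid, list(memory), n=1, cutoff=cutoff)
    if not matches:
        return None
    return memory[matches[0]]


# A new message that differs from a known one only by a trailing period.
near = suggest("Could not open file '%s'.", memory)

# A message with nothing similar in the memory.
far = suggest("Quit", memory)
```

The near match returns the existing translation as a starting point,
while the unrelated message returns nothing and is translated from
scratch.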

Also, I have hardly seen any such error getting past the review process.
Are you experienced in regularly using the help of reviewers, or
regularly reviewing other people's translations? Both are useful
lessons and practices that I wholeheartedly recommend.

> And one of
> the major problems with the catalogs of big applications is that one
> cannot easily translate a phrase differently in differing contexts.

As I have previously said, this can be solved by using the Q_() macro of
intltool instead of N_() or _().
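
The general idea behind such a macro can be sketched in Python as
follows. This mail does not describe how Q_() works internally, so the
"context|text" prefix convention, the function name q_ and the sample
catalog are assumptions for illustration only.

```python
def q_(msgid: str, catalog: dict) -> str:
    """Look up a context-qualified message.

    Assumed convention: the part before the first '|' is a context
    marker seen only by translators. If no translation exists, the
    marker is stripped so the user never sees it.
    """
    translated = catalog.get(msgid, msgid)
    if translated == msgid and "|" in msgid:
        return msgid.split("|", 1)[1]
    return translated


# Hypothetical Swedish catalog: the same English word needs two
# different translations depending on where it appears.
sv = {
    "Menu|Open": "Öppna",   # verb, as a menu action
    "State|Open": "Öppen",  # adjective, as a status
}

menu_label = q_("Menu|Open", sv)
state_label = q_("State|Open", sv)
untranslated = q_("Menu|Open", {})
```

With plain _() both occurrences would share the single msgid "Open" and
could only get one translation.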

> > I would hardly consider any change of this pace for the worse, because
> > some developer decided that his home-brewed (but for all practical
> > purposes inferior) and incompatible translation scheme should be used,
> > to be any progress. And this is what I'm afraid of.
> 
> Heh, if you think work is lost then you're probably underestimating us,
> of course someone will hack up a script to convert from the old to
> the new style.

That still won't solve the problems:
	1) Your proposed solution still doesn't have the functionality
	of gettext as you described it,
	2) Your proposed solution still doesn't solve the one-file
	problems,
	3) Your proposed solution still doesn't solve the problems of
	no other tools supporting this format, including translation
	memories and statistics tools,
	4) Your proposed solution still remains vaporware for the time
	being.

It's not a matter of a "simple script", and I sure hope that you no
longer believe so.

> Just the handling in the future would differ and as I
> already said it's more the fear of a change than a real deterioration
> that would be seen here.

You're certainly right about fear. Fear of a single person claiming
that he can in no time replace software that has evolved for decades,
and that he intends to do so and enforce this change, all while this
person openly admits that the proposed "solution" will have hardly any
of the essential features of the system to be replaced, and admits that
he hardly does any translation work at all and doesn't understand why
the features are necessary.

> Moving to a better machine readable format
> also has the advantage that the machine can support the user much better.

Sure, and we can do this already today by using the combination of XML
and intltool.

> > Yes, it's a fear. Fear of having an inferior translation scheme enforced
> > by someone not completely understanding the problems and difficulties of
> > daily translation work by translators, and willing only to replace some
> > limited functionality while ignoring much else desperately needed
> > functionality. I'm not saying that this is necessarily the case here,
> > but it sure reminds me of every such previous discussion I've had.
> 
> Okay, I will silently accept your POV here since it doesn't make any
> sense to elaborate this any further to me.

Ok. I will answer the rest of your points, and hope that we can let this
discussion rest after that.

> > I agree that this is a drawback of gettext (solvable by using the Q_()
> > macro in intltool instead of _() or N_()),
> 
> Ugh, _() and N_() were invented to reduce the overhead to a minimum and
> still many developers haven't understood the idea behind it.

Most developers I have met do understand the difference between them,
but I suppose we simply have different experiences regarding that.

> Q_() is
> really about the biggest bullshit I've seen for quite a while and it
> surely will not make the concepts easier to understand, and it will
> make the software buggier.

I fail to see how Q_() will make software buggier (on the contrary I'd
say), and I'm sure the persons responsible would like more constructive
criticism and suggestions than "it's bullshit".

> While I do agree with Marc that XML is not the all-purpose solution I
> really think it's going to help in the case of localisation by the
> consistent use of UTF-8 and other concepts like includeable files and
> overrideable tags.

Please explain (we can do that in private or preferably on gnome-i18n,
since this will surely become off-topic here), because although I'm
highly sceptical of this as a solution for any foreseeable future, I'm
curious about what you have in mind.

> Also having cluttered files definitely helps the
> one-phrase-several-meanings problem

Care to explain how?

> though I see that it's hard to
> understand that several files don't automatically mean a deterioration
> in comfort given that the tools to support the people would be easier to
> write than a reply to such a mail and they would have functions no
> one would probably expect from a .po editor with its fuzzy messages
> and translation pool.

I must admit I had trouble parsing this sentence.

> > but do you really expect the
> > same tip to occur more than once in the tips XML file, and require
> > different translations at the same time?
> 
> No, I was talking about translations in general here.

Sure, but for all intents and purposes, this thread was about the GIMP
tips.

Christian