Daniel Egger wrote:
> > Dia uses intltool/xml-i18n-tools for sheet files.
>
> That's new then. They didn't when I was translating the sheets.

Then you should take a new look. It certainly does today.

> > Because one of the fundamentals of easy translation is simply to have
> > the original text handy. This is so you can easily compare the original
> > and the translation, and ensure that the translation is entirely
> > correct. I have to visually compare the strings many times during the
> > translation of a single message, and at least twice: first to interpret
> > the message I'm about to translate, and finally to compare what I wrote
> > with the original so nothing got lost or added or any meaning changed in
> > the translation.
>
> That's the same with the proposed XML format, just that all translations
> are within one file and thus the unnecessary redundancy is gone.

It's not the same, for the reasons I mentioned below.

> > If you add a large number of translations to a single file and expect me
> > to edit it, I have to skip a large number of unrelevant "garbage" (since
> > I'm usually not at all interested in the other translations) just to
> > compare the original and my translation. This makes the process of
> > visually verifying translations harder.
>
> Okay, but then again it's just a matter of tools. I believe EMACS can
> fold the unwanted ones away as can do VIM 6.0 and if necessary you
> can use any XML editor of choice.

Why should I have to use a special XML editor? How does the editor know
what language I want to edit, and how does it automatically filter away
everything else except the original string and my language? Do you
believe everyone uses VIM or has the desire to be forced to do so?

> > Another more dangerous thing is encodings. Multiple encodings in a
> > single file don't mix well.
>
> UTF-8 rules.

While I have to agree, you should probably have read a bit further. I
mentioned the problems of using UTF-8 further down.
> > I've got bitten too many times by other
> > translators accidentally saving the whole file with their encoding and
> > thus ruining my and many other's translations. Actually this was one of
> > the most important reasons why we went away from editing .desktop files
> > directly in GNOME: With hundreds of translators, the danger of someone
> > accidentally doing this became very imminent (happened quite frequently)
> > and it became a pain to ensure that translations weren't broken because
> > of simple "accidents" like this. Also, it became a mess to "clean up"
> > since effectively all translators had to be contacted to verify that
> > their translations were still correct after such an accident.
>
> We live in a global world and we should act like that. The different
> 8bit encodings are from a time where people cared very much about space
> and not so much about internationalisation but this is not longer the
> case; I believe that anything but UTF-8, Unicode and ASCII is futile and
> will even be more in the future.

Surely all translators are bound to agree with you. All incompatible
8bit encodings are a nightmare, and UTF-8 is the future. But that
doesn't change the fact that we live with the tools we have today. We
cannot stop translating or reduce the pace of translations until we live
in a UTF-8 clean world, simply because there may never be such a world,
and it is out of our control as translators, and for most individual
developers too, I'd imagine. It will be a long conversion process, and
we have to support multiple encodings during that time. We can already
store translations in UTF-8 today, but editing them is another matter.

Also, the encoding problem is only one of the problems with your
solution.

> > While enforcing the use of UTF-8 solves the encodings problem, it is not
> > feasible for many other reasons. One is simply the lack of support in
> > many editors and many other text processing tools (and terminals and so
> > on).
>
> That's true. But what you're suggesting will only work for 8bit charsets
> anyway; this broken software will most likely also fail for 2byte
> charsets. Or do you want to exclude them?

I do not understand what you are suggesting here. What's broken? iconv
(which gettext and intltool use) handles conversion of multibyte
characters just fine.

> > Effectively enforcing a particular editor hasn't worked yet, and
> > probably never will, and it will probably take more time until all
> > editors natively support UTF-8.
>
> The good ones already do and the bad ones never will;

So you're saying that Emacs and many other editors are bad? Please don't
turn this into an editor war.

> there's still the possibility to escape those characters and then you'll have
> pure ASCII.

Escaping isn't realistic at all. I suspect your experience with having
to write large amounts of text in your native language and escaping all
non-ASCII characters is limited. It plain sucks in more ways than it is
possible to count. Escaping everything reduces typing speed (and makes
your fingers hurt), makes the text hard to read, and introduces a
greater danger of errors, whether from forgetting to escape, escaping
incompletely, or escaping wrongly. One escape sequence looks much like
any other, while the result may be entirely different. In other words,
it is a pain to verify without unescaping.

> You can even let them convert automatically if you really want.

And this is what intltool does. Back to square one.

> > encodings problem again: If you force me to use UTF-8, I have to
> > maintain several translation memories instead of a single one, one for
> > each encoding.
>
> Huh? You're trying to tell me that UTF-8 will mess this up? How are you
> handling this right now then with different encodings?

I use only one, iso-8859-1. That's all I have to use. It contains all
characters needed for Swedish, is supported by all modern software, and
is automatically converted to UTF-8 when there's a need to do so.
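
To make the conversion step concrete, here is a minimal Python sketch of
what "translate in iso-8859-1, convert to UTF-8 when needed" amounts to,
and of the kind of accident described earlier, where a file is read or
saved with the wrong encoding. This is only an illustration under stated
assumptions: it is not intltool's code (intltool relies on iconv, as
noted above), and the function names and the sample Swedish word are
hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (illustrative helper, not intltool code)."""
    # Decode with the charset the translator actually used,
    # then encode the resulting text as UTF-8.
    return data.decode("iso-8859-1").encode("utf-8")


def misread_utf8_as_latin1(text: str) -> str:
    """Show what a reader sees when UTF-8 bytes are interpreted as ISO-8859-1."""
    # This mimics the accident: the bytes are fine, the assumed charset is wrong.
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    # Hypothetical sample; any text within ISO-8859-1 would do.
    word = "räksmörgås"
    on_disk = word.encode("iso-8859-1")   # what a Latin-1 editor saves
    converted = latin1_to_utf8(on_disk)   # the automatic conversion step

    # The conversion is lossless for text that fits in ISO-8859-1.
    print(converted.decode("utf-8") == word)

    # The same text, misread with the wrong charset, comes out garbled.
    print(misread_utf8_as_latin1(word))
```

The point of the sketch is that the conversion itself is mechanical and
safe as long as the declared charset matches the bytes; the damage comes
from a mismatch between the two, which is exactly what happens when many
people with different editor settings touch the same file.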
> > So while the storage of all translations in UTF-8 solves its shares of
> > problems, it creates new ones for translators. This is why intltool lets
> > translators use their encoding when translating, and converts it to
> > UTF-8 when needed.
>
> Okay, that's fine with me.

Wowsa, we agree on something! :-)

> > And that is still a problem, as explained above. 15 lines of irrelevant
> > text inbetween every single message and its translation into my language
> > makes verifying translations an unnecessary difficult burden.
>
> See above. Try the folding feature of vim 6.0, it's really cool.

I don't use vi or vim, and do not plan to do so for the foreseeable
future.

> > Dia uses intltool now, so it seems they have recognized the problems the
> > translators had.
>
> I haven't noticed that "problems", maybe it's only my imagination that
> XML is easy to handle.

For a developer, XML is usually a dream, and I understand why. But that
doesn't mean that using a raw XML format for editing translations is
ideal by any means.

> > It is necessary. po format and gettext have many important features that
> > translators depend upon, something I have previously experienced that
> > almost every translator knew.
>
> important features in gettext?

I will outline them in a separate mail, I promise.

> > gettext has evolved. It has much of the features that translators need.
> > And, as you admit, it's industry standard. If you want to replace it,
> > you'd better write a better and fully compatible alternative (since a
> > lot of tools across many platforms are designed to work with this
> > industry standard), while keeping all existing features. I beleive this
> > is where people use the phrase "show me the code".
>
> Okay, I can hack up an application using XML for that purpose in almost
> no time. While XML won't solve all problems here (I've not suggested to
> replace gettext completely if you remember) it comes in quite handy
> sometimes.

That's the problem.
You are only prepared to replace some limited gettext functionality,
ignoring the rest.

> I believe it's the righttime to say it again: I don't have anything
> against the xml-i18n-tools; if people think it's much easier to use
> them an .po files feel free to go ahead, however most of the brought
> arguments are pretty bogus.

They are not. How much do you translate a day? Much of what you base
your counterarguments on is certainly doable, but not without extra work
and hassle, effectively making translating more work than it has to be
today.

Today I'm usually providing several completely new translations a week,
while at the same time updating many older ones daily (if you don't
believe me, have a look in the cvs logs). The reason I can do this is
simply the tools. Translation memories and fuzzy matching are the most
important parts of this. I would hardly consider it progress if this
pace changed for the worse because some developer decided that his
home-brewed (but for all practical purposes inferior) and incompatible
translation scheme should be used. And this is what I'm afraid of.

> I think it's more the fear of a change then
> any technical reason not to go for the whole thing.

Yes, it's a fear. Fear of having an inferior translation scheme enforced
by someone who does not completely understand the problems and
difficulties of translators' daily work, and who is willing to replace
only some limited functionality while ignoring much other desperately
needed functionality. I'm not saying that this is necessarily the case
here, but it sure reminds me of every such previous discussion I've had.

> Using .po files being translated into XML files has one big disadvantage: You
> can't use different translations for the same phrase that have a different
> meaning in a different context.

I agree that this is a drawback of gettext (solvable by using the Q_()
macro in intltool instead of _() or N_()), but do you really expect the
same tip to occur more than once in the tips XML file, and require
different translations at the same time?


Christian