Hi,

I'd like to propose a small change that involves many Fedora packages. (At first I thought I'd file it in Bugzilla, but I don't know what the right component would be.)

The proposed change is the following: when building RPM packages, let's convert all .mo files (gettext translations) to UTF-8.

Why?

- As Fedora is a fully UTF-8 system, applications are likely to request translations in UTF-8. (There might be a few exceptions, and some users may have a special setup or special wrappers to run certain applications in some other charset, but in the vast majority of cases gettext is required to return UTF-8 strings.)

  If the .mo file is already in UTF-8, the gettext() call simply returns a pointer into the area where the .mo file is mmap()ed. This can easily be checked with strace. This way no run-time conversion happens and no per-process memory is involved; translations are shared by all the processes that use the same message catalog.

  If, however, the .mo file uses a different encoding, gettext() has to allocate memory for the converted string and perform the conversion. So if several processes display the same localized string, each of them allocates its own memory area to store the UTF-8 version of the string, and each of them performs the charset conversion. They all load the corresponding gconv module as well, which could also be avoided.

  To summarize, having all the .mo files in UTF-8 would save both memory and CPU time.

- Currently the encoding of the .mo files is completely arbitrary; it is whatever the software developers or the translators happened to use. With this change, it would consistently be UTF-8. This would make it easier to find which package ships a particular translation.

  It often happens that I want to locate which package a particular message comes from, either because a word is misspelled, or because the whole message shouldn't appear and I'd like to fix the buggy package.
  The obvious solution is to do a recursive grep on /usr/share/locale/<lang>. If all the .mo files of the distribution are converted to UTF-8, I can do this simply, without having to worry about accented characters. (grep in UTF-8 mode works fine and finds the matching .mo files even though they are not fully valid UTF-8 files, since the UTF-8 strings are surrounded by other binary data.) However, if multiple encodings are used, there is no straightforward way to search for accented letters, and the job becomes much harder.

How?

- Thanks to RPM's flexibility, none of the packages needs to be modified, only the RPM macros. I recommend performing the conversion on the .mo files after the '%install' step (in '%__install_post' or whatever it's called). This way the whole thing is independent of the package's build procedure: whether it uses autotools or not; whether it regenerates the .mo files from .po files or ships pre-built .mo files; no need to worry about faulty and hence skipped .po files; no need to take care of non-standard locations of .po/.mo files within the source tree; etc.

- The only thing that needs to be done is to run "msgunfmt", followed by "msgconv -t UTF-8" and finally "msgfmt", on all the .mo files under the standard locale directories.

- So, all in all, it is _very_ easy to implement.

Is it safe?

- The encoding inside the .mo files is completely transparent to the applications, as gettext() and its friends always convert the strings to the charset requested by the application. So applications won't notice any change.

- We performed this step when building all the packages of the UHU-Linux 2.0 distribution, which was released half a year ago, and so far no problems have arisen. (During the test period there was only one package, namely coreutils, where the converted .mo files were corrupted, but as it turned out, this was caused by a bug in msgunfmt in gettext-0.15, which is already fixed in gettext-0.16.)

Any drawbacks?
- None that I know of, except for a negligible growth in the packages' sizes.

Well, I hope you like my idea :-)

bye,
Egmont

--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list
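As a rough illustration of the "How?" step proposed in the message above, here is a minimal Python sketch of what the msgunfmt / "msgconv -t UTF-8" / msgfmt round trip achieves for a single catalog. It is a swapped-in, pure-Python stand-in, not the gettext tools themselves, and it assumes: a revision-0 GNU .mo file, a charset name that Python's codecs recognize, and msgids that stay byte-for-byte unchanged (only the translations and the header's charset= field are rewritten). The output omits the optional hash table, so gettext falls back to a binary search over the sorted msgids. The script name mo2utf8.py in the usage comment is hypothetical.

```python
import re
import struct
import sys
from pathlib import Path

MO_MAGIC = 0x950412DE  # magic number of GNU .mo files
CHARSET_RE = re.compile(r"charset=([^\s;]+)", re.IGNORECASE)


def read_mo(data: bytes) -> list[tuple[bytes, bytes]]:
    """Parse a revision-0 GNU .mo file into (msgid, msgstr) byte pairs."""
    for endian in ("<", ">"):  # .mo files may be little- or big-endian
        magic, revision, count, orig_off, trans_off = struct.unpack_from(endian + "5I", data, 0)
        if magic == MO_MAGIC:
            break
    else:
        raise ValueError("not a GNU .mo file")
    if revision != 0:
        raise ValueError(f"unsupported .mo revision {revision:#x}")
    entries = []
    for i in range(count):
        id_len, id_off = struct.unpack_from(endian + "2I", data, orig_off + 8 * i)
        str_len, str_off = struct.unpack_from(endian + "2I", data, trans_off + 8 * i)
        entries.append((data[id_off:id_off + id_len], data[str_off:str_off + str_len]))
    return entries


def write_mo(entries: list[tuple[bytes, bytes]]) -> bytes:
    """Serialize entries as a little-endian revision-0 .mo file without a hash table."""
    entries = sorted(entries)  # without a hash table, lookups binary-search the msgids
    count = len(entries)
    orig_off = 28  # the fixed header is seven 32-bit words
    trans_off = orig_off + 8 * count
    data_off = trans_off + 8 * count  # string data follows both offset tables
    orig_table, trans_table, blob = [], [], bytearray()
    for table, index in ((orig_table, 0), (trans_table, 1)):
        for entry in entries:
            s = entry[index]
            table.append(struct.pack("<2I", len(s), data_off + len(blob)))
            blob += s + b"\0"  # strings are NUL-terminated; the stored length excludes the NUL
    header = struct.pack("<7I", MO_MAGIC, 0, count, orig_off, trans_off, 0, data_off)
    return header + b"".join(orig_table) + b"".join(trans_table) + bytes(blob)


def convert_mo_to_utf8(data: bytes) -> bytes:
    """Re-encode every translation to UTF-8; msgids are left untouched."""
    entries = read_mo(data)
    header = dict(entries).get(b"", b"").decode("latin-1")  # header keywords are ASCII
    match = CHARSET_RE.search(header)
    charset = match.group(1) if match else "ASCII"
    converted = []
    for msgid, msgstr in entries:
        text = msgstr.decode(charset)  # raises LookupError if Python lacks this codec
        if msgid == b"":
            # The header entry declares the catalog's charset; make it say UTF-8.
            text = CHARSET_RE.sub("charset=UTF-8", text, count=1)
        converted.append((msgid, text.encode("utf-8")))
    return write_mo(converted)


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 mo2utf8.py FILE.mo...
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_bytes(convert_mo_to_utf8(path.read_bytes()))
```

Keeping the msgids untouched is deliberate in this sketch: gettext looks them up by the exact bytes the program passes in, so only the encoding of the translations can safely change.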