On Thu, Apr 19, 2018 at 09:53:01AM +0200, Ján Tomko wrote:
> On Thu, Apr 12, 2018 at 02:28:22PM +0100, Daniel P. Berrangé wrote:
> > Similar to the libvirt.pot, .po files contain line numbers and file names identifying where in the source a translatable string comes from. The source locations in the .po files are thrown away and replaced with content from the libvirt.pot whenever msgmerge is run, so this is not precious information that needs to be stored in git.
> >
> > When msgmerge processes a .po file, it will add in any msgids from the libvirt.pot that were not already present. Thus, if a particular msgid currently has no translation, it can be considered redundant and again does not need storing in git.
> >
> > When msgmerge processes a .po file and can't find an exact existing translation match, it will try to do fuzzy matching instead, marking such entries with a "#, fuzzy" comment to alert the translator to take a look and either discard, edit or accept the match. Looking at the existing fuzzy matches in .po files shows that the quality is awful, with many having a completely different set of printf format specifiers between the msgid and fuzzy msgstr entry. Fortunately, when msgfmt generates the .gmo, the fuzzy entries are all ignored anyway. The fuzzy entries could be useful to translators if they were working on the .po files directly from git, but Libvirt outsourced translation to the Fedora Zanata system, so keeping fuzzy matches in git is not much help.
> >
> > Finally, by default msgids are sorted based on source location. Thus, if a bit of code with translatable text is moved from one file to another, it may shift around in the .po file, despite the msgid not itself changing. If the msgids were sorted alphabetically, the .po files would have stable ordering when code is refactored.
> >
> > This patch takes advantage of the above observations to canonicalize and minimize the content stored for .po files in git. Instead of storing the real .po files, we now store .mini.po files.
> >
> > The .mini.po files are the same file format as .po files, but have no source location comments, are sorted alphabetically, and all fuzzy msgstrs and msgids with no translation are discarded. This cuts the size of content in the po directory from 109MB to 19MB.
> >
> > Users working from a libvirt git checkout who need the full .po files can run "make update-po", which merges the libvirt.pot and .mini.po file to create a .po file containing all the content previously stored in git.
> >
> > Conversely, if a full .po file has been modified, for example, by downloading new content from Zanata, the .mini.po files can be updated by running "make update-mini-po". The resulting diffs of the .mini.po file will clearly show the changed translations without any of the noise that previously obscured content. Being able to see content changes clearly actually identified a bug in the Zanata Python client where it was adding bogus "fuzzy" annotations to many messages:
> >
> >    https://bugzilla.redhat.com/show_bug.cgi?id=1564497
> >
> > Users working from libvirt releases should not see any difference in behaviour, since the tarballs only contain the full .po files, not the .mini.po files.
> >
> > As an added benefit, generating tarballs with "make dist" will no longer cause creation of dirty files in git, since it won't touch the .mini.po files, only the .po files, which are no longer kept in git.
> >
> > To avoid creating a single commit 100+MB in size, each language is minimized separately in a following commit.
>
> From a brief look at those, the few Slovak "translations" are all in English and many of the translation team pages still point to transifex, but I assume that data comes from Zanata.
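To make the .mini.po transformation above concrete, here is a minimal Python sketch of the same idea: drop "#:" source-location comments, drop fuzzy entries and entries with no translation, keep the header entry (empty msgid), and sort by msgid. It is not the build-aux/minimize-po.pl script from the patch; it assumes simple singular entries only (no msgctxt, no plural forms), and the names parse_po, minimize, dump and the SAMPLE data are hypothetical.

```python
"""Hypothetical sketch of .po minimization (not libvirt's minimize-po.pl).

Assumption: only simple singular entries; msgctxt and plural forms are
not supported by this sketch.
"""
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

# 'msgid "..."' / 'msgstr "..."' lines, and bare '"..."' continuation lines.
KEYWORD_LINE = re.compile(r'^(msgid|msgstr)\s+"(.*)"$')
CONTINUATION = re.compile(r'^"(.*)"$')


@dataclass
class Entry:
    comments: List[str] = field(default_factory=list)  # translator comments
    flags: Set[str] = field(default_factory=set)        # from "#, ..." lines
    msgid: Optional[str] = None   # text kept in escaped form, quotes stripped
    msgstr: Optional[str] = None


def parse_po(text: str) -> List[Entry]:
    entries: List[Entry] = []
    cur, last = Entry(), None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # a blank line ends the current entry
            if cur.msgid is not None:
                entries.append(cur)
            cur, last = Entry(), None
        elif line.startswith("#,"):
            cur.flags.update(f.strip() for f in line[2:].split(",") if f.strip())
        elif line.startswith(("#:", "#.", "#|", "#~")):
            pass  # drop source locations, extracted/previous comments, obsolete entries
        elif line.startswith("#"):
            cur.comments.append(line)
        elif (m := KEYWORD_LINE.match(line)):
            last = m.group(1)
            setattr(cur, last, m.group(2))
        elif (m := CONTINUATION.match(line)) and last is not None:
            setattr(cur, last, getattr(cur, last) + m.group(1))
        else:
            last = None  # unsupported keyword (msgctxt, msgid_plural, ...)
    if cur.msgid is not None:
        entries.append(cur)
    return entries


def minimize(entries: List[Entry]) -> List[Entry]:
    kept = []
    for e in entries:
        if e.msgid == "":          # header entry: always keep
            kept.append(e)
        elif "fuzzy" in e.flags:   # msgfmt ignores fuzzy entries anyway
            continue
        elif e.msgstr:             # drop entries with no translation
            kept.append(e)
    # Alphabetical order keeps the file stable when code moves between files.
    return sorted(kept, key=lambda e: e.msgid)


def dump(entries: List[Entry]) -> str:
    out: List[str] = []
    for e in entries:
        out.extend(e.comments)
        if e.flags:
            out.append("#, " + ", ".join(sorted(e.flags)))
        out.append(f'msgid "{e.msgid}"')
        out.append(f'msgstr "{e.msgstr or ""}"')
        out.append("")  # blank separator line
    return "\n".join(out)


SAMPLE = '''# Header comment
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: src/b.c:10
#, c-format
msgid "zebra %s"
msgstr "zebre %s"

#: src/a.c:5
#, fuzzy, c-format
msgid "apple %d"
msgstr "pomme %s"

#: src/c.c:1
msgid "untranslated"
msgstr ""

#: src/d.c:2
msgid "banana"
msgstr "banane"
'''

if __name__ == "__main__":
    print(dump(minimize(parse_po(SAMPLE))), end="")
```

With the sample input, the fuzzy "apple %d" entry (whose msgstr has a mismatched format specifier) and the untranslated entry disappear, and "banana" sorts ahead of "zebra %s" regardless of which source file each came from. GNU gettext's msgattrib also has --no-fuzzy, --translated, --no-location and --sort-output options that may give a similar result without custom code.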

Yeah, there are a few other languages too where, for unknown reasons, the English has been duplicated into the translation. I could go clicky-clicky and kill that in the Zanata UI, but there are a lot, so I want to figure out a way to automatically extract that list of bad translations & cull them all in one go via the API.

Good point about the translation URLs pointing to transifex. I'll submit another patch for that too.

> > Signed-off-by: Daniel P. Berrangé <berrange@xxxxxxxxxx>
> > ---
> >  .gitignore               |  3 +++
> >  build-aux/minimize-po.pl | 37 +++++++++++++++++++++++++++++++++
> >  po/Makefile.am           | 30 ++++++++++++++-------------
> >  po/README.md             | 53 +++++++++++++++++++++++++++++++++++++++++-------
> >  4 files changed, 102 insertions(+), 21 deletions(-)
> >  create mode 100755 build-aux/minimize-po.pl
> >
>
> Reviewed-by: Ján Tomko <jtomko@xxxxxxxxxx>
>
> Jano

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|