Re: [libvirt PATCH 00/51] Use permutable format strings in translations

On Fri, Mar 10, 2023 at 05:43:46PM +0100, Jiri Denemark wrote:
> On Fri, Mar 10, 2023 at 16:29:52 +0000, Daniel P. Berrangé wrote:
> > On Fri, Mar 10, 2023 at 05:17:21PM +0100, Jiri Denemark wrote:
> > > On Fri, Mar 10, 2023 at 16:14:00 +0000, Daniel P. Berrangé wrote:
> > > > On Fri, Mar 10, 2023 at 05:09:16PM +0100, Jiri Denemark wrote:
> > > > > See 01/51 for rationale. Enforced by the last two patches of this
> > > > > series. The rest is quite a boring mechanical update, partially done
> > > > > using a perl oneliner:
> > > > > 
> > > > >     perl -pe 'for (my $i=1; $i<=12; $i++) { s/(N?_\("[^"]*?%)([^%$ ]*[a-zA-Z][^"]*")/\1$i\$\2/; }'
> > > > > 
> > > > > and tuned manually to fix cases not covered by the regexp above and to
> > > > > merge multiline messages into a single line. I merged only those that
> > > > > were touched anyway. Some very long messages consisting of several
> > > > > sentences were merged only partially and split on sentence boundary.
> > > > > 
> > > > > I will also update libvirt.pot once this is pushed.
> > > > 
> > > > Are we *100% sure* weblate is going to handle this intelligently?
> > > > 
> > > > This will change almost all of the msgid strings in libvirt.pot,
> > > > and translations are associated with msgid strings.
> > > > 
> > > > IOW, this risks throwing all our translations away, putting us back
> > > > to near-zero translation coverage, unless weblate is intelligent
> > > > enough to map numbered format strings to non-numbered format
> > > > strings, and I'm not convinced that it can do that.
> > > 
> > > I don't know; is there a way to check this? Technically we should be
> > > able to update the translations as well to make sure we don't lose any
> > > work done by translators. But can that be pushed into weblate somehow?
> > > (I guess it must have some kind of import in case you do the
> > > translation in a separate tool.)
> > 
> > Yes, there is a mechanism to import that I used when first setting
> > up weblate, but I can't remember exactly what it was now. I do
> > recall, however, that it was *immensely* slow and continually
> > pushed weblate into OOM death due to the large number of strings
> > in libvirt.pot. Took me days to get everything imported :-(
> > 
> > 
> > Technically we should really not have to modify the .pot at all,
> > as it is valid to use numbered format strings in the translation
> > regardless of whether the .pot uses them.
> > 
> > The main downside is that weblate's c-format check is broken, so it
> > will complain that the translation format is wrong, despite it being
> > correct.
> 
> Not sure whether the check is disabled by default, but it allows you to
> use numbered format strings when msgid doesn't use them. The problem is
> that translators cannot just copy&paste a format string to the right
> place; they need to invent it. And sometimes they apparently use tools
> that do not even allow numbered format strings when they are not
> present in msgid. This then causes regressions when updates undoing the
> correct formatting are pushed via weblate, which is what happened just
> now with https://gitlab.com/libvirt/libvirt/-/merge_requests/232

The c-format check is currently marked enforcing, which prevents
translators from confirming strings that have mis-matched formats.
While a string remains in the 'needs editing' state in weblate, it
is marked as 'fuzzy' in the .po file, and msgfmt then discards it
when building the .mo file.

IOW, we should in fact be safe from any crash problems with mis-matched
format strings, despite them appearing in the .po file.

Looking at weblate, we have about 100,000 strings in the 'needs editing'
state, which means almost 20% of the total translations in our .po files
are not being used. Not all of this is due to c-format ordering badness;
in some cases the formats are missing entirely!

We didn't actually have a regression in MR 232, because the old
translation with numbered formats was marked fuzzy too, and so was not
being used.

Nonetheless, we need to make this work. The c-format check is good
because it protects us from crashing. Ideally, it would be fixed to
permit numbered formats in msgstr, even when the msgid does not use
them.

Even if the check is fixed, it might be worth switching the .pot file
to numbered formats anyway, but this can't be done without bulk-updating
the translations and bulk re-importing them, which will be challenging.
We'll almost certainly want to try this on a throw-away repo in weblate
first, not our main repo.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



