On a Friday in 2020, Daniel P. Berrangé wrote:
On Fri, Jul 17, 2020 at 05:01:47PM +0200, Pino Toscano wrote:

Hi,

I recently took a look at the UI/user-visible messages from libvirt, which are translated using gettext. They are extracted in a single libvirt.pot catalog, which includes messages from libvirt.so itself (mostly, if not all, errors), the separate daemons, the helper tools, and from virsh.

I noticed there is plenty of room for improvement: what strikes me is the lack of consistency among the messages. Let me state first: I understand that not all the people are native English speakers (I am not), so I'm not picking on anyone.

Yes, the lack of consistency is pretty bad and makes more work for our translators.

Also, I'm sure a portion of our translatable strings are in unreachable error paths (i.e. we are looking up some data that we just successfully put there a few lines above) and by Murphy's law, there are code paths missing an error completely or having an undescriptive message. Hopefully aborting on OOM will help us erase more messages.
Some examples:
a) different capitalization:
- "cannot open %s"
- "Cannot open %s"
I vote for the capitalized version, see below.
b) different quoting for files/identifiers/etc:
- "Cannot open %s"
- "Cannot open '%s'"

Yes, sometimes the error is worded in a way that prevents this, e.g.
  current vcpus count must be an integer
for <vcpu current='x'>

We could even pass the hardcoded identifiers via %s, e.g.
  _("Invalid value of '%s': %s"), cpuset, tmp
instead of:
  _("Invalid value of 'cpuset': %s"), tmp
to prevent the identifier from being translated.
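
As an illustration of the %s suggestion above, here is a minimal Python sketch (not libvirt code). The `_()` function and the one-entry catalog are hypothetical stand-ins for gettext, and the French string is made up for the example. The point is only that the identifier never appears in the translatable msgid, so a translator cannot alter it, and that the quotes keep an accidentally empty value visible.

```python
# Hypothetical stand-in catalog: English msgid -> made-up translation.
CATALOG = {
    "Invalid value of '%s': %s": "Valeur invalide pour '%s' : %s",
}


def _(msgid):
    """Stand-in for gettext: return the translation, or the msgid itself."""
    return CATALOG.get(msgid, msgid)


def invalid_value(identifier, value):
    # The identifier is passed as a format argument, so it is not part of
    # the msgid and never reaches the translation catalog.
    return _("Invalid value of '%s': %s") % (identifier, value)


def cannot_open(path):
    # Quoting the path makes an empty string stand out in the output.
    return _("Cannot open '%s'") % path


print(invalid_value("cpuset", "0-3,^"))
print(cannot_open(""))
```

With an unquoted "Cannot open %s", the second call would print a message that simply ends after "open", which is easy to misread.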
c) different verbs for failed actions:
- "Cannot frobnicate ..."
- "Could not frobnicate ..."
- "Did not frobnicate ..."
- "Failed to frobnicate ..."
"Failed to" seems most factual here
- "Unable to frobnicate ..." depending on the message, also "frobbing failed"
Frobbing failed takes one extra character compared to that.
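
To get a feel for how the competing lead-in phrases are distributed, one could tally them over the message catalog. The sketch below is only an illustration: the sample messages are made up, and a real run would have to read the msgids from libvirt.pot, which is not shown here.

```python
import re
from collections import Counter

# Lead-in phrases listed in the thread; matching ignores case.
LEAD = re.compile(
    r"^(cannot|could not|did not|failed to|unable to)\b", re.IGNORECASE
)


def tally_lead_phrases(messages):
    """Count how often each failure phrase starts a message."""
    counts = Counter()
    for msg in messages:
        match = LEAD.match(msg)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Made-up sample input, not taken from libvirt.pot.
sample = [
    "Cannot open %s",
    "cannot read %s",
    "Failed to parse XML",
    "Unable to open %s",
    "unknown disk type",
]
for phrase, count in tally_lead_phrases(sample).most_common():
    print(count, phrase)
```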
d) sometimes contractions ("couldn't", "don't", etc), sometimes not ("could not", "do not", etc)
e) what QEMU/etc supports:
- "... by this QEMU binary"
- "... for this QEMU binary"
- "... in this QEMU binary"
- "... with this QEMU binary"
- "... by this QEMU"
- "... for this QEMU"
- "... with this QEMU"
- "... with this binary" [in a QEMU file]
- "... [supported] by qemu"

There are possibly subtle nuances there:
"by this QEMU binary" -> the particular QEMU does not support it at all - it was not implemented yet or it was compiled out
"with this QEMU binary" -> it might but libvirt does not bother to do the legacy part
"by qemu" -> not in QEMU at the moment of writing this error message
"for this QEMU binary" just sounds wrong to me, maybe a native speaker can correct me on that?
(but I bet most of the uses did not care about those and just copied and pasted it from somewhere)
Also, does 'QEMU binary' vs. 'QEMU' bring any extra clarity?
there is also "qemu does not support ...", which I think can stay
Most of these are guarded by QEMU_CAPS so they fall into one of the first two categories above. I think I found only 'accel2d' that was never intended to be supported by QEMU.
for now; also both "available [by/for/etc]" and "supported [by/for/etc]" are used
That should be 'supported for <functionality>', not 'supported for QEMU'.
I can give it a try in fixing the messages to be more consistent all around; before I start the mass editing, I need to know which style to follow:
If you put the style in writing first, other people might help too.
a) it seems like the virError fields @message, @str1, @str2 and @str3 are joined together in reporting/log strings like "error: <text>"; hence, should they be uncapitalized? It may look OK in English, but less nice and hard to fix in translations. Obviously, sentences as shown in tools (e.g. virsh) definitely need to be properly capitalized.

I think there is no correct answer here, because even with the error messages, the <text> is not always used in the "error: <text>" scenario. E.g. an application like virt-manager will merely display "<text>" in a dialog box. On the one hand I'd suggest lowercase text for error messages, but if the message is multiple sentences that would involve a capital. We probably don't have many of the latter though, so standardizing on lowercase is likely fine.
Starting with a lowercase letter feels more UNIX-like and helps if the message starts with a lowercase identifier, but if some apps use the text on their own, starting with uppercase would be more consistent.
b) should identifiers such as filenames, paths, XML tags, JSON fields, etc be always quoted?

Generally user data that may go missing should be quoted because it makes it more obvious when there is an accidentally empty string provided. I've gone back to add quotes every time I've debugged a problem where the empty string was involved. To make it easier as a policy, it is fine to expand that to all filenames/paths, regardless of whether they come from the user data or not. For XML / JSON field names, if it is just a bare word, then I'd probably suggest quoting too, as some field names could accidentally lead to grammatically correct but misleading error messages if unquoted.

c) which verb to use when something failed? "could not" is a subjective thing, not a past action; "failed" seems to imply that something was attempted; "did not" seems to imply that it was not done, but says nothing about whether it was attempted; the rest sort of indicate the ability to do something.

This one seems like a more complicated question than the others and should not keep us from e.g. quoting the identifiers first.

Jano
I don't especially care which we use, as long as we're pretty consistent. Perhaps the thing to do is just see which is the most popular usage today, so we invalidate the fewest translations when changing.

d) allow contractions or not? They are generally used in spoken/informal language, and while libvirt is not that formal it should not be that colloquial either IMHO; also, they make the text slightly harder to understand by non-native speakers, and they are lost when translating. A POV on the matter is: https://www.businesswritingblog.com/business_writing/2006/04/dont_use_contra.html

Yeah, I think I've seen enough recommendations about not using contractions, that we should apply that rule.

e) which message to use to indicate that QEMU does not support something?

I don't have a strong preference. Perhaps again just let a popularity contest decide it.

I wonder if there's any clever python code we can pull in that reports on "similar" strings that we could usefully run across the pot file to identify candidates for sanitizing.

Also if there are many cases where we use roughly the same string message, then that's a candidate for creating a wrapper function to standardize on message text. E.g. we added a virReportEnumRangeError() so that we got guaranteed identical error messages for all enum range problems.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
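
One possible shape for the "similar strings" report mentioned above, using only Python's standard difflib module. This is a sketch under assumptions: the 0.8 threshold is an arbitrary starting point, the sample msgids are made up, and extracting the msgids from the pot file is left out. Comparing every pair is quadratic, so a large catalog might need something smarter.

```python
import difflib
from itertools import combinations


def similar_pairs(msgids, threshold=0.8):
    """Return (a, b, ratio) for message pairs that look alike, best first."""
    pairs = []
    for a, b in combinations(msgids, 2):
        # Compare case-insensitively so capitalization variants score 1.0.
        ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return sorted(pairs, key=lambda pair: -pair[2])


# Made-up sample input, not taken from libvirt.pot.
sample = [
    "cannot open %s",
    "Cannot open %s",
    "Cannot open '%s'",
    "Failed to parse XML",
]
for a, b, ratio in similar_pairs(sample):
    print(ratio, repr(a), repr(b))
```

Pairs reported this way are only candidates; whether two messages should really be merged still needs a human look.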