On Fri, Aug 25, 2017 at 06:03:15PM -0400, John Ferlan wrote:
On 08/25/2017 07:21 AM, Martin Kletzander wrote:We always truncated the name at 20 bytes instead of characters. In case 20 bytes were in the middle of a multi-byte character, then the string became invalid and various parts of the code would error out (e.g. XML parsing of that string). Let's instead properly truncate it after 20 characters instead. We cannot test this in our test suite because we would need to know what locales are installed on the system where the tests are ran and if there is supported one (most probably there will be, but we cannot be 100% sure), we could initialize gettext in qemuxml2argvtest, but there would still be a chance of getting two different (both valid, though) results.Yeah - I tried - it got ugly fast.... Although I suppose in a test environment would could just "pick" a charset like en_US.UTF-8 and just ensure that things work as expected with that. Perhaps even make the shortened name tests go at the bottom with a FAILURE to compare argv results before setting the locale and a success after a call on the same test by name. Not something for this series because I'm not even sure it would work properly. Maybe something for the byte sized tasks (rather literally too ;-)!)
The problem is that we can't pick any charset except "C", which is the default. There is no guarantee that you will have the charset installed in the system. For example I only have en_GB.utf8, cs_CZ.utf8 and the defaults, C and POSIX. We could, maybe, programmatically construct the locale in the test itself, but that's way big of a hammer IMO.
In order to test this it is enough to start a machine with a name for which trimming it after 20 bytes would create invalid sequence (e.g. 1234567890123456789č where č is any multi-byte character). Then start the domain and restart libvirtd. The domain would disappear because such illegal sequence will not go through the XML parser. And that's not a bug of the parser, it should not be in the XML in the first place, but since we don't use any sophisticated formatter, just mash some strings together, the formatting succeeds.If the domain was started before these patches, then the domain is still hidden since the shortened path names won't match... Such is life...
Such domain will never be loaded. It's not because the paths won't match. We don't generate the paths for running domains, but we get them from the state XML (that's needed so that we can change the way we shorten characters and still be able to find domains with older "shortening rules"). The problem with the domains that get lost is that the XML consists of an invalid string literal and it is already saved on the disk. The parser (libxml2) will fail (rightfully) with an error that such XML is invalid. We can't do anything with that. And, honestly, I can't say I care that much about that as this really isn't someone would notice in production.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1448766 Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> --- src/conf/domain_conf.c | 45 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 42 insertions(+), 3 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 47eba4dbb315..dd73158f028b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -27131,6 +27131,8 @@ virDomainDefHasMemballoon(const virDomainDef *def) } +#define VIR_DOMAIN_SHORT_NAME_MAX 20 + /** * virDomainObjGetShortName: * @vm: Machine for which to get a name @@ -27141,15 +27143,52 @@ virDomainDefHasMemballoon(const virDomainDef *def) char * virDomainObjGetShortName(const virDomainDef *def) { - const int dommaxlen = 20; + wchar_t wshortname[VIR_DOMAIN_SHORT_NAME_MAX + 1] = {0}; + size_t len = 0; + char *shortname = NULL; char *ret = NULL; - ignore_value(virAsprintf(&ret, "%d-%.*s", - def->id, dommaxlen, def->name)); + /* No need to do the whole conversion thing when there are no multibyte + * characters. The same applies for illegal sequences as they can occur + * with incompatible locales. */ + len = mbstowcs(NULL, def->name, 0); + if ((len == (size_t) -1 && errno == EILSEQ) || + len == strlen(def->name)) { + ignore_value(virAsprintf(&ret, "%d-%.*s", def->id, + VIR_DOMAIN_SHORT_NAME_MAX, def->name)); + return ret;consistently speaking return NULL;
Not really, ret will be NULL here if and only if virAsprintf() failed. This is not an error path, but rather a fallback path if either the locale is incompatible or there are no wide characters (your suggestion). I hoped I explained that enough in the comment.
+ } + + if (len == (size_t) -1 || + mbstowcs(wshortname, def->name, VIR_DOMAIN_SHORT_NAME_MAX) == (size_t) -1) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("Cannot convert domain name to wide character string"));You could go with something like : virReportSystemError(errno, _("Cannot convert domain name to wide character string, len=%zd"), len); That way you'd know if you failed because len == -1 or the second mbstowcs call failed and you'd know why. Not sure ->name at this point could be considered an invalid argument, but getting the system error would certainly point you in the right direction.
Good point, I like that.
FWIW: There may be an off by 1 error. With your this, 1234567890123456789č would be shortened to 19 characters (it's at 21), thus if you have two domains: 1234567890123456789č and 1234567890123456789 They'd return the same answer
They won't. That's the point of this whole function. Such domains will differ by their ID that's unique, the shortened name is ID-name.
Of course so would 123456789012345678901 and 1234567890123456789012, but at least for those two the length is > 20. Not sure if it matters or not as 19 or 20 characters for uniqueness is a lot. Still the bz had 7 multibyte characters that took up 21 char's - so it would seem things would be more limited with that charset. If that last character was the differentiator between all domains, then the short name is too short using only 18 chars that appear as 6 multibyte chars for the short name.
Not really. I don't shorten the name to 20 bytes, but rather to 20 characters. Keep in mind that mbstowcs()'s third parameter is the number of wide characters that you want to convert, not the number of bytes.
Likewise a name using fully wide chars, e.g. ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ would be shortened to ÄèÎØÛÄèÎØÛ. At this point I'm ambivalent as to whether that's considered a problem.
There should not be any. (after writing this sentence, the phrase "famous last words" springs to mind for an _unknown_ reason :) )
+ return NULL; + } + + len = wcstombs(NULL, wshortname, 0); + if (len == (size_t) -1) + return NULL;Would returning NULL here elicit a useful error message?
According to the manual pages there should not be any error case for this, but I'll add the error reporting, just in case.
+ if (len > VIR_DOMAIN_SHORT_NAME_MAX) + len = VIR_DOMAIN_SHORT_NAME_MAX;
Ah, the remnants of an unfinished patch... This should not be here at all.
Given a name fully using a wide character set if this were altered to: if (len > VIR_DOMAIN_SHORT_NAME_MAX*2) len = VIR_DOMAIN_SHORT_NAME_MAX*2; Then ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ shortens to ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ, but 1234567890123456789č! doesn't get shortened to 1234567890123456789č. The multibyte charsets would need what * 3? It's somewhat ludicrous - but why I suggested originally to compare strlen() and mbstowcs() - if they're the same, less then 20, or having an error on the mbstowcs(), then just stick with the old logic. Only fall into the new logic when wide or multibyte chars are present.
I hope this is explained by previous paragraphs.
Of course the really harsh route is to count the width of each character in the array using mblen, then somehow mathematically get a result of 20 "printed characters" of uniqueness. Again, IDC really, but since I was looking and thinking about it I figured I'd at least mention it. So, I think the only required thing to fix would be the NULL return if len == -1 without an error message. Beyond that altering the return to be NULL instead of ret and altering to use virReportSystemError are may as well do it since you need to fix the first - with that done: Reviewed-by: John Ferlan <jferlan@xxxxxxxxxx>
Thanks for the review. I'll read the other reviews in this series, fix it up and push (if appropriate).
John Whether the shortened name displayed "apparent" length rabbit hole needs adjustment - is up to you. I'm fine with it the way it is. I don't believe we ever documented our shortening "rules", but wide and multichar wide charsets really shorten things a lot. Perhaps that's a cross that bridge when someone complains type thing!+ + if (VIR_ALLOC_N(shortname, len + 1) < 0) + return NULL; + + if (wcstombs(shortname, wshortname, len) == (size_t) -1) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("Cannot convert domain name from wide character string")); + goto cleanup; + } + + ignore_value(virAsprintf(&ret, "%d-%s", def->id, shortname)); + cleanup: + VIR_FREE(shortname); return ret; } +#undef VIR_DOMAIN_SHORT_NAME_MAX int virDomainGetBlkioParametersAssignFromDef(virDomainDefPtr def,
Attachment:
signature.asc
Description: Digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list