On 08/25/2017 07:21 AM, Martin Kletzander wrote: > We always truncated the name at 20 bytes instead of characters. In > case 20 bytes were in the middle of a multi-byte character, then the > string became invalid and various parts of the code would error > out (e.g. XML parsing of that string). Let's instead properly > truncate it after 20 characters instead. > > We cannot test this in our test suite because we would need to know > what locales are installed on the system where the tests are ran and > if there is supported one (most probably there will be, but we cannot > be 100% sure), we could initialize gettext in qemuxml2argvtest, but > there would still be a chance of getting two different (both valid, > though) results. Yeah - I tried - it got ugly fast.... Although I suppose in a test environment would could just "pick" a charset like en_US.UTF-8 and just ensure that things work as expected with that. Perhaps even make the shortened name tests go at the bottom with a FAILURE to compare argv results before setting the locale and a success after a call on the same test by name. Not something for this series because I'm not even sure it would work properly. Maybe something for the byte sized tasks (rather literally too ;-)!) > > In order to test this it is enough to start a machine with a name for > which trimming it after 20 bytes would create invalid sequence (e.g. > 1234567890123456789č where č is any multi-byte character). Then start > the domain and restart libvirtd. The domain would disappear because > such illegal sequence will not go through the XML parser. And that's > not a bug of the parser, it should not be in the XML in the first > place, but since we don't use any sophisticated formatter, just > mash some strings together, the formatting succeeds. If the domain was started before these patches, then the domain is still hidden since the shortened path names won't match... Such is life... > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1448766 > > Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> > --- > src/conf/domain_conf.c | 45 ++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 42 insertions(+), 3 deletions(-) > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c > index 47eba4dbb315..dd73158f028b 100644 > --- a/src/conf/domain_conf.c > +++ b/src/conf/domain_conf.c > @@ -27131,6 +27131,8 @@ virDomainDefHasMemballoon(const virDomainDef *def) > } > > > +#define VIR_DOMAIN_SHORT_NAME_MAX 20 > + > /** > * virDomainObjGetShortName: > * @vm: Machine for which to get a name > @@ -27141,15 +27143,52 @@ virDomainDefHasMemballoon(const virDomainDef *def) > char * > virDomainObjGetShortName(const virDomainDef *def) > { > - const int dommaxlen = 20; > + wchar_t wshortname[VIR_DOMAIN_SHORT_NAME_MAX + 1] = {0}; > + size_t len = 0; > + char *shortname = NULL; > char *ret = NULL; > > - ignore_value(virAsprintf(&ret, "%d-%.*s", > - def->id, dommaxlen, def->name)); > + /* No need to do the whole conversion thing when there are no multibyte > + * characters. The same applies for illegal sequences as they can occur > + * with incompatible locales. */ > + len = mbstowcs(NULL, def->name, 0); > + if ((len == (size_t) -1 && errno == EILSEQ) || > + len == strlen(def->name)) { > + ignore_value(virAsprintf(&ret, "%d-%.*s", def->id, > + VIR_DOMAIN_SHORT_NAME_MAX, def->name)); > + return ret; consistently speaking return NULL; > + } > + > + if (len == (size_t) -1 || > + mbstowcs(wshortname, def->name, VIR_DOMAIN_SHORT_NAME_MAX) == (size_t) -1) { > + virReportError(VIR_ERR_INVALID_ARG, "%s", > + _("Cannot convert domain name to wide character string")); You could go with something like : virReportSystemError(errno, _("Cannot convert domain name to wide character string, len=%zd"), len); That way you'd know if you failed because len == -1 or the second mbstowcs call failed and you'd know why. Not sure ->name at this point could be considered an invalid argument, but getting the system error would certainly point you in the right direction. FWIW: There may be an off by 1 error. With your this, 1234567890123456789č would be shortened to 19 characters (it's at 21), thus if you have two domains: 1234567890123456789č and 1234567890123456789 They'd return the same answer Of course so would 123456789012345678901 and 1234567890123456789012, but at least for those two the length is > 20. Not sure if it matters or not as 19 or 20 characters for uniqueness is a lot. Still the bz had 7 multibyte characters that took up 21 char's - so it would seem things would be more limited with that charset. If that last character was the differentiator between all domains, then the short name is too short using only 18 chars that appear as 6 multibyte chars for the short name. Likewise a name using fully wide chars, e.g. ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ would be shortened to ÄèÎØÛÄèÎØÛ. At this point I'm ambivalent as to whether that's considered a problem. > + return NULL; > + } > + > + len = wcstombs(NULL, wshortname, 0); > + if (len == (size_t) -1) > + return NULL; Would returning NULL here elicit a useful error message? > > + if (len > VIR_DOMAIN_SHORT_NAME_MAX) > + len = VIR_DOMAIN_SHORT_NAME_MAX; Given a name fully using a wide character set if this were altered to: if (len > VIR_DOMAIN_SHORT_NAME_MAX*2) len = VIR_DOMAIN_SHORT_NAME_MAX*2; Then ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ shortens to ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ, but 1234567890123456789č! doesn't get shortened to 1234567890123456789č. The multibyte charsets would need what * 3? It's somewhat ludicrous - but why I suggested originally to compare strlen() and mbstowcs() - if they're the same, less then 20, or having an error on the mbstowcs(), then just stick with the old logic. Only fall into the new logic when wide or multibyte chars are present. Of course the really harsh route is to count the width of each character in the array using mblen, then somehow mathematically get a result of 20 "printed characters" of uniqueness. Again, IDC really, but since I was looking and thinking about it I figured I'd at least mention it. So, I think the only required thing to fix would be the NULL return if len == -1 without an error message. Beyond that altering the return to be NULL instead of ret and altering to use virReportSystemError are may as well do it since you need to fix the first - with that done: Reviewed-by: John Ferlan <jferlan@xxxxxxxxxx> John Whether the shortened name displayed "apparent" length rabbit hole needs adjustment - is up to you. I'm fine with it the way it is. I don't believe we ever documented our shortening "rules", but wide and multichar wide charsets really shorten things a lot. Perhaps that's a cross that bridge when someone complains type thing! > + > + if (VIR_ALLOC_N(shortname, len + 1) < 0) > + return NULL; > + > + if (wcstombs(shortname, wshortname, len) == (size_t) -1) { > + virReportError(VIR_ERR_INVALID_ARG, "%s", > + _("Cannot convert domain name from wide character string")); > + goto cleanup; > + } > + > + ignore_value(virAsprintf(&ret, "%d-%s", def->id, shortname)); > + cleanup: > + VIR_FREE(shortname); > return ret; > } > > +#undef VIR_DOMAIN_SHORT_NAME_MAX > > int > virDomainGetBlkioParametersAssignFromDef(virDomainDefPtr def, > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list