Re: [PATCH v2 1/3] conf: Properly truncate wide character names in virDomainObjGetShortName

John Ferlan <jferlan@xxxxxxxxxx> · Fri, 25 Aug 2017 18:03:15 -0400

On 08/25/2017 07:21 AM, Martin Kletzander wrote:
> We always truncated the name at 20 bytes instead of characters.  In
> case 20 bytes were in the middle of a multi-byte character, then the
> string became invalid and various parts of the code would error
> out (e.g. XML parsing of that string).  Let's instead properly
> truncate it after 20 characters instead.
> 
> We cannot test this in our test suite because we would need to know
> what locales are installed on the system where the tests are ran and
> if there is supported one (most probably there will be, but we cannot
> be 100% sure), we could initialize gettext in qemuxml2argvtest, but
> there would still be a chance of getting two different (both valid,
> though) results.

Yeah - I tried - it got ugly fast.... Although I suppose in a test
environment would could just "pick" a charset like en_US.UTF-8 and just
ensure that things work as expected with that. Perhaps even make the
shortened name tests go at the bottom with a FAILURE to compare argv
results before setting the locale and a success after a call on the same
test by name. Not something for this series because I'm not even sure it
would work properly.  Maybe something for the byte sized tasks (rather
literally too ;-)!)

> 
> In order to test this it is enough to start a machine with a name for
> which trimming it after 20 bytes would create invalid sequence (e.g.
> 1234567890123456789č where č is any multi-byte character).  Then start
> the domain and restart libvirtd.  The domain would disappear because
> such illegal sequence will not go through the XML parser.  And that's
> not a bug of the parser, it should not be in the XML in the first
> place, but since we don't use any sophisticated formatter, just
> mash some strings together, the formatting succeeds.

If the domain was started before these patches, then the domain is still
hidden since the shortened path names won't match... Such is life...

> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1448766
> 
> Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx>
> ---
>  src/conf/domain_conf.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 47eba4dbb315..dd73158f028b 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -27131,6 +27131,8 @@ virDomainDefHasMemballoon(const virDomainDef *def)
>  }
>  
>  
> +#define VIR_DOMAIN_SHORT_NAME_MAX 20
> +
>  /**
>   * virDomainObjGetShortName:
>   * @vm: Machine for which to get a name
> @@ -27141,15 +27143,52 @@ virDomainDefHasMemballoon(const virDomainDef *def)
>  char *
>  virDomainObjGetShortName(const virDomainDef *def)
>  {
> -    const int dommaxlen = 20;
> +    wchar_t wshortname[VIR_DOMAIN_SHORT_NAME_MAX + 1] = {0};
> +    size_t len = 0;
> +    char *shortname = NULL;
>      char *ret = NULL;
>  
> -    ignore_value(virAsprintf(&ret, "%d-%.*s",
> -                             def->id, dommaxlen, def->name));
> +    /* No need to do the whole conversion thing when there are no multibyte
> +     * characters.  The same applies for illegal sequences as they can occur
> +     * with incompatible locales. */
> +    len = mbstowcs(NULL, def->name, 0);
> +    if ((len == (size_t) -1 && errno == EILSEQ) ||
> +        len == strlen(def->name)) {
> +        ignore_value(virAsprintf(&ret, "%d-%.*s", def->id,
> +                                 VIR_DOMAIN_SHORT_NAME_MAX, def->name));
> +        return ret;

consistently speaking

    return NULL;

> +    }
> +
> +    if (len == (size_t) -1 ||
> +        mbstowcs(wshortname, def->name, VIR_DOMAIN_SHORT_NAME_MAX) == (size_t) -1) {
> +        virReportError(VIR_ERR_INVALID_ARG, "%s",
> +                       _("Cannot convert domain name to wide character string"));

You could go with something like :

virReportSystemError(errno,
      _("Cannot convert domain name to wide character string, len=%zd"),
      len);

That way you'd know if you failed because len == -1 or the second
mbstowcs call failed and you'd know why. Not sure ->name at this point
could be considered an invalid argument, but getting the system error
would certainly point you in the right direction.

FWIW:
There may be an off by 1 error. With your this, 1234567890123456789č
would be shortened to 19 characters (it's at 21), thus if you have two
domains:

1234567890123456789č

and

1234567890123456789

They'd return the same answer

Of course so would 123456789012345678901 and 1234567890123456789012, but
at least for those two the length is > 20.  Not sure if it matters or
not as 19 or 20 characters for uniqueness is a lot.

Still the bz had 7 multibyte characters that took up 21 char's - so it
would seem things would be more limited with that charset. If that last
character was the differentiator between all domains, then the short
name is too short using only 18 chars that appear as 6 multibyte chars
for the short name.

Likewise a name using fully wide chars, e.g. ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ
would be shortened to ÄèÎØÛÄèÎØÛ.

At this point I'm ambivalent as to whether that's considered a problem.

> +        return NULL;
> +    }
> +
> +    len = wcstombs(NULL, wshortname, 0);
> +    if (len == (size_t) -1)
> +        return NULL;

Would returning NULL here elicit a useful error message?

>  
> +    if (len > VIR_DOMAIN_SHORT_NAME_MAX)
> +        len = VIR_DOMAIN_SHORT_NAME_MAX;

Given a name fully using a wide character set if this were altered to:

    if (len > VIR_DOMAIN_SHORT_NAME_MAX*2)
        len = VIR_DOMAIN_SHORT_NAME_MAX*2;

Then ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ shortens to ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ, but
1234567890123456789č! doesn't get shortened to 1234567890123456789č. The
multibyte charsets would need what * 3?  It's somewhat ludicrous - but
why I suggested originally to compare strlen() and mbstowcs() - if
they're the same, less then 20, or having an error on the mbstowcs(),
then just stick with the old logic.  Only fall into the new logic when
wide or multibyte chars are present.

Of course the really harsh route is to count the width of each character
in the array using mblen, then somehow mathematically get a result of 20
"printed characters" of uniqueness.  Again, IDC really, but since I was
looking and thinking about it I figured I'd at least mention it.

So, I think the only required thing to fix would be the NULL return if
len == -1 without an error message.  Beyond that altering the return to
be NULL instead of ret and altering to use virReportSystemError are may
as well do it since you need to fix the first - with that done:

Reviewed-by: John Ferlan <jferlan@xxxxxxxxxx>

John

Whether the shortened name displayed "apparent" length rabbit hole needs
adjustment - is up to you. I'm fine with it the way it is. I don't
believe we ever documented our shortening "rules", but wide and
multichar wide charsets really shorten things a lot.  Perhaps that's a
cross that bridge when someone complains type thing!

> +
> +    if (VIR_ALLOC_N(shortname, len + 1) < 0)
> +        return NULL;
> +
> +    if (wcstombs(shortname, wshortname, len) == (size_t) -1) {
> +        virReportError(VIR_ERR_INVALID_ARG, "%s",
> +                       _("Cannot convert domain name from wide character string"));
> +        goto cleanup;
> +    }
> +
> +    ignore_value(virAsprintf(&ret, "%d-%s", def->id, shortname));
> + cleanup:
> +    VIR_FREE(shortname);
>      return ret;
>  }
>  
> +#undef VIR_DOMAIN_SHORT_NAME_MAX
>  
>  int
>  virDomainGetBlkioParametersAssignFromDef(virDomainDefPtr def,
> 

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list