Re: [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 01, 2018 at 09:31 AM +0100, Martin Kletzander <mkletzan@xxxxxxxxxx> wrote:
> On Tue, Oct 30, 2018 at 03:07:59PM +0000, Daniel P. Berrangé wrote:
>>On Tue, Oct 30, 2018 at 03:45:36PM +0100, Michal Privoznik wrote:
>>> On 10/30/2018 02:46 PM, Michal Privoznik wrote:
>>> > On 10/30/2018 01:55 PM, Daniel P. Berrangé wrote:
>>> >> On Tue, Oct 30, 2018 at 10:32:08AM +0000, Daniel P. Berrangé wrote:
>>> >>> On Tue, Oct 30, 2018 at 11:08:45AM +0100, Michal Privoznik wrote:
>>> >>>> On 10/30/2018 10:35 AM, Daniel P. Berrangé wrote:
>>> >>>>> On Tue, Oct 30, 2018 at 09:13:50AM +0100, Michal Privoznik wrote:
>>> >>>>>> On 10/29/2018 06:34 PM, Marc Hartmayer wrote:
>>> >>>>>>> Introduce caching whether /dev/kvm is usable as the QEMU user:QEMU
>>> >>>>>>> group. This reduces the overhead of the QEMU capabilities cache
>>> >>>>>>> lookup. Before this patch there were many fork() calls used for
>>> >>>>>>> checking whether /dev/kvm is accessible. Now we store the result
>>> >>>>>>> whether /dev/kvm is accessible or not and we only need to re-run the
>>> >>>>>>> virFileAccessibleAs check if the ctime of /dev/kvm has changed.
>>> >>>>>>>
>>> >>>>>>> Suggested-by: Daniel P. Berrangé <berrange@xxxxxxxxxx>
>>> >>>>>>> Signed-off-by: Marc Hartmayer <mhartmay@xxxxxxxxxxxxx>
>>> >>>>>>> ---
>>> >>>>>>>  src/qemu/qemu_capabilities.c | 54 ++++++++++++++++++++++++++++++++++--
>>> >>>>>>>  1 file changed, 52 insertions(+), 2 deletions(-)
>>> >>>>>>>
>>> >>>>>>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
>>> >>>>>>> index e228f52ec0bb..85516954149b 100644
>>> >>>>>>> --- a/src/qemu/qemu_capabilities.c
>>> >>>>>>> +++ b/src/qemu/qemu_capabilities.c
>>> >>>>>>> @@ -3238,6 +3238,10 @@ struct _virQEMUCapsCachePriv {
>>> >>>>>>>      virArch hostArch;
>>> >>>>>>>      unsigned int microcodeVersion;
>>> >>>>>>>      char *kernelVersion;
>>> >>>>>>> +
>>> >>>>>>> +    /* cache whether /dev/kvm is usable as runUid:runGuid */
>>> >>>>>>> +    virTristateBool kvmUsable;
>>> >>>>>>> +    time_t kvmCtime;
>>> >>>>>>>  };
>>> >>>>>>>  typedef struct _virQEMUCapsCachePriv virQEMUCapsCachePriv;
>>> >>>>>>>  typedef virQEMUCapsCachePriv *virQEMUCapsCachePrivPtr;
>>> >>>>>>> @@ -3824,6 +3828,52 @@ virQEMUCapsSaveFile(void *data,
>>> >>>>>>>  }
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> +/* Determine whether '/dev/kvm' is usable as QEMU user:QEMU group. */
>>> >>>>>>> +static bool
>>> >>>>>>> +virQEMUCapsKVMUsable(virQEMUCapsCachePrivPtr priv)
>>> >>>>>>> +{
>>> >>>>>>> +    struct stat sb;
>>> >>>>>>> +    static const char *kvm_device = "/dev/kvm";
>>> >>>>>>> +    virTristateBool value;
>>> >>>>>>> +    virTristateBool cached_value = priv->kvmUsable;
>>> >>>>>>> +    time_t kvm_ctime;
>>> >>>>>>> +    time_t cached_kvm_ctime = priv->kvmCtime;
>>> >>>>>>> +
>>> >>>>>>> +    if (stat(kvm_device, &sb) < 0) {
>>> >>>>>>> +        virReportSystemError(errno,
>>> >>>>>>> +                             _("Failed to stat %s"), kvm_device);
>>> >>>>>>> +        return false;
>>> >>>>>>> +    }
>>> >>>>>>> +    kvm_ctime = sb.st_ctime;
>>> >>>>>>
>>> >>>>>> This doesn't feel right. /dev/kvm ctime is changed every time qemu is
>>> >>>>>> started or powered off (try running stat over it before and after a
>>> >>>>>> domain is started/shut off). So effectively we will fork more often than
>>> >>>>>> we would think. Should we cache inode number instead? Because for all
>>> >>>>>> that we care is simply if the file is there.
>>> >>>>>
>>> >>>>> Urgh, that is a bit strange and not the usual semantics for timestamps :-(
>>> >>>>
>>> >>>> Indeed.
>>> >>>>
>>> >>>>>
>>> >>>>> We can't stat the inode - the code was explicitly trying to cope with the
>>> >>>>> way /dev/kvm can change permissions when udev rules get applied. We would
>>> >>>>> have to compare the user, group, permissions mask and even ACL, or a hash
>>> >>>>> of those.
>>> >>>>
>>> >>>> Well, we can use ctime as suggested and post a patch for kernel to fix
>>> >>>> ctime behaviour. Until the patch is merged our behaviour would be
>>> >>>> suboptimal, but still better than it is now.
>>> >>>
>>> >>> I guess lets talk to KVM team for their input on this and then decide
>>> >>> what todo.
>>> >>
>>> >> Hmm, I wonder if it is not actually a kernel problem, but rather something
>>> >> in userspace genuinely touching the device in a way that caues these
>>> >> timestamps to be updated.
>>> >>
>>> >
>>> > It is kernel problem. In my testing, the moment I call:
>>> >
>>> >  ioctl(kvm, KVM_CREATE_VM, 0);
>>>
>>> Okay, I have to retract this claim. 'udevadm monitor' shows some events:
>>>
>>> KERNEL[3631.129645] change   /devices/virtual/misc/kvm (misc)
>>> UDEV  [3631.130816] change   /devices/virtual/misc/kvm (misc)
>>>
>>> and stopping udevd leaves all three times untouched. So it is udev after
>>> all. I just don't know how to find the rule that is causing the issue.
>>> Anyway, as for this patch, I think we can merge it in the end, can't we?
>>
>>Not really. Udev is in use everywhere, so this behaviour makes the
>>patch useless in practice, even though it is technically right in
>>theory :-(
>>
>
> Does it?  With this behaviour we still do the "expensive" work after any machine
> has started.  But for one machine starting it still has the effect of running it
> only once.  And we *need* to run it for each machine unless we also cache the
> result per (at least) user:group of that machine as every machine can run under
> different user:group.

How can you run a machine/QEMU VM under a different user:group other
than changing the user:group in qemu.conf and restart/reload libvirtd?

As soon as a VM is running we have not to verify /dev/kvm access, no?
(so there should be no problem when libvirtd tries to “reconnect” to
already running VMs).

>
> I'll go through the patch again (just skimmed it the first time), but I think
> this actually still makes it better (although not perfect).  I wonder why the
> udev rule does not fire only on change (why doesn't it say ACTION=="change").
>
>>Regards,
>>Daniel
>>--
>>|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>>|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>>|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>>
>>--
>>libvir-list mailing list
>>libvir-list@xxxxxxxxxx
>>https://www.redhat.com/mailman/listinfo/libvir-list
--
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux