On Thu, Nov 01, 2018 at 09:31 AM +0100, Martin Kletzander <mkletzan@xxxxxxxxxx> wrote: > On Tue, Oct 30, 2018 at 03:07:59PM +0000, Daniel P. Berrangé wrote: >>On Tue, Oct 30, 2018 at 03:45:36PM +0100, Michal Privoznik wrote: >>> On 10/30/2018 02:46 PM, Michal Privoznik wrote: >>> > On 10/30/2018 01:55 PM, Daniel P. Berrangé wrote: >>> >> On Tue, Oct 30, 2018 at 10:32:08AM +0000, Daniel P. Berrangé wrote: >>> >>> On Tue, Oct 30, 2018 at 11:08:45AM +0100, Michal Privoznik wrote: >>> >>>> On 10/30/2018 10:35 AM, Daniel P. Berrangé wrote: >>> >>>>> On Tue, Oct 30, 2018 at 09:13:50AM +0100, Michal Privoznik wrote: >>> >>>>>> On 10/29/2018 06:34 PM, Marc Hartmayer wrote: >>> >>>>>>> Introduce caching whether /dev/kvm is usable as the QEMU user:QEMU >>> >>>>>>> group. This reduces the overhead of the QEMU capabilities cache >>> >>>>>>> lookup. Before this patch there were many fork() calls used for >>> >>>>>>> checking whether /dev/kvm is accessible. Now we store the result >>> >>>>>>> whether /dev/kvm is accessible or not and we only need to re-run the >>> >>>>>>> virFileAccessibleAs check if the ctime of /dev/kvm has changed. >>> >>>>>>> >>> >>>>>>> Suggested-by: Daniel P. Berrangé <berrange@xxxxxxxxxx> >>> >>>>>>> Signed-off-by: Marc Hartmayer <mhartmay@xxxxxxxxxxxxx> >>> >>>>>>> --- >>> >>>>>>> src/qemu/qemu_capabilities.c | 54 ++++++++++++++++++++++++++++++++++-- >>> >>>>>>> 1 file changed, 52 insertions(+), 2 deletions(-) >>> >>>>>>> >>> >>>>>>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c >>> >>>>>>> index e228f52ec0bb..85516954149b 100644 >>> >>>>>>> --- a/src/qemu/qemu_capabilities.c >>> >>>>>>> +++ b/src/qemu/qemu_capabilities.c >>> >>>>>>> @@ -3238,6 +3238,10 @@ struct _virQEMUCapsCachePriv { >>> >>>>>>> virArch hostArch; >>> >>>>>>> unsigned int microcodeVersion; >>> >>>>>>> char *kernelVersion; >>> >>>>>>> + >>> >>>>>>> + /* cache whether /dev/kvm is usable as runUid:runGuid */ >>> >>>>>>> + virTristateBool kvmUsable; >>> >>>>>>> + time_t kvmCtime; >>> >>>>>>> }; >>> >>>>>>> typedef struct _virQEMUCapsCachePriv virQEMUCapsCachePriv; >>> >>>>>>> typedef virQEMUCapsCachePriv *virQEMUCapsCachePrivPtr; >>> >>>>>>> @@ -3824,6 +3828,52 @@ virQEMUCapsSaveFile(void *data, >>> >>>>>>> } >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> +/* Determine whether '/dev/kvm' is usable as QEMU user:QEMU group. */ >>> >>>>>>> +static bool >>> >>>>>>> +virQEMUCapsKVMUsable(virQEMUCapsCachePrivPtr priv) >>> >>>>>>> +{ >>> >>>>>>> + struct stat sb; >>> >>>>>>> + static const char *kvm_device = "/dev/kvm"; >>> >>>>>>> + virTristateBool value; >>> >>>>>>> + virTristateBool cached_value = priv->kvmUsable; >>> >>>>>>> + time_t kvm_ctime; >>> >>>>>>> + time_t cached_kvm_ctime = priv->kvmCtime; >>> >>>>>>> + >>> >>>>>>> + if (stat(kvm_device, &sb) < 0) { >>> >>>>>>> + virReportSystemError(errno, >>> >>>>>>> + _("Failed to stat %s"), kvm_device); >>> >>>>>>> + return false; >>> >>>>>>> + } >>> >>>>>>> + kvm_ctime = sb.st_ctime; >>> >>>>>> >>> >>>>>> This doesn't feel right. /dev/kvm ctime is changed every time qemu is >>> >>>>>> started or powered off (try running stat over it before and after a >>> >>>>>> domain is started/shut off). So effectively we will fork more often than >>> >>>>>> we would think. Should we cache inode number instead? Because for all >>> >>>>>> that we care is simply if the file is there. >>> >>>>> >>> >>>>> Urgh, that is a bit strange and not the usual semantics for timestamps :-( >>> >>>> >>> >>>> Indeed. >>> >>>> >>> >>>>> >>> >>>>> We can't stat the inode - the code was explicitly trying to cope with the >>> >>>>> way /dev/kvm can change permissions when udev rules get applied. We would >>> >>>>> have to compare the user, group, permissions mask and even ACL, or a hash >>> >>>>> of those. >>> >>>> >>> >>>> Well, we can use ctime as suggested and post a patch for kernel to fix >>> >>>> ctime behaviour. Until the patch is merged our behaviour would be >>> >>>> suboptimal, but still better than it is now. >>> >>> >>> >>> I guess lets talk to KVM team for their input on this and then decide >>> >>> what todo. >>> >> >>> >> Hmm, I wonder if it is not actually a kernel problem, but rather something >>> >> in userspace genuinely touching the device in a way that caues these >>> >> timestamps to be updated. >>> >> >>> > >>> > It is kernel problem. In my testing, the moment I call: >>> > >>> > ioctl(kvm, KVM_CREATE_VM, 0); >>> >>> Okay, I have to retract this claim. 'udevadm monitor' shows some events: >>> >>> KERNEL[3631.129645] change /devices/virtual/misc/kvm (misc) >>> UDEV [3631.130816] change /devices/virtual/misc/kvm (misc) >>> >>> and stopping udevd leaves all three times untouched. So it is udev after >>> all. I just don't know how to find the rule that is causing the issue. >>> Anyway, as for this patch, I think we can merge it in the end, can't we? >> >>Not really. Udev is in use everywhere, so this behaviour makes the >>patch useless in practice, even though it is technically right in >>theory :-( >> > > Does it? With this behaviour we still do the "expensive" work after any machine > has started. But for one machine starting it still has the effect of running it > only once. And we *need* to run it for each machine unless we also cache the > result per (at least) user:group of that machine as every machine can run under > different user:group. How can you run a machine/QEMU VM under a different user:group other than changing the user:group in qemu.conf and restart/reload libvirtd? As soon as a VM is running we have not to verify /dev/kvm access, no? (so there should be no problem when libvirtd tries to “reconnect” to already running VMs). > > I'll go through the patch again (just skimmed it the first time), but I think > this actually still makes it better (although not perfect). I wonder why the > udev rule does not fire only on change (why doesn't it say ACTION=="change"). > >>Regards, >>Daniel >>-- >>|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| >>|: https://libvirt.org -o- https://fstop138.berrange.com :| >>|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >> >>-- >>libvir-list mailing list >>libvir-list@xxxxxxxxxx >>https://www.redhat.com/mailman/listinfo/libvir-list -- Kind regards / Beste Grüße Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list