2016-05-27 20:28+0300, Roman Kagan:
> On Thu, May 26, 2016 at 10:19:36PM +0200, Radim Krčmář wrote:
>> >                 atomic_read(&kvm_guest_has_master_clock) != 0)
>>
>> And I don't see why we don't want to enable master clock if the host
>> switches back to TSC.
>
> Agreed (even though I guess it's not very likely: AFAICS once switched
> to a different clocksource, the host can switch back to TSC only upon
> human manipulating /sys/devices/system/clocksource).

Yeah, it's a corner case.  A human would have to switch away from tsc
as well; the automatic switch happens only when tsc is no longer
usable, AFAIK.

>> >                 queue_work(system_long_wq, &pvclock_gtod_work);
>>
>> Queueing unconditionally seems to be the correct thing to do.
>
> The notifier is registered at kvm module init, so the work will be
> scheduled even when there are no VMs at all.

Good point; we don't want to call pvclock_gtod_notify in that case
either.  Registering (unregistering) the notifier with the first (last)
VM should be good enough ... what about adding something based on this?
(A rough sketch of the arch hooks themselves is appended at the end of
this mail.)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 37af23052470..0779f0f01523 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -655,6 +655,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		goto out_err;
 
 	spin_lock(&kvm_lock);
+	if (list_empty(&vm_list))
+		kvm_arch_create_first_vm(kvm);
 	list_add(&kvm->vm_list, &vm_list);
 	spin_unlock(&kvm_lock);
 
@@ -709,6 +711,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	kvm_arch_sync_events(kvm);
 	spin_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
+	if (list_empty(&vm_list))
+		kvm_arch_destroy_last_vm(kvm);
 	spin_unlock(&kvm_lock);
 	kvm_free_irq_routing(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++)

>> Interaction between kvm_gen_update_masterclock(), pvclock_gtod_work(),
>> and NTP could be a problem: kvm_gen_update_masterclock() only has to
>> run once per VM, but pvclock_gtod_work() calls it on every VCPU, so
>> frequent NTP updates on bigger guests could kill performance.
>
> Unfortunately, things are worse than that: this stuff is updated on
> every *tick* on the timekeeping CPU, so, as long as you keep at least
> one of your CPUs busy, the update rate can reach HZ.  The frequency of
> NTP updates is unimportant; it happens without NTP updates at all.
>
> So I tend to agree that we're perhaps better off not fixing this bug
> and leaving the kvm_clocks to drift until we figure out how to do it
> with acceptable overhead.

Yuck ... the hunk below could help a bit.  I haven't checked whether
the timekeeping code updates gtod, and therefore sets 'was_set', even
when the resulting time hasn't changed, so we might need to do more to
avoid useless updates.  (See the note on where 'was_set' comes from at
the end of this mail.)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a8c7ca34ee5d..37ed0a342bf1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5802,12 +5802,15 @@ static DECLARE_WORK(pvclock_gtod_work, pvclock_gtod_update_fn);
 /*
  * Notification about pvclock gtod data update.
  */
-static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
+static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long was_set,
 			       void *priv)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 	struct timekeeper *tk = priv;
 
+	if (!was_set)
+		return 0;
+
 	update_pvclock_gtod(tk);
 
 	/* disable master clock if host does not trust, or does not
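
For reference, a minimal sketch of the arch hooks used in the
kvm_main.c hunk above.  kvm_arch_create_first_vm() and
kvm_arch_destroy_last_vm() are names invented here (they don't exist
in the tree) and would also need declarations in
include/linux/kvm_host.h; the x86 implementations just move the
existing pvclock notifier (un)registration out of
kvm_arch_init()/kvm_arch_exit():

/* virt/kvm/kvm_main.c: weak no-op defaults, so other arches need no
 * changes.  (Hypothetical hooks, see above.) */
void __weak kvm_arch_create_first_vm(struct kvm *kvm)
{
}

void __weak kvm_arch_destroy_last_vm(struct kvm *kvm)
{
}

/* arch/x86/kvm/x86.c: keep the pvclock notifier registered only while
 * at least one VM exists. */
void kvm_arch_create_first_vm(struct kvm *kvm)
{
#ifdef CONFIG_X86_64
	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
#endif
}

void kvm_arch_destroy_last_vm(struct kvm *kvm)
{
#ifdef CONFIG_X86_64
	pvclock_gtod_unregister_notifier(&pvclock_gtod_notifier);
#endif
}

Both hooks run under kvm_lock in the hunk above; AFAICS
pvclock_gtod_register_notifier() only takes a raw spinlock internally,
so that should be safe, but it deserves a second look.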
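
And on 'was_set': paraphrasing kernel/time/timekeeping.c (worth
double-checking against the actual tree), the notifier chain gets the
TK_CLOCK_WAS_SET action bit as its second argument, so per-tick updates
should arrive with was_set == 0 and be dropped by the x86.c hunk above;
whether TK_CLOCK_WAS_SET can be raised without the time actually
changing is exactly the part I haven't checked:

/* kernel/time/timekeeping.c, paraphrased for illustration: */
static void update_pvclock_gtod(struct timekeeper *tk, bool was_set)
{
	raw_notifier_call_chain(&pvclock_gtod_chain, was_set, tk);
}

/* ... and timekeeping_update() invokes it as
 *	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
 * so only callers passing TK_CLOCK_WAS_SET produce was_set != 0. */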