Re: [PATCH] x86/kvm: fix condition to update kvm master clocks

Roman Kagan <rkagan@xxxxxxxxxxxxx> · Fri, 27 May 2016 21:46:40 +0300

On Fri, May 27, 2016 at 08:11:40PM +0200, Radim Krčmář wrote:
> 2016-05-27 20:28+0300, Roman Kagan:
> >> Queueing unconditionally seems to be the correct thing to do.
> > 
> > The notifier is registered at kvm module init, so the work will be
> > scheduled even when there are no VMs at all.
> 
> Good point, we don't want to call pvclock_gtod_notify in that case
> either.  Registering (unregistering) with the first (last) VM should be
> good enough ... what about adding something based on this?
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 37af23052470..0779f0f01523 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -655,6 +655,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  		goto out_err;
>  
>  	spin_lock(&kvm_lock);
> +	if (list_empty(&vm_list))
> +		kvm_arch_create_first_vm(kvm);
>  	list_add(&kvm->vm_list, &vm_list);
>  	spin_unlock(&kvm_lock);
>  
> @@ -709,6 +711,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	kvm_arch_sync_events(kvm);
>  	spin_lock(&kvm_lock);
>  	list_del(&kvm->vm_list);
> +	if (list_empty(&vm_list))
> +		kvm_arch_destroy_last_vm(kvm);
>  	spin_unlock(&kvm_lock);
>  	kvm_free_irq_routing(kvm);
>  	for (i = 0; i < KVM_NR_BUSES; i++)

Makes perfect sense IMO.
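
To make the intent concrete, here is a rough user-space sketch of the
first-VM/last-VM idea.  Everything in it is a stand-in: nr_vms models the
global vm_list, notifier_registered models whether the pvclock gtod
notifier is hooked up, and the arch_* helpers are hypothetical names
mirroring the proposed kvm_arch_create_first_vm() and
kvm_arch_destroy_last_vm().  The real code would do this under kvm_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for state that kvm_lock would protect. */
static int nr_vms;               /* models the length of the global vm_list */
static bool notifier_registered; /* models the pvclock gtod notifier state */

/* Models the proposed kvm_arch_create_first_vm() hook. */
static void arch_create_first_vm(void)
{
	notifier_registered = true;
}

/* Models the proposed kvm_arch_destroy_last_vm() hook. */
static void arch_destroy_last_vm(void)
{
	notifier_registered = false;
}

static void create_vm(void)
{
	/* Test for "no VMs yet" before the new VM is added. */
	if (nr_vms == 0)
		arch_create_first_vm();
	nr_vms++;
}

static void destroy_vm(void)
{
	assert(nr_vms > 0);
	nr_vms--;
	/* Test for "no VMs left" after this VM has been removed. */
	if (nr_vms == 0)
		arch_destroy_last_vm();
}
```

Note that the emptiness test is against the global count (the global
list head in the real code), not against the per-VM list node.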

> >> Interaction between kvm_gen_update_masterclock(), pvclock_gtod_work(),
> >> and NTP could be a problem:  kvm_gen_update_masterclock() only has to
> >> run once per VM, but pvclock_gtod_work() calls it on every VCPU, so
> >> frequent NTP updates on bigger guests could kill performance.
> > 
> > Unfortunately, things are worse than that: this stuff is updated on
> > every *tick* on the timekeeping CPU, so, as long as you keep at least
> > one of your CPUs busy, the update rate can reach HZ.  The frequency of
> > NTP updates is unimportant; it happens without NTP updates at all.
> > 
> > So I tend to agree that we're perhaps better off not fixing this bug and
> > leaving the kvm_clocks to drift until we figure out how to do it with
> > acceptable overhead.
> 
> Yuck ... the hunk below could help a bit.
> I haven't checked if the timekeeping code updates gtod and therefore
> sets 'was_set' even when the resulting time hasn't changed, so we might
> need to do more to avoid useless situations.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a8c7ca34ee5d..37ed0a342bf1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5802,12 +5802,15 @@ static DECLARE_WORK(pvclock_gtod_work, pvclock_gtod_update_fn);
>  /*
>   * Notification about pvclock gtod data update.
>   */
> -static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
> +static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long was_set,
>  			       void *priv)
>  {
>  	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
>  	struct timekeeper *tk = priv;
>  
> +	if (!was_set)
> +		return 0;
> +
>  	update_pvclock_gtod(tk);
>  

Nope, this parameter is only set when there's a step-like change in the
time.  The timekeeper itself is always updated.  I guess we could
mitigate the costs somewhat if we skipped updating the gtod copy until
the accumulated error reaches a certain limit; not sure if that's gonna
help though.
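
Just to illustrate what I mean by the error limit, a user-space sketch
(not a patch).  The struct, the helper names and GTOD_ERR_LIMIT_NS are
all made up for the example; the real pvclock_gtod_data has more fields
and needs its seqcount.  The idea is to compare what the stale copy would
report against what the freshly updated timekeeper reports for the same
cycle count, and only refresh the copy once they diverge too far.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: tolerate up to this much drift in the copy. */
#define GTOD_ERR_LIMIT_NS 1000

/* Simplified, made-up model of the clock parameters kept in the copy. */
struct clock_params {
	uint64_t cycle_last; /* cycle count at the last update */
	uint64_t base_ns;    /* time in ns at cycle_last */
	uint32_t mult;       /* cycles-to-ns multiplier */
	uint32_t shift;      /* cycles-to-ns shift */
};

/* Time in ns that a given parameter set reports for a cycle count. */
static uint64_t clock_read_ns(const struct clock_params *p, uint64_t cycles)
{
	return p->base_ns + (((cycles - p->cycle_last) * p->mult) >> p->shift);
}

/*
 * Refresh the copy from the timekeeper only if the two disagree by at
 * least GTOD_ERR_LIMIT_NS at now_cycles.  Returns true if the copy was
 * refreshed, i.e. when the expensive masterclock update would be needed.
 */
static bool gtod_copy_maybe_update(struct clock_params *copy,
				   const struct clock_params *tk,
				   uint64_t now_cycles)
{
	uint64_t t_tk = clock_read_ns(tk, now_cycles);
	uint64_t t_copy = clock_read_ns(copy, now_cycles);
	uint64_t err = t_tk > t_copy ? t_tk - t_copy : t_copy - t_tk;

	if (err < GTOD_ERR_LIMIT_NS)
		return false;

	*copy = *tk;
	return true;
}
```

The obvious downside is that the guests would see up to the limit's
worth of error between refreshes, so the limit trades accuracy for
update rate.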

Roman.