Re: [PATCH] x86: add cpuidle_kvm driver to allow guest side halt polling


 



On Mon, May 20, 2019 at 01:51:57PM +0200, Paolo Bonzini wrote:
> On 17/05/19 19:48, Marcelo Tosatti wrote:
> > 
> > The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> > amount of time before halting. This provides the following advantages
> > over host side polling:
> > 
> > 	1) The POLL flag is set while polling is performed, which allows
> > 	   a remote vCPU to avoid sending an IPI (and the associated
> >  	   cost of handling the IPI) when performing a wakeup.
> > 
> > 	2) The HLT VM-exit cost can be avoided.
> > 
> > The downside of guest side polling is that polling is performed
> > even with other runnable tasks in the host.
> > 
> > Results comparing host side polling (halt_poll_ns) and guest side
> > polling (guest_halt_poll_ns) for a server/client application where a
> > small packet is ping-ponged:
> > 
> > host                                        --> 31.33	
> > halt_poll_ns=300000 / no guest busy spin    --> 33.40	(93.8%)
> > halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73	(95.7%)
> > 
> > For the SAP HANA benchmarks (where idle_spin is a parameter 
> > of the previous version of the patch, results should be the
> > same):
> > 
> > hpns == halt_poll_ns
> > 
> >                           idle_spin=0/   idle_spin=800/   idle_spin=0/
> >                           hpns=200000    hpns=0           hpns=800000
> > DeleteC06T03 (100 thread) 1.76           1.71 (-3%)       1.78 (+1%)
> > InsertC16T02 (100 thread) 2.14           2.07 (-3%)       2.18 (+1.8%)
> > DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)     1.29 (-3.7%)
> > UpdateC00T03 (1 thread)   4.72           4.18 (-12%)      4.53 (-5%)
> 
> Hi Marcelo,
> 
> some quick observations:
> 
> 1) This is actually not KVM-specific, so the name and placement of the
> docs should be adjusted.

Agreed. Will call it: cpuidle_halt_poll, move it to drivers/cpuidle/

> 2) Regarding KVM-specific code, however, we could add an MSR so that KVM
> disables halt_poll_ns for this VM when this is active in the guest?

Sure.

> 3) The spin time could use the same adaptive algorithm that KVM uses in
> the host.

Agreed. This can be done later, I suppose (the current fixed
setting works sufficiently well for our needs).
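
As a rough illustration of what an adaptive window could look like, here is
a standalone userspace sketch. It is only loosely modelled on the grow/shrink
idea used for halt_poll_ns in the host and is not code from this patch. The
struct, the function name and all constants (POLL_MAX_NS, POLL_START_NS,
POLL_GROW, POLL_SHRINK) are hypothetical and chosen for the example.

```c
#include <stdio.h>

/* Hypothetical tunables, for illustration only. */
#define POLL_MAX_NS   300000u /* upper bound for the poll window */
#define POLL_START_NS  10000u /* first non-zero window when growing from 0 */
#define POLL_GROW          2u /* multiply the window by this when growing */
#define POLL_SHRINK        2u /* divide the window by this when shrinking;
			       * 0 means reset the window to 0 */

struct poll_state {
	unsigned int cur_ns; /* current poll window in nanoseconds */
};

/*
 * Adjust the poll window after one idle period.
 * block_ns is the time between idle entry and the wakeup event.
 */
static void poll_adjust(struct poll_state *s, unsigned int block_ns)
{
	/* Wakeup arrived inside the current window: nothing to change. */
	if (block_ns <= s->cur_ns)
		return;

	if (block_ns <= POLL_MAX_NS) {
		/* A longer (still allowed) window would have caught it: grow. */
		unsigned int v = s->cur_ns ? s->cur_ns * POLL_GROW
					   : POLL_START_NS;

		s->cur_ns = v > POLL_MAX_NS ? POLL_MAX_NS : v;
	} else {
		/* Wakeup came later than any allowed window: shrink. */
		s->cur_ns = POLL_SHRINK ? s->cur_ns / POLL_SHRINK : 0;
	}
}
```

In a driver, something like poll_adjust() would presumably run once per idle
exit with per-CPU state, and cur_ns would replace the fixed
guest_halt_poll_ns value when computing the end of the spin window.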

> Thanks,
> 
> Paolo
> 
> 
> > ---
> >  Documentation/virtual/kvm/guest-halt-polling.txt |   39 ++++++++
> >  arch/x86/Kconfig                                 |    9 +
> >  arch/x86/kernel/Makefile                         |    1 
> >  arch/x86/kernel/cpuidle_kvm.c                    |  105 +++++++++++++++++++++++
> >  arch/x86/kernel/process.c                        |    2 
> >  5 files changed, 155 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6.git/arch/x86/Kconfig
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/Kconfig	2019-04-22 13:49:42.858303265 -0300
> > +++ linux-2.6.git/arch/x86/Kconfig	2019-05-16 14:18:41.254852745 -0300
> > @@ -805,6 +805,15 @@
> >  	  underlying device model, the host provides the guest with
> >  	  timing infrastructure such as time of day, and system time
> >  
> > +config KVM_CPUIDLE
> > +	tristate "KVM cpuidle driver"
> > +	depends on KVM_GUEST
> > +	default y
> > +	help
> > +	  This option enables the KVM cpuidle driver, which allows polling
> > +	  before halting in the guest (more efficient than polling in the
> > +	  host via halt_poll_ns for some scenarios).
> > +
> >  config PVH
> >  	bool "Support for running PVH guests"
> >  	---help---
> > Index: linux-2.6.git/arch/x86/kernel/Makefile
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/kernel/Makefile	2019-04-22 13:49:42.869303331 -0300
> > +++ linux-2.6.git/arch/x86/kernel/Makefile	2019-05-17 12:59:51.673274881 -0300
> > @@ -112,6 +112,7 @@
> >  obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
> >  
> >  obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
> > +obj-$(CONFIG_KVM_CPUIDLE)	+= cpuidle_kvm.o
> >  obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
> >  obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
> >  obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
> > Index: linux-2.6.git/arch/x86/kernel/process.c
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/kernel/process.c	2019-04-22 13:49:42.876303374 -0300
> > +++ linux-2.6.git/arch/x86/kernel/process.c	2019-05-17 13:19:18.055435117 -0300
> > @@ -580,7 +580,7 @@
> >  	safe_halt();
> >  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> >  }
> > -#ifdef CONFIG_APM_MODULE
> > +#if defined(CONFIG_APM_MODULE) || defined(CONFIG_KVM_CPUIDLE_MODULE)
> >  EXPORT_SYMBOL(default_idle);
> >  #endif
> >  
> > Index: linux-2.6.git/arch/x86/kernel/cpuidle_kvm.c
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6.git/arch/x86/kernel/cpuidle_kvm.c	2019-05-17 13:38:02.553941356 -0300
> > @@ -0,0 +1,105 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * cpuidle driver for KVM guests.
> > + *
> > + * Copyright 2019 Red Hat, Inc. and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + * Authors: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > + */
> > +
> > +#include <linux/init.h>
> > +#include <linux/cpuidle.h>
> > +#include <linux/module.h>
> > +#include <linux/timekeeping.h>
> > +#include <linux/sched/idle.h>
> > +
> > +unsigned int guest_halt_poll_ns;
> > +module_param(guest_halt_poll_ns, uint, 0644);
> > +
> > +static int kvm_enter_idle(struct cpuidle_device *dev,
> > +			  struct cpuidle_driver *drv, int index)
> > +{
> > +	int do_halt = 0;
> > +
> > +	/* No polling */
> > +	if (guest_halt_poll_ns == 0) {
> > +		if (current_clr_polling_and_test()) {
> > +			local_irq_enable();
> > +			return index;
> > +		}
> > +		default_idle();
> > +		return index;
> > +	}
> > +
> > +	local_irq_enable();
> > +	if (!current_set_polling_and_test()) {
> > +		ktime_t now, end_spin;
> > +
> > +		now = ktime_get();
> > +		end_spin = ktime_add_ns(now, guest_halt_poll_ns);
> > +
> > +		while (!need_resched()) {
> > +			cpu_relax();
> > +			now = ktime_get();
> > +
> > +			if (!ktime_before(now, end_spin)) {
> > +				do_halt = 1;
> > +				break;
> > +			}
> > +		}
> > +	}
> > +
> > +	if (do_halt) {
> > +		/*
> > +		 * No events arrived during the busy spin window,
> > +		 * halt.
> > +		 */
> > +		local_irq_disable();
> > +		if (current_clr_polling_and_test()) {
> > +			local_irq_enable();
> > +			return index;
> > +		}
> > +		default_idle();
> > +	} else {
> > +		current_clr_polling();
> > +	}
> > +
> > +	return index;
> > +}
> > +
> > +static struct cpuidle_driver kvm_idle_driver = {
> > +	.name = "kvm_idle",
> > +	.owner = THIS_MODULE,
> > +	.states = {
> > +		{ /* entry 0 is for polling */ },
> > +		{
> > +			.enter			= kvm_enter_idle,
> > +			.exit_latency		= 0,
> > +			.target_residency	= 0,
> > +			.power_usage		= -1,
> > +			.name			= "KVM",
> > +			.desc			= "KVM idle",
> > +		},
> > +	},
> > +	.safe_state_index = 0,
> > +	.state_count = 2,
> > +};
> > +
> > +static int __init kvm_cpuidle_init(void)
> > +{
> > +	return cpuidle_register(&kvm_idle_driver, NULL);
> > +}
> > +
> > +static void __exit kvm_cpuidle_exit(void)
> > +{
> > +	cpuidle_unregister(&kvm_idle_driver);
> > +}
> > +
> > +module_init(kvm_cpuidle_init);
> > +module_exit(kvm_cpuidle_exit);
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Marcelo Tosatti <mtosatti@xxxxxxxxxx>");
> > +
> > Index: linux-2.6.git/Documentation/virtual/kvm/guest-halt-polling.txt
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6.git/Documentation/virtual/kvm/guest-halt-polling.txt	2019-05-17 13:36:39.274703710 -0300
> > @@ -0,0 +1,39 @@
> > +KVM guest halt polling
> > +======================
> > +
> > +The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> > +amount of time before halting. This provides the following advantages
> > +over host side polling:
> > +
> > +	1) The POLL flag is set while polling is performed, which allows
> > +	   a remote vCPU to avoid sending an IPI (and the associated
> > +	   cost of handling the IPI) when performing a wakeup.
> > +
> > +	2) The HLT VM-exit cost can be avoided.
> > +
> > +The downside of guest side polling is that polling is performed
> > +even with other runnable tasks in the host.
> > +
> > +Module Parameters
> > +=================
> > +
> > +The cpuidle_kvm module has 1 tuneable module parameter: guest_halt_poll_ns,
> > +the amount of time, in nanoseconds, that polling is performed before
> > +halting.
> > +
> > +This module parameter can be set from the sysfs files in:
> > +
> > +	/sys/module/cpuidle_kvm/parameters/
> > +
> > +Further Notes
> > +=============
> > +
> > +- Care should be taken when setting the guest_halt_poll_ns parameter as a
> > +large value has the potential to drive the cpu usage to 100% on a machine which
> > +would be almost entirely idle otherwise.
> > +
> > +- The effective amount of time that polling is performed is the host poll
> > +value (see halt-polling.txt) plus guest_halt_poll_ns. If all guests
> > +on a host system support and have properly configured guest_halt_poll_ns,
> > +then setting halt_poll_ns to 0 in the host is probably the best choice.
> > +
> > 
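
For anyone who wants to change the parameter from a test program rather than
from the shell, a minimal userspace sketch follows. The default path is the
one given in the documentation above for this version of the patch, and it
will change once the module is renamed. The helper name write_poll_ns() and
the macro GUEST_HALT_POLL_PATH are hypothetical and exist only for this
example.

```c
#include <stdio.h>

/* Path from the documentation in this patch; changes if the module is renamed. */
#define GUEST_HALT_POLL_PATH \
	"/sys/module/cpuidle_kvm/parameters/guest_halt_poll_ns"

/*
 * Write ns as a decimal string to the parameter file at path.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int write_poll_ns(const char *path, unsigned int ns)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", ns) > 0;
	/* fclose() flushes the buffer, so a write error can surface here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller inside the guest would use write_poll_ns(GUEST_HALT_POLL_PATH, 200000)
as root; the path is passed in as an argument so the helper can also be
exercised against an ordinary file.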


