Re: [PATCH] x86: add cpuidle_kvm driver to allow guest side halt polling

On Mon, May 20, 2019 at 01:51:57PM +0200, Paolo Bonzini wrote:
> On 17/05/19 19:48, Marcelo Tosatti wrote:
> > 
> > The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> > amount of time before halting. This provides the following benefits
> > to host side polling:
> > 
> > 	1) The POLL flag is set while polling is performed, which allows
> > 	   a remote vCPU to avoid sending an IPI (and the associated
> >  	   cost of handling the IPI) when performing a wakeup.
> > 
> > 	2) The HLT VM-exit cost can be avoided.
> > 
> > The downside of guest side polling is that polling is performed
> > even with other runnable tasks in the host.
> > 
> > Results comparing halt_poll_ns configurations, using a server/client
> > application where a small packet is ping-ponged:
> > 
> > host                                        --> 31.33	
> > halt_poll_ns=300000 / no guest busy spin    --> 33.40	(93.8%)
> > halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73	(95.7%)
> > 
> > For the SAP HANA benchmarks (idle_spin was a parameter of the
> > previous version of this patch; results should be the same):
> > 
> > hpns == halt_poll_ns
> > 
> >                           idle_spin=0/    idle_spin=800/    idle_spin=0/
> >                           hpns=200000     hpns=0            hpns=800000
> > DeleteC06T03 (100 thread) 1.76            1.71 (-3%)        1.78 (+1%)
> > InsertC16T02 (100 thread) 2.14            2.07 (-3%)        2.18 (+1.8%)
> > DeleteC00T01 (1 thread)   1.34            1.28 (-4.5%)      1.29 (-3.7%)
> > UpdateC00T03 (1 thread)   4.72            4.18 (-12%)       4.53 (-5%)
> 
> Hi Marcelo,
> 
> some quick observations:
> 
> 1) This is actually not KVM-specific, so the name and placement of the
> docs should be adjusted.
> 
> 2) Regarding KVM-specific code, however, we could add an MSR so that KVM
> disables halt_poll_ns for this VM when this is active in the guest?
> 
> 3) The spin time could use the same adaptive algorithm that KVM uses in
> the host.

Hi Paolo,

Consider a sequence of wakeup events as follows:
20us, 200us, 20us, 200us, ...

1) halt_poll_ns=250us   v->halt_poll_ns=0us     wakeup=20us

grow sets v->halt_poll_ns = 20us

2) halt_poll_ns=250us   v->halt_poll_ns=20us     wakeup=200us

grow sets v->halt_poll_ns = 40us

3) halt_poll_ns=250us   v->halt_poll_ns=40us     wakeup=20us

v->halt_poll_ns untouched

Doubling repeats until

v->halt_poll_ns = 80us, 160us, 250us (capped at halt_poll_ns).

N) halt_poll_ns=250us   v->halt_poll_ns=250us   wakeup=20us

If, in the middle of the 20us, 200us, 20us, ... sequence, you block
for longer than halt_poll_ns (250us in this case),
the logic today will either:

        1) set v->halt_poll_ns to zero, or

        2) set v->halt_poll_ns to 125us (if you set shrink to 2).

In either case you lose: one missed event any time
block_time > halt_poll_ns.
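
As a reference point, here is a minimal sketch of the grow/shrink
behaviour described above (modeled loosely on the host-side adaptive
logic; the names, factors and standalone form are illustrative, not
the actual kernel code):

	static unsigned int halt_poll_ns = 250000;	/* user-set ceiling (250us) */
	static unsigned int grow = 2, shrink = 2;

	static void adjust_poll_ns(unsigned int *poll_ns, unsigned int block_ns)
	{
		if (block_ns <= *poll_ns)
			return;			/* woke up inside the poll window */

		if (block_ns > halt_poll_ns) {
			/* Blocked past the ceiling: shrink, or reset to zero. */
			if (shrink)
				*poll_ns /= shrink;	/* e.g. 250us -> 125us */
			else
				*poll_ns = 0;
		} else {
			/* Missed the window but stayed under the ceiling: grow. */
			*poll_ns = *poll_ns ? *poll_ns * grow : block_ns;
			if (*poll_ns > halt_poll_ns)
				*poll_ns = halt_poll_ns;
		}
	}

With the 20us/200us pattern this walks v->halt_poll_ns up to the 250us
ceiling, and a single long block then halves or zeroes it, which is the
miss described above.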

If one enables guest halt polling in the first place,
then the energy/performance tradeoff is bent towards
performance, and such misses are harmful.

So I am going to add something along the lines of:

"If, after 50 consecutive times, block_time is much larger than
halt_poll_ns, then set cpu->halt_poll_ns to zero".

Restore the user's halt_poll_ns value once a smaller block_time
is observed.

This should cover the fully idle case and cause minimal
harm to performance.
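
In pseudo-C, the intended logic looks roughly like this (the threshold
name, the "much larger" test and the per-cpu plumbing are provisional):

	#define OVERSIZED_BLOCK_LIMIT	50	/* consecutive long blocks */

	static unsigned int guest_halt_poll_ns = 300000;	/* user-set value */
	static unsigned int oversized_blocks;	/* per-cpu in the real patch */

	static void update_poll_ns(unsigned int *poll_ns, unsigned int block_ns)
	{
		if (block_ns > guest_halt_poll_ns) {
			/* Guest looks fully idle: after 50 long blocks
			 * in a row, stop polling entirely. */
			if (++oversized_blocks >= OVERSIZED_BLOCK_LIMIT)
				*poll_ns = 0;
		} else {
			/* One short block: restore the user-set value. */
			oversized_blocks = 0;
			*poll_ns = guest_halt_poll_ns;
		}
	}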

Is that OK, or is there some other characteristic of the
adaptive halt-poll algorithm you are looking for?




