Re: [PATCH v4] KVM: halt-polling: poll for the upcoming fire timers

Yang Zhang <yang.zhang.wz@xxxxxxxxx> · Wed, 25 May 2016 10:10:12 +0800

On 2016/5/25 7:37, David Matlack wrote:
On Tue, May 24, 2016 at 4:11 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
2016-05-25 6:38 GMT+08:00 David Matlack <dmatlack@xxxxxxxxxx>:
On Tue, May 24, 2016 at 12:57 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>

If an emulated lapic timer will fire soon (within 10us, which is the
base of dynamic halt-polling and the lower end of message-passing
workload latency; TCP_RR's poll time is < 10us), we can treat it as a
short halt and poll until it fires. The fire callback apic_timer_fn()
will set KVM_REQ_PENDING_TIMER, and this flag will be checked during the
busy poll. This avoids the context switch overhead and the latency of
waking up the vCPU.
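
To make the intended behaviour concrete, here is a minimal userspace sketch of
the decision described above. It is not the patch itself: struct vcpu_model,
TIMER_POLL_WINDOW_NS, timer_fires_soon(), timer_fire() and halt_poll() are
hypothetical names, the clock is simulated, and the pending_timer field only
stands in for KVM_REQ_PENDING_TIMER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: 10us, the figure quoted in the commit message. */
#define TIMER_POLL_WINDOW_NS 10000ULL

/* Userspace model of a vCPU; this is NOT the kernel's struct kvm_vcpu. */
struct vcpu_model {
	uint64_t timer_expiry_ns;	/* absolute expiry of the emulated timer */
	bool timer_armed;
	bool pending_timer;		/* stands in for KVM_REQ_PENDING_TIMER */
};

/* True if an armed timer expires in less than the poll window from now_ns. */
static bool timer_fires_soon(const struct vcpu_model *v, uint64_t now_ns)
{
	if (!v->timer_armed)
		return false;
	if (v->timer_expiry_ns <= now_ns)
		return true;
	return v->timer_expiry_ns - now_ns < TIMER_POLL_WINDOW_NS;
}

/* Models the timer callback: it only raises the pending flag. */
static void timer_fire(struct vcpu_model *v)
{
	v->pending_timer = true;
	v->timer_armed = false;
}

/*
 * Returns true if the vCPU saw the timer while polling (treated as a short
 * halt), false if the timer is too far away and the vCPU should block.
 * step_ns advances the simulated clock on each poll iteration.
 */
static bool halt_poll(struct vcpu_model *v, uint64_t now_ns, uint64_t step_ns)
{
	assert(step_ns > 0);

	if (!timer_fires_soon(v, now_ns))
		return false;

	while (!v->pending_timer) {
		now_ns += step_ns;
		if (v->timer_armed && now_ns >= v->timer_expiry_ns)
			timer_fire(v);
	}
	return true;
}
```

In this model a timer 5us away is polled for, while a timer 50us away makes
halt_poll() return false so that the caller can schedule out as it does today.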

This feature is slightly different from the current advance expiration
approach. Advance expiration relies on the vCPU running (it polls before
vmentry). But in some cases the timer interrupt may be blocked by
another thread (i.e., the IF bit is clear) and the vCPU cannot be
scheduled to run immediately. So even if the timer is advanced, the vCPU
may still see the latency. Polling is different: it ensures the vCPU is
aware of the timer expiration before it is scheduled out.

echo HRTICK > /sys/kernel/debug/sched_features in dynticks guests.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel     Linux 4.6.0+ 7.9800   11.0   10.8   14.6 9.4300    13.0    10.2 vanilla
kernel     Linux 4.6.0+   15.3   13.6   10.7   12.5 9.0000    12.8 7.38000 poll

These results aren't very compelling. Sometimes polling is faster,
sometimes vanilla is faster, sometimes they are about the same.

Runs with more processes and bigger cache footprints benefit, since I
enabled the hrtimer (HRTICK) for precise preemption.

The VCPU is halted (idle), so the timer interrupt is not preempting
anything. Also I would not expect any preemption in a context
switching benchmark, the threads should be handing off execution to
one another.

I'm confused why timers would play any role in the performance of this
benchmark. Any idea why there's a speedup in the 8p/16K and 16p/64K
runs?

Actually, I am trying to emulate Yang's workload,
https://lkml.org/lkml/2016/5/22/162.
His real workload can benefit more, as he mentioned in
https://lkml.org/lkml/2016/5/19/667.

I imagine there are hyper sensitive workloads which cannot tolerate a
long tail in timer latency (e.g. realtime workloads). I would expect a
patch like this to provide a "smoothing effect", reducing that tail.
But for cloud/server workloads, I would not expect any sensitivity to
jitter in timer latency (especially while the VCPU is halted).

Yang's is a real cloud workload.

I have 2 issues with optimizing for Yang's workload. Yang, please
correct me if I am mis-characterizing it.
1. The delay in timer interrupts is caused by something disabling the
interrupts on the CPU for more than a millisecond. It seems that is
the real issue. I'm wary of using polling as a workaround.

Yes, this is the most likely case.

2. The delay is caused by a separate task. Halt-polling would not help
in that scenario, it would yield the CPU to that task.

In some cases, the separate task is migrated from another CPU after the
CPU enters the idle state, so halt-polling may still help. And the delay
is caused by two context switches (scheduling the VCPU out and migrating
the VCPU to another idle CPU).

Note that while halt-polling happens when the CPU is idle, it's still
not free. It constricts the scheduler's cpu load balancer, because the
CPU appears to be busy. In KVM's default configuration, I'd prefer to
only add more polling when the gain is clear. If there are guest
workloads that want this patch, I'd suggest polling for timers be
default-off. At minimum, there should be a module parameter to control
it (like Christian Borntraeger suggested).
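
As a rough illustration of the default-off behaviour suggested above, here is a
userspace sketch. The knob name halt_poll_timer and the helper
should_poll_for_timer() are hypothetical; in the kernel such a knob would
typically be exposed with module_param(halt_poll_timer, bool, 0644), which is
not shown because it cannot be built outside the kernel tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical knob; a static bool is zero-initialized, so it defaults to off. */
static bool halt_poll_timer;

/*
 * Poll for an upcoming timer only if the administrator opted in and the
 * timer expires in less than window_ns from now.
 */
static bool should_poll_for_timer(uint64_t ns_until_expiry, uint64_t window_ns)
{
	if (!halt_poll_timer)
		return false;
	return ns_until_expiry < window_ns;
}
```

With the knob left at its default, the helper always returns false and the
existing halt-polling behaviour is unchanged.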

Yeah, I will add a module parameter to enable/disable it.

Regards,
Wanpeng Li

--
best regards
yang