Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Wed, 11 Sep 2019 15:04:31 +0200

On 11/09/19 06:25, Waiman Long wrote:
> On 9/10/19 6:56 AM, Wanpeng Li wrote:
>> On Mon, 9 Sep 2019 at 18:56, Waiman Long <longman@xxxxxxxxxx> wrote:
>>> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>>>> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>>>>
>>>> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if
>>>> vCPU is preempted), which we found causes a severe performance regression.
>>>>
>>>> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs with 80 vCPUs each.
>>>> With this commit, the ebizzy -M score drops from 13000-14000 records/s to
>>>> 1700-1800 records/s.
>>>>
>>>>           Host                       Guest                score
>>>>
>>>> vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
>>>> vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
>>>> vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
>>>> vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
>>>>
>>>> Exiting the spin loop through the aggressive wait-early mechanism can result
>>>> in a premature yield and incur extra scheduling latency in over-subscribed
>>>> scenarios.
>>>>
>>>> kvm optimizes:
>>>> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>>>> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
>>>>
>>>> Tested-by: loobinliu@xxxxxxxxxxx
>>>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>>> Cc: Waiman Long <longman@xxxxxxxxxx>
>>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>>>> Cc: loobinliu@xxxxxxxxxxx
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
>>>> Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>>>> ---
>>>>  kernel/locking/qspinlock_paravirt.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>>>> index 89bab07..e84d21a 100644
>>>> --- a/kernel/locking/qspinlock_paravirt.h
>>>> +++ b/kernel/locking/qspinlock_paravirt.h
>>>> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>>>>       if ((loop & PV_PREV_CHECK_MASK) != 0)
>>>>               return false;
>>>>
>>>> -     return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
>>>> +     return READ_ONCE(prev->state) != vcpu_running;
>>>>  }
>>>>
>>>>  /*
>>> There are several possibilities for this performance regression:
>>>
>>> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
>>> cacheline contention issue depending on how that callback is implemented.
>>>
>>> 2) KVM may set the preempt flag for a short period whenever a vmexit
>>> happens, even if a vmenter is executed shortly after. In this case, we
>>> may want to use a more durable vcpu suspend flag that indicates the vcpu
>>> won't get a real CPU back for a longer period of time.
>>>
>>> Perhaps you can add a lock event counter to count the number of
>>> wait_early events caused by vcpu_is_preempted() being true to see if it
>>> really causes a lot more wait_early than without the vcpu_is_preempted()
>>> call.
>> pv_wait_again:1:179
>> pv_wait_early:1:189429
>> pv_wait_head:1:263
>> pv_wait_node:1:189429
>> pv_vcpu_is_preempted:1:45588
>> =========sleep 5============
>> pv_wait_again:1:181
>> pv_wait_early:1:202574
>> pv_wait_head:1:267
>> pv_wait_node:1:202590
>> pv_vcpu_is_preempted:1:46336
>>
>> The sampling period is 5s; 6% of the wait_early events are caused by
>> vcpu_is_preempted() being true.
> 
> 6% isn't that high. However, when one vCPU voluntarily releases its
> CPU, all the subsequent waiters in the queue will do the same. It is
> a cascading effect. Perhaps we wait early too aggressively with the
> original patch.
> 
> I also looked up the email chain of the original commit. The patch
> submitter did not provide any performance data to support this change.
> The patch just looked reasonable at that time, so there was no
> objection. Given that we now have hard evidence that this was not a good
> idea, I think we should revert it.
> 
> Reviewed-by: Waiman Long <longman@xxxxxxxxxx>
> 
> Thanks,
> Longman
> 

Queued, thanks.

Paolo