Avi Kivity wrote:
> On 06/01/2010 07:38 PM, Andi Kleen wrote:
>>>> Your new code would starve again, right?
>>>>
>>> Yes, of course it may starve with an unfair spinlock. Since vcpus are
>>> not always running, there is a much smaller chance that a vcpu on a
>>> remote memory node will starve forever. Old kernels with unfair
>>> spinlocks are running fine in VMs on NUMA machines with various loads.
>>>
>> Try it on a NUMA system with unfair memory.
>>
> We are running everything on NUMA (since all modern machines are now
> NUMA). At what scale do the issues become observable?
>
>>> I understand that reason and do not propose to go back to the old
>>> spinlock on physical HW! But with virtualization the performance hit
>>> is unbearable.
>>>
>> Extreme unfairness can be unbearable too.
>>
> Well, the question is what happens first. In our experience, vcpu
> overcommit is a lot more painful. People will never see the NUMA
> unfairness issue if they can't use kvm due to the vcpu overcommit
> problem.

Gleb's observed performance hit seems to be a rather mild throughput
depression compared with creating a worst case by enforcing vcpu
overcommit. Running a single guest with 2:1 overcommit on a 4 core
machine, I saw over an order of magnitude slowdown vs. 1:1 commit with
the same kernel build test. Others have reported similar results.

How close you'll get to that scenario depends on host scheduling
dynamics and, statistically, on how many lock holders have been
preempted (stalled) while other vcpus are still waiting to contend
the lock. So I'd expect to see quite variable numbers for
guest-to-guest aggravation of this problem.

> What I'd like to see eventually is a short-term-unfair, long-term-fair
> spinlock. Might make sense for bare metal as well. But it won't be
> easy to write.

Collecting the contention/usage statistics on a per-spinlock basis
seems complex. I believe a practical approximation to this is an
adaptive mutex which, upon hitting a spin time threshold, punts and
lets the scheduler reconcile fairness (roughly along the lines of the
sketch below).

-john

-- 
john.cooper@xxxxxxxxxxxxxxxxxx
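
To make the "punt to the scheduler" idea concrete, here is a minimal
userspace sketch of such an adaptive lock using C11 atomics and
sched_yield(). The SPIN_THRESHOLD value and all of the names are
illustrative assumptions of mine, not the kernel's actual adaptive
mutex code: spin briefly on the expectation that the holder is still
running, and once the threshold is crossed, assume the holder was
preempted and hand the decision back to the scheduler.

#include <sched.h>
#include <stdatomic.h>

#define SPIN_THRESHOLD 1000     /* arbitrary illustrative value */

struct adaptive_lock {
        atomic_flag held;
};

static void adaptive_lock_init(struct adaptive_lock *l)
{
        atomic_flag_clear(&l->held);
}

static void adaptive_lock_acquire(struct adaptive_lock *l)
{
        unsigned int spins = 0;

        while (atomic_flag_test_and_set_explicit(&l->held,
                                                 memory_order_acquire)) {
                if (++spins < SPIN_THRESHOLD) {
                        /* short-term unfair: keep spinning, assuming
                         * the holder is running and will release soon */
                        continue;
                }
                /* threshold crossed: the holder has likely been
                 * preempted (e.g. its vcpu is descheduled), so punt to
                 * the scheduler instead of burning the timeslice */
                sched_yield();
                spins = 0;
        }
}

static void adaptive_lock_release(struct adaptive_lock *l)
{
        atomic_flag_clear_explicit(&l->held, memory_order_release);
}

Note this doesn't give long-term fairness by itself; it only stops
waiters from burning cycles behind a preempted holder and leaves the
fairness reconciliation to the scheduler, which is the approximation
I had in mind above.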