Avi Kivity wrote:
> On 06/01/2010 07:38 PM, Andi Kleen wrote:
>>>> Your new code would starve again, right?
>>>>
>>> Yes, of course it may starve with an unfair spinlock. Since vcpus are
>>> not always running, there is a much smaller chance that a vcpu on a
>>> remote memory node will starve forever. Old kernels with unfair
>>> spinlocks are running fine in VMs on NUMA machines with various loads.
>>>
>> Try it on a NUMA system with unfair memory.
>>
> We are running everything on NUMA (since all modern machines are now
> NUMA). At what scale do the issues become observable?
>
>>> I understand that reason and do not propose to go back to the old
>>> spinlock on physical HW! But with virtualization the performance hit
>>> is unbearable.
>>>
>> Extreme unfairness can be unbearable too.
>>
> Well, the question is what happens first. In our experience, vcpu
> overcommit is a lot more painful. People will never see the NUMA
> unfairness issue if they can't use kvm due to the vcpu overcommit
> problem.

Gleb's observed performance hit seems to be a rather mild throughput
depression compared with creating a worst case by enforcing vcpu
overcommit. Running a single guest with 2:1 overcommit on a 4 core
machine, I saw over an order of magnitude slowdown vs. 1:1 commit with
the same kernel build test. Others have reported similar results.

How close you'll get to that scenario depends on host scheduling
dynamics and, statistically, on how many lock holders have been
preempted (stalled) while other vcpus are still waiting to contend
the lock. So I'd expect to see quite variable numbers for
guest-to-guest aggravation of this problem.

> What I'd like to see eventually is a short-term-unfair, long-term-fair
> spinlock. Might make sense for bare metal as well. But it won't be
> easy to write.

Collecting the contention/usage statistics on a per-spinlock basis
seems complex. I believe a practical approximation to this is an
adaptive mutex which, upon hitting a spin time threshold, punts and
lets the scheduler reconcile fairness (roughly along the lines of the
sketch below).

-john

-- 
john.cooper@xxxxxxxxxxxxxxxxxx
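
To make the "punt to the scheduler" idea concrete, here is a minimal
userspace sketch of such an adaptive lock using C11 atomics and
sched_yield(). The SPIN_THRESHOLD value and all of the names are
illustrative assumptions of mine, not the kernel's actual adaptive
mutex code: spin briefly on the expectation that the holder is still
running, and once the threshold is crossed, assume the holder was
preempted and hand the decision back to the scheduler.

#include <sched.h>
#include <stdatomic.h>

#define SPIN_THRESHOLD 1000     /* arbitrary illustrative value */

struct adaptive_lock {
        atomic_flag held;
};

static void adaptive_lock_init(struct adaptive_lock *l)
{
        atomic_flag_clear(&l->held);
}

static void adaptive_lock_acquire(struct adaptive_lock *l)
{
        unsigned int spins = 0;

        while (atomic_flag_test_and_set_explicit(&l->held,
                                                 memory_order_acquire)) {
                if (++spins < SPIN_THRESHOLD) {
                        /* short-term unfair: keep spinning, assuming
                         * the holder is running and will release soon */
                        continue;
                }
                /* threshold crossed: the holder has likely been
                 * preempted (e.g. its vcpu is descheduled), so punt to
                 * the scheduler instead of burning the timeslice */
                sched_yield();
                spins = 0;
        }
}

static void adaptive_lock_release(struct adaptive_lock *l)
{
        atomic_flag_clear_explicit(&l->held, memory_order_release);
}

Note this doesn't give long-term fairness by itself; it only stops
waiters from burning cycles behind a preempted holder and leaves the
fairness reconciliation to the scheduler, which is the approximation
I had in mind above.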