On 03/31/2012 01:07 AM, Thomas Gleixner wrote: > On Fri, 30 Mar 2012, H. Peter Anvin wrote: > > > What is the current status of this patchset? I haven't looked at it too > > closely because I have been focused on 3.4 up until now... > > The real question is whether these heuristics are the correct approach > or not. > > If I look at it from the non virtualized kernel side then this is ass > backwards. We know already that we are holding a spinlock which might > cause other (v)cpus going into eternal spin. The non virtualized > kernel solves this by disabling preemption and therefor getting out of > the critical section as fast as possible, > > The virtualization problem reminds me a lot of the problem which RT > kernels are observing where non raw spinlocks are turned into > "sleeping spinlocks" and therefor can cause throughput issues for non > RT workloads. > > Though the virtualized situation is even worse. Any preempted guest > section which holds a spinlock is prone to cause unbound delays. > > The paravirt ticketlock solution can only mitigate the problem, but > not solve it. With massive overcommit there is always a way to trigger > worst case scenarious unless you are educating the scheduler to cope > with that. > > So if we need to fiddle with the scheduler and frankly that's the only > way to get a real gain (the numbers, which are achieved by this > patches, are not that impressive) then the question arises whether we > should turn the whole thing around. > > I know that Peter is going to go berserk on me, but if we are running > a paravirt guest then it's simple to provide a mechanism which allows > the host (aka hypervisor) to check that in the guest just by looking > at some global state. > > So if a guest exits due to an external event it's easy to inspect the > state of that guest and avoid to schedule away when it was interrupted > in a spinlock held section. That guest/host shared state needs to be > modified to indicate the guest to invoke an exit when the last nested > lock has been released. Interesting idea (I think it has been raised before btw, don't recall by who). One thing about it is that it can give many false positives. Consider a fine-grained spinlock that is being accessed by many threads. That is, the lock is taken and released with high frequency, but there is no contention, because each vcpu is accessing a different instance. So the host scheduler will needlessly delay preemption of vcpus that happen to be holding a lock, even though this gains nothing. A second issue may happen with a lock that is taken and released with high frequency, with a high hold percentage. The host scheduler may always sample the guest in a held state, leading it to conclude that it's exceeding its timeout when in fact the lock is held for a short time only. > Of course this needs to be time bound, so a rogue guest cannot > monopolize the cpu forever, but that's the least to worry about > problem simply because a guest which does not get out of a spinlocked > region within a certain amount of time is borked and elegible to > killing anyway. Hopefully not killing! Just because a guest doesn't scale well, or even if it's deadlocked, doesn't mean it should be killed. Just preempt it. > Thoughts ? It's certainly interesting. Maybe a combination is worthwhile - prevent lockholder preemption for a short period of time AND put waiters to sleep in case that period is insufficient to release the lock. -- error compiling committee.c: too many arguments to function _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization