On Sat, Mar 31, 2012 at 12:07:58AM +0200, Thomas Gleixner wrote:
> On Fri, 30 Mar 2012, H. Peter Anvin wrote:
>
> > What is the current status of this patchset? I haven't looked at it too
> > closely because I have been focused on 3.4 up until now...
>
> The real question is whether these heuristics are the correct approach
> or not.
>
> If I look at it from the non virtualized kernel side then this is ass
> backwards. We know already that we are holding a spinlock which might
> cause other (v)cpus going into eternal spin. The non virtualized
> kernel solves this by disabling preemption and therefore getting out of
> the critical section as fast as possible.
>
> The virtualization problem reminds me a lot of the problem which RT
> kernels are observing where non raw spinlocks are turned into
> "sleeping spinlocks" and therefore can cause throughput issues for non
> RT workloads.
>
> Though the virtualized situation is even worse. Any preempted guest
> section which holds a spinlock is prone to cause unbound delays.
>
> The paravirt ticketlock solution can only mitigate the problem, but
> not solve it. With massive overcommit there is always a way to trigger
> worst case scenarios unless you are educating the scheduler to cope
> with that.
>
> So if we need to fiddle with the scheduler and frankly that's the only
> way to get a real gain (the numbers, which are achieved by these
> patches, are not that impressive) then the question arises whether we
> should turn the whole thing around.
>
> I know that Peter is going to go berserk on me, but if we are running
> a paravirt guest then it's simple to provide a mechanism which allows
> the host (aka hypervisor) to check that in the guest just by looking
> at some global state.
>
> So if a guest exits due to an external event it's easy to inspect the
> state of that guest and avoid to schedule away when it was interrupted
> in a spinlock held section.
> That guest/host shared state needs to be
> modified to indicate the guest to invoke an exit when the last nested
> lock has been released.

Remember that the host is scheduling processes other than vcpus of guests.
The case where a higher priority task (whatever that task is) interrupts
a vcpu which holds a spinlock should be frequent in an overcommit
scenario. Whenever that is the case, other vcpus _must_ be able to stop
spinning.

Now extrapolate that to guests with a large number of vcpus. There is no
replacement for sleep-in-hypervisor-instead-of-spin.

> Of course this needs to be time bound, so a rogue guest cannot
> monopolize the cpu forever, but that's the least to worry about
> problem simply because a guest which does not get out of a spinlocked
> region within a certain amount of time is borked and eligible for
> killing anyway.
>
> Thoughts ?
>
> Thanks,
>
> 	tglx
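
For reference, the mechanism proposed in the quoted text can be modelled
roughly as below. This is only a toy sketch in Python, not kernel code and
not taken from any patch in this thread: the names SharedState, lock_depth,
exit_requested, Guest, Host and grace_ticks are all hypothetical, and the
"exit" is just a counter standing in for a hypercall or vmexit.

```python
from dataclasses import dataclass


@dataclass
class SharedState:
    """Hypothetical per-vcpu state visible to both guest and host."""
    lock_depth: int = 0           # nesting count of spinlocks the guest holds
    exit_requested: bool = False  # set by the host when it defers a preemption


class Guest:
    def __init__(self, shared: SharedState) -> None:
        self.shared = shared
        self.exits = 0  # voluntary exits to the host (stand-in for a hypercall)

    def spin_lock(self) -> None:
        self.shared.lock_depth += 1

    def spin_unlock(self) -> None:
        self.shared.lock_depth -= 1
        # Last nested lock released: honour a pending request from the host.
        if self.shared.lock_depth == 0 and self.shared.exit_requested:
            self.shared.exit_requested = False
            self.exits += 1


class Host:
    def __init__(self, grace_ticks: int) -> None:
        # Assumed upper bound on how long a preemption may be deferred.
        self.grace_ticks = grace_ticks

    def may_preempt(self, shared: SharedState, ticks_deferred: int) -> bool:
        """Return True if the vcpu may be scheduled away right now."""
        if shared.lock_depth == 0:
            return True
        if ticks_deferred >= self.grace_ticks:
            # Time bound: a guest stuck in a locked region gets no more grace.
            return True
        # Defer, and ask the guest to exit once it drops its last lock.
        shared.exit_requested = True
        return False
```

Note that the model only covers the deferral side. It says nothing about
the case raised in the reply, where a higher priority host task has to run
anyway and the other vcpus need a way to stop spinning.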