(tl;dr: paravirtualization could be better than unfair qspinlock)

2014-05-07 11:01-0400, Waiman Long:
> Locking is always an issue in a virtualized environment because of 2
> different types of problems:
>  1) Lock holder preemption
>  2) Lock waiter preemption

Paravirtualized ticketlocks have a shortcoming: we don't know which VCPU
the ticket belongs to, so the hypervisor can only blindly yield to
runnable VCPUs after waiters halt in the slowpath.
There aren't enough "free" bits in the ticket struct to improve this,
so we have resorted to unfairness.

Qspinlock is different.

Most queued VCPUs already know the VCPU before them, so we have what it
takes to mitigate lock waiter preemption: we can include the preempted
CPU id in the hypercall, the hypervisor will schedule it, and we'll be
woken up from the unlock slowpath [1].
This still isn't perfect: we can wake up a VCPU that got preempted
before it could hypercall, and these hypercalls will propagate one by
one through our queue to the preempted lock holder.
(We'd have to share the whole waiter list to avoid this.
 We could also try to send the holder's id instead and unconditionally
 kick next-in-line on unlock, but I think it would be slower.)

The lock holder problem is tougher because we don't always share who it
is.  The tail bits can be used for it, as we don't really use them
before a queue has formed.  This would cost us one bit to differentiate
between holder/tail CPU id [2] and complicate operations a little, but
only for the paravirt case, where the benefits are expected to be far
greater.
A hypercall from the lock slowpath could schedule the preempted VCPU
right away.

I think this could obsolete unfair locks and will prepare RFC patches
soon-ish [3].  (If the idea isn't proved infeasible before then.)

---
1: It is possible that we could avoid the O(N) traversal and hypercall
   in the unlock slowpath by scheduling VCPUs in the right order often.
2: Or even less.
   idx=3 is a bug: if we are spinning in NMI, we are almost deadlocked,
   so we should WARN/BUG if it were to happen; which leaves the
   combination free to mean that the CPU id is a sole holder, not a
   tail.  (I prefer clean code though.)
3: I already tried and quickly got fed up with the refactoring, so it
   might get postponed until the series gets merged.
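
To make the holder/tail idea from [2] a bit more concrete, here is a
minimal user-space sketch in C.  Everything in it is an assumption made
for illustration: the bit layout, the pv_* names and the "CPU id + 1"
convention are hypothetical and are not taken from the qspinlock series.
It only shows how the otherwise invalid idx=3 combination could mark the
stored CPU id as a sole holder rather than a queue tail, and which CPU
id a waiter might then hand to a yield hypercall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tail layout (an assumption, not the layout of the series):
 *   bits 0-1 : idx, nesting level of the queue node
 *              (0 = task, 1 = softirq, 2 = hardirq, 3 = NMI)
 *   bits 2+  : CPU id + 1, so that a tail of 0 means "nothing recorded"
 */
#define PV_IDX_BITS   2u
#define PV_IDX_MASK   ((1u << PV_IDX_BITS) - 1u)
#define PV_IDX_HOLDER PV_IDX_MASK   /* idx == 3 reused as "sole holder" */

/* Encode a regular queue tail; idx == 3 is rejected as a bug. */
static inline uint32_t pv_encode_tail(uint32_t cpu, uint32_t idx)
{
	assert(idx < PV_IDX_HOLDER);
	return ((cpu + 1u) << PV_IDX_BITS) | idx;
}

/* Encode the lock holder's CPU id while no queue has formed. */
static inline uint32_t pv_encode_holder(uint32_t cpu)
{
	return ((cpu + 1u) << PV_IDX_BITS) | PV_IDX_HOLDER;
}

/* True if the tail field names a sole holder instead of a queue tail. */
static inline bool pv_tail_is_holder(uint32_t tail)
{
	return tail != 0 && (tail & PV_IDX_MASK) == PV_IDX_HOLDER;
}

/* CPU id stored in the tail field, or -1 if the field is empty. */
static inline int pv_tail_cpu(uint32_t tail)
{
	return (int)(tail >> PV_IDX_BITS) - 1;
}

/*
 * CPU id a waiter would pass to a (hypothetical) yield hypercall:
 * the holder while no queue has formed, its queue predecessor otherwise.
 */
static inline int pv_yield_target(uint32_t tail_seen, int prev_cpu)
{
	return pv_tail_is_holder(tail_seen) ? pv_tail_cpu(tail_seen) : prev_cpu;
}
```

With this layout, pv_encode_holder(7) and pv_encode_tail(7, 0) differ
only in the idx bits, so the native fast path would not need an extra
flag bit; the price is the special case on every decode, which is the
"clean code" concern mentioned in [2].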