Re: [PATCH v10 10/19] qspinlock, x86: Allow unfair spinlock in a virtual guest

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Mon, 12 May 2014 20:57:39 +0200

(tl;dr: paravirtualization could be better than unfair qspinlock)

2014-05-07 11:01-0400, Waiman Long:
> Locking is always an issue in a virtualized environment because of 2
> different types of problems:
>  1) Lock holder preemption
>  2) Lock waiter preemption

Paravirtualized ticketlocks have a shortcoming;
we don't know which VCPU the ticket belongs to, so the hypervisor can
only blindly yield to runnable VCPUs after waiters halt in slowpath.
There aren't enough "free" bits in ticket struct to improve, thus we
have resorted to unfairness.

Qspinlock is different.

Most queued VCPUs already know the VCPU before it, so we have what it
takes to mitigate lock waiter preemption: we can include preempted CPU
id in hypercall, the hypervisor will schedule it, and we'll be woken up
from unlock slowpath [1].
This still isn't perfect: we can wake up a VCPU that got preempted
before it could hypercall, and these hypercalls will propagate one by
one through our queue to the preempted lock holder.
(We'd have to share the whole waiter-list to avoid this.
 We could also try to send holder's id instead and unconditionally kick
 next-in-line on unlock, I think it would be slower.)

Lock holder problem is tougher because we don't always share who is it.
The tail bits can be used for it as we don't really use them before a
queue has formed.  This would cost us one bit to differentiate between
holder/tail CPU id [2] and complicate operations a little, but only for
the paravirt case, where benefits are expected to be far greater.
Hypercall from lock slowpath could schedule preempted VCPU right away.

I think this could obsolete unfair locks and will prepare RFC patches
soon-ish [3]. (If the idea isn't proved infeasible before.)

---
1: It is possible that we could avoid O(N) traversal and hypercall in
   unlock slowpath by scheduling VCPUs in the right order often.
2: Or even less. idx=3 is a bug: if we are spinning in NMI, we are
   almost deadlocked, so we should WARN/BUG if it were to happen; which
   leaves the combination free to mean that the CPU id is a sole holder,
   not a tail.  (I prefer clean code though.)
3: I already tried and got quickly fed up by refactoring, so it might
   get postponed till the series gets merged.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html