2014-05-13 15:47-0400, Waiman Long: > On 05/12/2014 11:22 AM, Radim Krčmář wrote: > >I think there is an unwanted scenario on virtual machines: > >1) VCPU sets the pending bit and start spinning. > >2) Pending VCPU gets descheduled. > > - we have PLE and lock holder isn't running [1] > > - the hypervisor randomly preempts us > >3) Lock holder unlocks while pending VCPU is waiting in queue. > >4) Subsequent lockers will see free lock with set pending bit and will > > loop in trylock's 'for (;;)' > > - the worst-case is lock starving [2] > > - PLE can save us from wasting whole timeslice > > > >Retry threshold is the easiest solution, regardless of its ugliness [4]. > > Yes, that can be a real issue. Some sort of retry threshold, as you said, > should be able to handle it. > > BTW, the relevant patch should be 16/19 where the PV spinlock stuff should > be discussed. This patch is perfectly fine. Ouch, my apology to Peter didn't make it ... Agreed, I should have split the comment under patches [06/19] (part quoted above; does not depend on PV), [16/19] (part quoted below) and [17/19] (general doubts). > >Another minor design flaw is that formerly first VCPU gets appended to > >the tail when it decides to queue; > >is the performance gain worth it? > > > >Thanks. > > Yes, the performance gain is worth it. The primary goal is to be not worse > than ticket spinlock in light load situation which is the most common case. > This feature is need to achieve that. Ok. I've seen merit in pvqspinlock even with slightly slower first-waiter, so I would have happily sacrificed those horrible branches. (I prefer elegant to optimized code, but I can see why we want to be strictly better than ticketlock.) Peter mentioned that we are focusing on bare-metal patches, so I'll withold my other paravirt rants until they are polished. And to forcefully bring this thread a little bit on-topic: Pending-bit is effectively a lock in a lock, so I was wondering why don't we use more pending bits; advantages are the same, just diminished by the probability of having an ideally contended lock: - waiter won't be blocked on RAM access if critical section (or more) ends sooner - some unlucky cacheline is not forgotten - faster unlock (no need for tail operations) (- ?) disadvantages are magnified: - increased complexity - intense cacheline sharing (I thought that this is the main disadvantage of ticketlock.) (- ?) One bit still improved performance, is it the best we got? Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html