On Thu, Apr 02, 2015 at 12:28:30PM -0400, Waiman Long wrote:
> On 04/01/2015 05:03 PM, Peter Zijlstra wrote:
> >On Wed, Apr 01, 2015 at 03:58:58PM -0400, Waiman Long wrote:
> >>On 04/01/2015 02:48 PM, Peter Zijlstra wrote:
> >>I am sorry that I don't quite get what you mean here. My point is that in
> >>the hashing step, a CPU will need to scan for an empty bucket to put the
> >>lock in. In the interim, a previously used bucket before the empty one may
> >>get freed. In the lookup step for that lock, the scanning will stop because
> >>of an empty bucket in front of the target one.
> >Right, that's broken. So we need to do something else to limit the
> >lookup, because without that break, a lookup would need to iterate the
> >entire array in order to determine -ENOENT, which is expensive.
> >
> >So my alternative proposal is that IFF we can guarantee that every
> >lookup will succeed (the entry we're looking for is always there), we
> >don't need the break on empty but can probe until we find the entry.
> >This will be bounded in cost to the same number of probes we required
> >for insertion and avoids the full array scan.
> >
> >Now I think we can indeed do this if, as said earlier, we do not clear
> >the bucket on insert when the cmpxchg succeeds. In that case the unlock
> >will observe _Q_SLOW_VAL and do the lookup, and the lookup will then find
> >the entry. We then need the unlock to clear the entry.
> >
> >Does that explain this? Or should I try again with code?
>
> OK, I got your proposal now. However, there is still the issue that setting
> the _Q_SLOW_VAL flag and filling in the hash bucket are not atomic wrt each
> other.

So? They're strictly ordered, that's sufficient. We first hash the lock,
then we set _Q_SLOW_VAL. There's a full memory barrier between them.

> It
> is possible a CPU has set the _Q_SLOW_VAL flag but not yet filled in the
> hash bucket while another one is trying to look for it.

Nope.
The unlock side does an xchg() on the locked value first, and xchg() also
implies a full barrier, so if we observe _Q_SLOW_VAL we must also observe
the hash bucket with the lock value.

> So we need to have
> some kind of synchronization mechanism to let the lookup CPU know when it
> is a good time to look up.

No, it's all already ordered and working.

    pv_wait_head():

        pv_hash()
        /* MB as per cmpxchg */
        cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);

    VS

    __pv_queue_spin_unlock():

        if (xchg(&l->locked, 0) != _Q_SLOW_VAL)
            return;

        /* MB as per xchg */
        pv_hash_find(lock);
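A minimal user-space sketch of the probing problem discussed in this thread, assuming a toy 8-bucket table of nonzero integer keys with linear probing. The names table, insert_key(), clear_bucket(), lookup_break_on_empty() and lookup_must_exist() are hypothetical and are not the kernel's pv_hash code. It shows how freeing an earlier bucket in a probe chain makes a break-on-empty lookup report a false miss, while a lookup that is only ever called for an entry known to be present can skip empty buckets and still stop after as many probes as the insert took.

```c
/*
 * Toy open-addressed hash table with linear probing (hypothetical, for
 * illustration only; this is not the kernel's pv_hash implementation).
 * Keys are nonzero integers; 0 marks an empty bucket.
 */
#include <assert.h>	/* used by the usage checks */
#include <stddef.h>

#define NBUCKETS 8

static unsigned long table[NBUCKETS];

static size_t home_bucket(unsigned long key)
{
	return key % NBUCKETS;
}

/* Linear-probe insert; returns the number of buckets probed.
 * Assumes at least one free bucket exists. */
static size_t insert_key(unsigned long key)
{
	size_t i = home_bucket(key);
	size_t probes = 1;

	while (table[i] != 0) {
		i = (i + 1) % NBUCKETS;
		probes++;
	}
	table[i] = key;
	return probes;
}

/* Models the unlock freeing an entry's bucket. */
static void clear_bucket(size_t i)
{
	table[i] = 0;
}

/* Lookup that stops at the first empty bucket: gives a false miss once
 * an earlier bucket in the key's probe chain has been freed. */
static long lookup_break_on_empty(unsigned long key)
{
	size_t i = home_bucket(key);

	for (size_t n = 0; n < NBUCKETS; n++) {
		if (table[i] == key)
			return (long)i;
		if (table[i] == 0)
			return -1;	/* treated as -ENOENT */
		i = (i + 1) % NBUCKETS;
	}
	return -1;
}

/*
 * Lookup for the proposed scheme: the caller guarantees the key is present,
 * so empty buckets are skipped instead of ending the search. Entries never
 * move once inserted, so this probes exactly as many buckets as the insert did.
 */
static size_t lookup_must_exist(unsigned long key)
{
	size_t i = home_bucket(key);

	while (table[i] != key)
		i = (i + 1) % NBUCKETS;
	return i;
}
```

For example, inserting 3 and then 11 (both hash to bucket 3) puts 11 in bucket 4 after two probes. After clear_bucket(3), lookup_break_on_empty(11) returns -1, while lookup_must_exist(11) returns 4. The second lookup would spin forever on an absent key, which is why it is only safe under the guarantee that every lookup succeeds.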
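The ordering argument above can be modelled in user space with C11 atomics, under the assumption that the kernel's fully ordered cmpxchg() and xchg() map onto seq_cst atomic_compare_exchange_strong() and atomic_exchange(). Everything here is a hypothetical stand-in: locked models l->locked, bucket models the hash bucket, and Q_LOCKED_VAL / Q_SLOW_VAL are made-up values, not the kernel's definitions. The waiter fills the bucket and then publishes Q_SLOW_VAL; the unlocker does the xchg first and only then reads the bucket.

```c
/*
 * User-space model of the pv_wait_head() vs __pv_queue_spin_unlock()
 * ordering, using C11 seq_cst atomics in place of the kernel's fully
 * ordered cmpxchg()/xchg(). All names and values are hypothetical.
 */
#include <assert.h>	/* used by the usage checks */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define Q_LOCKED_VAL 1	/* hypothetical stand-in for _Q_LOCKED_VAL */
#define Q_SLOW_VAL   3	/* hypothetical stand-in for _Q_SLOW_VAL */

static atomic_int locked;	/* models l->locked */
static atomic_int bucket;	/* models the hash bucket: 1 once hashed */
static int saw_slow;		/* unlocker's xchg returned Q_SLOW_VAL */
static int found_entry;		/* what the unlocker read from the bucket */

/* Models pv_wait_head(): hash the lock, then publish Q_SLOW_VAL. */
static void *waiter(void *arg)
{
	int expected = Q_LOCKED_VAL;

	(void)arg;
	atomic_store_explicit(&bucket, 1, memory_order_relaxed); /* pv_hash() */
	/* The seq_cst CAS is a release: any thread that reads Q_SLOW_VAL with
	 * an acquire (or stronger) operation also sees the bucket store. If the
	 * CAS fails the lock was already released; this toy simply leaves the
	 * bucket filled, which does not affect the check. */
	atomic_compare_exchange_strong(&locked, &expected, Q_SLOW_VAL);
	return NULL;
}

/* Models __pv_queue_spin_unlock(): xchg first, then look up. */
static void *unlocker(void *arg)
{
	(void)arg;
	if (atomic_exchange(&locked, 0) != Q_SLOW_VAL)
		return NULL;		/* fast path: lock was never hashed */
	saw_slow = 1;
	/* Reading Q_SLOW_VAL synchronizes with the waiter's CAS, so the
	 * earlier bucket store happens-before this load. */
	found_entry = atomic_load_explicit(&bucket, memory_order_relaxed);
	return NULL;
}

/* Runs one race; returns 1 if "saw Q_SLOW_VAL implies saw the bucket" held. */
static int run_once(void)
{
	pthread_t a, b;

	atomic_store(&locked, Q_LOCKED_VAL);
	atomic_store(&bucket, 0);
	saw_slow = 0;
	found_entry = 0;

	if (pthread_create(&a, NULL, waiter, NULL) != 0)
		return 0;
	if (pthread_create(&b, NULL, unlocker, NULL) != 0) {
		pthread_join(a, NULL);
		return 0;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return !saw_slow || found_entry == 1;
}

/* Repeats the race n times; returns how many runs upheld the invariant. */
static int run_many(int n)
{
	int ok = 0;

	for (int i = 0; i < n; i++)
		ok += run_once();
	return ok;
}
```

Calling run_many(1000) should return 1000. Note that a passing run can only fail to find a violation; the guarantee itself comes from the release/acquire pairing on the same variable, not from testing.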