On Sat, 19 Jun 2021 10:30:09 +0000 Yunsheng Lin wrote:
> When debugging pointed to the misordering between STATE_MISSED
> setting/clearing and STATE_MISSED checking, only _after_atomic()
> was added first, and it did not fix the misordering problem.
> When both _before_atomic() and _after_atomic() were added, the
> misordering problem disappeared.
>
> I suppose _before_atomic() matters because the STATE_MISSED
> setting and the lock rechecking are only done when the first check
> of STATE_MISSED returns false. _before_atomic() is used to make sure
> the first check returns the correct result; if it does not return
> the correct result, then we may have a misordering problem too.
>
>       cpu0                        cpu1
>                               clear MISSED
>                               _after_atomic()
>                                  dequeue
>     enqueue
>  first trylock() #false
>   MISSED check #*true* ?
>
> As above, even though cpu1 has an _after_atomic() between clearing
> STATE_MISSED and dequeuing, we might still need a barrier to
> prevent cpu0 from doing a speculative MISSED check before cpu1
> clears MISSED?
>
> And the implicit load-acquire barrier contained in the first
> trylock() does not seem to prevent the above case either.
>
> And there is no load-acquire barrier in pfifo_fast_dequeue()
> either, which possibly makes the above case more likely to happen.

Ah, you're right. The test_bit() was not in the patch context, so I
forgot it's there... Both barriers are indeed needed.
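
For anyone following along, the ordering discussed above can be sketched
in userspace with C11 atomics. This is only a rough model, not the kernel
code: struct model_qdisc and the model_*() helpers are hypothetical names,
the full fences merely stand in for _before_atomic() and _after_atomic(),
and the trylock is a plain acquire-only exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel's qdisc structure. */
struct model_qdisc {
	atomic_bool locked;	/* stands in for the qdisc lock */
	atomic_bool missed;	/* stands in for STATE_MISSED */
};

/* Acquire-only trylock: returns true if the caller now owns the lock. */
static bool model_trylock(struct model_qdisc *q)
{
	return !atomic_exchange_explicit(&q->locked, true,
					 memory_order_acquire);
}

static void model_unlock(struct model_qdisc *q)
{
	assert(atomic_load_explicit(&q->locked, memory_order_relaxed));
	atomic_store_explicit(&q->locked, false, memory_order_release);
}

/* Enqueue side (cpu0 in the diagram). */
static bool model_run_begin(struct model_qdisc *q)
{
	if (model_trylock(q))
		return true;

	/* Stand-in for _before_atomic(): full fence between the failed
	 * trylock and the MISSED check that follows it.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&q->missed, memory_order_relaxed))
		return false;	/* someone already flagged the miss */

	atomic_store_explicit(&q->missed, true, memory_order_relaxed);

	/* Stand-in for _after_atomic(): MISSED must be set before the
	 * second trylock is attempted.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	return model_trylock(q);
}

/* Dequeue side (cpu1 in the diagram): returns true if MISSED was set
 * and has been cleared, meaning the caller should dequeue again.
 */
static bool model_dequeue_retry(struct model_qdisc *q)
{
	if (!atomic_load_explicit(&q->missed, memory_order_relaxed))
		return false;

	atomic_store_explicit(&q->missed, false, memory_order_relaxed);
	/* Stand-in for _after_atomic() between clearing and dequeuing. */
	atomic_thread_fence(memory_order_seq_cst);
	return true;
}
```

Run sequentially this only exercises the control flow (lock taken, MISSED
set on contention, early return when MISSED is already set); it does not
reproduce the race itself, which needs two CPUs and a weakly ordered
machine.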