On 07/24, Linus Torvalds wrote: > > I just realized that one thing we could do is to not even test the > page bit for the shared case in the wakeup path. > > Because anybody who uses the non-exclusive "wait_on_page_locked()" or > "wait_on_page_writeback()" isn't actually interested in the bit state > any more at that point. All they care about is that somebody cleared > it - not whether it was then re-taken again. > > So instead of keeping them on the list - or stopping the waitqueue > walk because somebody else got the bit - we could just mark them > successfully done, wake them up, and remove those entries from the > list. Heh. I too thought about this. And just in case, your patch looks correct to me. But I can't really comment this behavioural change. Perhaps it should come in a separate patch? In essense, this partly reverts your commit 3510ca20ece0150 ("Minor page waitqueue cleanups"). I mean this part: (b) we don't want to put the non-locking waiters always on the front of the queue, and the locking waiters always on the back. Not only is that unfair, it means that we wake up thousands of reading threads that will just end up being blocked by the writer later anyway. ... @@ -972,10 +976,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q, spin_lock_irq(&q->lock); if (likely(list_empty(&wait->entry))) { - if (lock) - __add_wait_queue_entry_tail_exclusive(q, wait); - else - __add_wait_queue(q, wait); + __add_wait_queue_entry_tail(q, wait); SetPageWaiters(page); } Oleg.