On Fri, Aug 7, 2020 at 11:41 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > + > + /* > + * If we hoped to pass PG_locked on to the next locker, but found > + * no exclusive taker, then we must clear it before dropping q->lock. > + * Otherwise unlock_page() can race trylock_page_bit_common(), and if > + * PageWaiters was already set from before, then cmpxchg sees no > + * difference to send it back to wake_up_page_bit(). > + * > + * We may have already dropped and retaken q->lock on the way here, but > + * I think this works out because new entries are always added at tail. > + */ > + if (exclude && !exclusive) > + clear_bit(bit_nr, &page->flags); > + > spin_unlock_irqrestore(&q->lock, flags); Yeah, I started thinking about this, and that's when I decided to just throw away the patch. Because now it clears the bit *after* it has woken people up, and that just made me go "Eww". You did add a comment about the whole "always added to the tail" thing, and the spinlock should serialize everything, so I guess it's ok (because the spinlock should serialize things that care), but it just feels wrong. I also started worrying about other people waiting on the page lock (ie we now have that whole io_uring case), and interaction with the PG_writeback case etc, and it just ended up feeling very messy. I think it was all fine - other cases won't hit that exclusive case at all - but I had a hard time wrapping my little mind around the exact handoff rules, and the cmpxchg behavior when other bits changed (eg PG_waiters), so I gave up. > My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus > your two patches): I did not have in the io_uring changes from the > current tree. In glancing there, I noticed one new and one previous > instance of SetPageWaiters() *after* __add_wait_queue_entry_tail(): > are those correct? I don't think SetPageWaiters() has any ordering constraints with the wait queue. Nobody looks at the waitqueue unless they already got to the slowpath. The only ordering constraint is with the testing of the bit we're going to wait on. Because doing if (test_and_set_bit()) SetPageWaiters(page); is wrong - there's a race in there when somebody can clear the bit, and not see that there are waiters. And obviously that needs to be done inside the spinlock, but once you do that, the ordering of the actual wait-queue is entirely irrelevant. The spinlock will order _that_ part. The only thing the spinlock won't order and serialize is the PG_lock bit and making sure we get to the slowpath in the first place. So if you're talking about __wait_on_page_locked_async(), then I think that part is ok. Or am I missing something? Linus