On Tue, Nov 24, 2020 at 3:24 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I've applied your second patch (the smaller one that just takes a ref
> around the critical section). If somebody comes up with some great
> alternative, we can always revisit this.

Hmm.

I'm not sure about "great alternative", but it strikes me that we *could* move the clearing of the PG_writeback bit _into_ wake_up_page_bit(), under the page waitqueue lock.

IOW, we could make the rule be that the bit isn't actually cleared before calling wake_up_page() at all, and we'd clear it with something like

    unsigned long flags = READ_ONCE(page->flags);

    // We can clear PG_writeback directly if PG_waiters isn't set
    while (!(flags & (1ul << PG_waiters))) {
        unsigned long new = flags & ~(1ul << PG_writeback);
        // PG_writeback was already clear??!!?
        if (WARN_ON_ONCE(new == flags))
            return;
        new = cmpxchg(&page->flags, flags, new);
        if (likely(flags == new))
            return;
        flags = new;
    }
    // Otherwise, clear the bit at the end - but under the
    // page waitqueue lock - inside wake_up_page_bit()
    return wake_up_page_bit(..);

instead.

That would basically make the bit clearing atomic wrt the PG_waiters flag - either using that atomic cmpxchg, or by doing it under the page queue lock so that it's atomic wrt any new waiters.

This seems conceptually like the right thing to do - and it would also make the (fair) exclusive lock hand-off case atomic too, because the bit we're waking up on would never be cleared if it gets handed off directly.

The above is entirely untested crap written in my MUA, and obviously requires that all callers of wake_up_page() be moved to that new world order, but I think we only have two cases: unlock_page() and end_page_writeback().

And unlock_page() already has that "clear_bit_unlock_is_negative_byte()" special case that is an ugly special case of PG_waiters atomicity. So we'd get rid of that, because the cmpxchg loop would be the better model.
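For anybody who wants to poke at the lockless half outside the kernel, here is a rough userspace model of that cmpxchg loop using C11 atomics. It is only a sketch: the bit positions, the name clear_writeback_fast() and the release ordering are all assumptions made for illustration, not what the kernel uses, and the WARN_ON_ONCE() case is reduced to a plain early return.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit positions only; not the kernel's real values. */
enum { PG_WAITERS = 7, PG_WRITEBACK = 15 };

/*
 * Hypothetical helper: try to clear PG_WRITEBACK without taking the
 * waitqueue lock.
 *
 * Returns true if the fast path is done (no waiters were observed),
 * false if PG_WAITERS is set and the caller has to fall back to the
 * locked slow path, with PG_WRITEBACK left untouched.
 */
static bool clear_writeback_fast(_Atomic unsigned long *flags)
{
    unsigned long old = atomic_load_explicit(flags, memory_order_relaxed);

    while (!(old & (1ul << PG_WAITERS))) {
        unsigned long cleared = old & ~(1ul << PG_WRITEBACK);

        /* Bit was already clear; the kernel version would WARN here. */
        if (cleared == old)
            return true;

        /*
         * Succeeds only if nobody changed the word since we read it,
         * so a waiter setting PG_WAITERS in between makes us retry.
         * On failure 'old' is refreshed with the current value.
         */
        if (atomic_compare_exchange_weak_explicit(flags, &old, cleared,
                                                  memory_order_release,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}
```

The point of the model is the same as in the loop above: the clear either happens in one atomic step on a word that has no PG_waiters bit, or it does not happen at all on this path.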
I'm not sure I'm willing to write and test the real patch, but it doesn't look _too_ nasty from just looking at the code. The bookmark thing makes it important to only actually clear the bit at the end (as does the handoff case anyway), but the way wake_up_page_bit() is written, that's actually very straightforward - just after the while-loop. That's when we've woken up everybody.

So I'm sending this idea out to see if somebody can shoot it down, or possibly even wants to try to do it..

              Linus
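The slow path can be modelled the same way. This is a toy sketch under heavy assumptions: struct page_model, its counters and the atomic_flag spinlock are hypothetical stand-ins for the real page waitqueue, the bookmark and hand-off logic is left out, and PG_waiters handling is not modelled at all. It only shows the ordering: wake everybody first, then clear the bit, all under the one lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative bit positions only; not the kernel's real values. */
enum { PG_WAITERS = 7, PG_WRITEBACK = 15 };

/* Hypothetical stand-in for a page plus its waitqueue. */
struct page_model {
    _Atomic unsigned long flags;
    atomic_flag lock;   /* models the page waitqueue spinlock */
    int nr_waiters;     /* models the wait list; protected by 'lock' */
    int nr_woken;       /* counts wakeups, for inspection only */
};

static void wq_lock(struct page_model *p)
{
    /* Spin until we own the lock. */
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;
}

static void wq_unlock(struct page_model *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Slow path: wake all waiters, then clear the bit, under the lock. */
static void wake_and_clear_slow(struct page_model *p)
{
    wq_lock(p);

    /* New rule: the bit is still set when we get here. */
    assert(atomic_load(&p->flags) & (1ul << PG_WRITEBACK));

    /* Models the while-loop that walks the wait list. */
    while (p->nr_waiters > 0) {
        p->nr_waiters--;
        p->nr_woken++;
    }

    /*
     * Everybody has been woken. Clearing the bit here, before the
     * unlock, means no new waiter can queue up in between.
     */
    atomic_fetch_and(&p->flags, ~(1ul << PG_WRITEBACK));

    wq_unlock(p);
}
```

A caller would try the lockless cmpxchg loop first and only call something like wake_and_clear_slow() when it sees PG_waiters set.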