On Wed, Jul 22, 2020 at 3:10 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > + bool first_time = true; > > bool thrashing = false; > > bool delayacct = false; > > unsigned long pflags; > > @@ -1134,7 +1135,12 @@ static inline int wait_on_page_bit_commo > > spin_lock_irq(&q->lock); > > > > if (likely(list_empty(&wait->entry))) { > > - __add_wait_queue_entry_tail(q, wait); > > + if (first_time) { > > + __add_wait_queue_entry_tail(q, wait); > > + first_time = false; > > + } else { > > + __add_wait_queue(q, wait); > > + } > > SetPageWaiters(page); > > } > > This seems very hacky. > > And in fact, looking closer, I'd say that there are more serious problems here. > > Look at that WQ_FLAG_EXCLUSIVE thing: non-exclusive waits should > always go at the head (because they're not going to steal the bit, > they just want to know when it got cleared), and exclusive waits > should always go at the tail (because of fairness). > > But that's not at all what we do. > > Your patch adds even more confusion to this nasty area. Actually, I think the right thing to do is to just say "we should never add ourselves back to the list". We have three cases: - DROP: we'll never loop - SHARED: we'll break if we see the bit clear - so if we're no longer on the list, we should break out - EXCLUSIVE: we should remain on the list until we get the lock. and we should just handle these three cases in the wakeup function instead, I feel. And then to make it fair to people, let's just always add things to the head of the queue, whether exclusive or not. AND NEVER RE-QUEUE. IOW, something like the attached patch. NOTE NOTE NOTE! This is both somewhat subtle code, and ENTIRELY UNTESTED. But it actually fixes the bug mentioned a few weeks ago, it makes the code simpler in one sense, and it avoids the whole re-queueuing thing. It removes all the "is the page locked" testing from the actual wait loop, and replaces it with "is the wait queue entry still on the list" instead. Comments? Oleg, this should fix the race you talked about too. Note that this patch really is meant as a RFC, and "something like this". It builds. I tried to think it through. But it might have some obvious thinko that means that it really really doesn't work. Linus
Attachment:
patch
Description: Binary data