On Wed, Jul 22, 2020 at 2:29 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > -#define PAGE_WAIT_TABLE_BITS 8 > +#define PAGE_WAIT_TABLE_BITS 10 Well, that seems harmless even on small machines. > + bool first_time = true; > bool thrashing = false; > bool delayacct = false; > unsigned long pflags; > @@ -1134,7 +1135,12 @@ static inline int wait_on_page_bit_commo > spin_lock_irq(&q->lock); > > if (likely(list_empty(&wait->entry))) { > - __add_wait_queue_entry_tail(q, wait); > + if (first_time) { > + __add_wait_queue_entry_tail(q, wait); > + first_time = false; > + } else { > + __add_wait_queue(q, wait); > + } > SetPageWaiters(page); > } This seems very hacky. And in fact, looking closer, I'd say that there are more serious problems here. Look at that WQ_FLAG_EXCLUSIVE thing: non-exclusive waits should always go at the head (because they're not going to steal the bit, they just want to know when it got cleared), and exclusive waits should always go at the tail (because of fairness). But that's not at all what we do. Your patch adds even more confusion to this nasty area. And your third one: > + if (ret) > + woken++; > > - if (bookmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) && > + if (bookmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) && woken && I've got two reactions to this (a) we should not need a new "woken" variable, we should just set a high bit of "cnt" and make WAITQUEUE_WALK_BREAK_CNT contain that high bit (Tune "high bit" to whatever you want: it could be either the _real_ high bit of the variable, or it could be something like "128", which would mean that you'd break out after 128 non-waking entries). (b) Ugh, what hackery and magic behavior regardless I'm really starting to hate that wait_on_page_bit_common() function. See a few weeks ago how the function looks buggy to begin with https://lore.kernel.org/lkml/CAHk-=wjJA2Z3kUFb-5s=6+n0qbTs8ELqKFt9B3pH85a8fGD73w@xxxxxxxxxxxxxx/ and that never got resolved either (but probably never happens in practice). Linus