RE: [PATCH 1/2] sched/wait: Break up long wake list walk

"Liang, Kan" <kan.liang@xxxxxxxxx> · Fri, 18 Aug 2017 13:06:04 +0000

> On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan <kan.liang@xxxxxxxxx> wrote:
> >
> > Here is the call stack of wait_on_page_bit_common when the queue is
> > long (entries >1000).
> >
> > # Overhead  Trace output
> > # ........  ..................
> > #
> >    100.00%  (ffffffff931aefca)
> >             |
> >             ---wait_on_page_bit
> >                __migration_entry_wait
> >                migration_entry_wait
> >                do_swap_page
> >                __handle_mm_fault
> >                handle_mm_fault
> >                __do_page_fault
> >                do_page_fault
> >                page_fault
> 
> Hmm. Ok, so it does seem to very much be related to migration. Your
> wake_up_page_bit() profile made me suspect that, but this one seems to
> pretty much confirm it.
> 
> So it looks like that wait_on_page_locked() thing in __migration_entry_wait(),
> and what probably happens is that your load ends up triggering a lot of
> migration (or just migration of a very hot page), and then *every* thread
> ends up waiting for whatever page that ended up getting migrated.
> 
> And so the wait queue for that page grows hugely long.
> 
> Looking at the other profile, the thing that is locking the page (that everybody
> then ends up waiting on) would seem to be
> migrate_misplaced_transhuge_page(), so this is _presumably_ due to NUMA
> balancing.
> 
> Does the problem go away if you disable the NUMA balancing code?
> 

Yes, the problem goes away when NUMA balancing is disabled.
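
For anyone reproducing this, here is a minimal sketch of checking and
toggling automatic NUMA balancing from a script. It assumes the kernel
exposes the kernel.numa_balancing sysctl at /proc/sys/kernel/numa_balancing.
The helper names are hypothetical and not part of the patch series, and
writing the file needs root.

```python
from pathlib import Path

# Assumed location of the kernel.numa_balancing sysctl; absent on kernels
# built without automatic NUMA balancing support.
NUMA_BALANCING = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path: Path = NUMA_BALANCING) -> bool:
    """Return True if the sysctl file holds a non-zero value."""
    return int(path.read_text().strip()) != 0


def set_numa_balancing(enabled: bool, path: Path = NUMA_BALANCING) -> None:
    """Write 1 or 0 to the sysctl file (requires root on a real system)."""
    path.write_text("1\n" if enabled else "0\n")


if __name__ == "__main__":
    if NUMA_BALANCING.exists():
        print("NUMA balancing enabled:", numa_balancing_enabled())
    else:
        print("kernel.numa_balancing is not available on this kernel")
```

Calling set_numa_balancing(False) before rerunning the workload is one way
to repeat the comparison above.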

Thanks,
Kan