On Sun, Jan 02, 2022 at 12:31:29PM +0900, skseofh@xxxxxxxxx wrote:
> From: Daero Lee <skseofh@xxxxxxxxx>
>
> In kswapd_try_to_sleep function, to check whether kswapd can sleep,
> the prepare_kswapd_sleep function is called twice.
>
> If free pages are below high-watermark in the first call,
> the @remaining variable is not updated at 0 and the
> prepare_kswapd_sleep function is called for the second time.
>
> I think it is necessary to set the initial value of the
> @remaining to a non-zero value to prevent consecutive calls
> to the same function.
>
> Signed-off-by: Daero Lee <skseofh@xxxxxxxxx>
> ---
>  mm/vmscan.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 700434db5735..1217ecec5bbb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
>  	/*
>  	 * Return the order kswapd stopped reclaiming at as
>  	 * prepare_kswapd_sleep() takes it into account. If another caller
> -	 * entered the allocator slow path while kswapd was awake, order will
> +	 * entered the allqocator slow path while kswapd was awake, order will
>  	 * remain at the higher level.
>  	 */
>  	return sc.order;

This hunk just adds a typo, drop it.

> @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
>  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
>  				unsigned int highest_zoneidx)
>  {
> -	long remaining = 0;
> +	long remaining = ~0;
>  	DEFINE_WAIT(wait);
>
>  	if (freezing(current) || kthread_should_stop())

While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
is balanced on the first try, it then does not restore the vmstat
thresholds and doesn't call schedule() for kswapd to go to sleep.
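
For illustration only, the interaction between the initial value of
remaining and the final "!remaining && prepare_kswapd_sleep()" check can
be modelled in user space. This is a rough sketch and not kernel code:
model_prepare_sleep() and model_try_to_sleep() are hypothetical
stand-ins, the balance checks return canned answers, and the short
timed sleep is reduced to a parameter.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of this is kernel code. */
struct model_result {
	int checks;		/* how many times the balance check ran */
	bool full_sleep;	/* whether the final "go fully to sleep" branch was taken */
};

static int checks_run;

/* Stand-in for prepare_kswapd_sleep(): returns a canned answer and counts calls. */
static bool model_prepare_sleep(bool balanced)
{
	checks_run++;
	return balanced;
}

/*
 * remaining_init: initial value of 'remaining' (0 today, ~0 in the posted patch)
 * first, second:  canned answers for the first and second balance checks
 * timeout_left:   modelled return value of the short timed sleep
 */
static struct model_result model_try_to_sleep(long remaining_init, bool first,
					      bool second, long timeout_left)
{
	struct model_result res = { 0, false };
	long remaining = remaining_init;

	checks_run = 0;

	/* The short sleep only happens, and only overwrites 'remaining', here. */
	if (model_prepare_sleep(first))
		remaining = timeout_left;

	/* '&&' short-circuits: a non-zero 'remaining' skips the second check. */
	if (!remaining && model_prepare_sleep(second))
		res.full_sleep = true;

	res.checks = checks_run;
	return res;
}
```

In this model, calling model_try_to_sleep(~0L, false, false, 0) runs the
balance check once and never reaches the full-sleep branch, while the
same call with an initial value of 0 runs the check twice.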

I think you did spot a problem but I suspect you want something like the
following untested patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 700434db5735..40784693c840 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4355,7 +4355,8 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
 static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
 				unsigned int highest_zoneidx)
 {
-	long remaining = 0;
+	long remaining;
+	bool balanced;
 	DEFINE_WAIT(wait);
 
 	if (freezing(current) || kthread_should_stop())
@@ -4370,7 +4371,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
 	 * eligible zone balanced that it's also unlikely that compaction will
 	 * succeed.
 	 */
-	if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+	balanced = prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx);
+	if (balanced) {
 		/*
 		 * Compaction records what page blocks it recently failed to
 		 * isolate pages from and skips them in the future scanning.
@@ -4387,6 +4389,10 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
 
 		remaining = schedule_timeout(HZ/10);
 
+		/* Is pgdat balanced after a short sleep? */
+		balanced = prepare_kswapd_sleep(pgdat, reclaim_order,
+						highest_zoneidx);
+
 		/*
 		 * If woken prematurely then reset kswapd_highest_zoneidx and
 		 * order. The values will either be from a wakeup request or
@@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
 	}
 
 	/*
-	 * After a short sleep, check if it was a premature sleep. If not, then
-	 * go fully to sleep until explicitly woken up.
+	 * If balanced to the high watermark, restore vmstat thresholds and
+	 * kswapd goes to sleep. If kswapd remains awake, account whether
+	 * the low or high watermark was hit quickly.
 	 */
-	if (!remaining &&
-	    prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+	if (balanced) {
 		trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
 
 		/*