Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU

Yu Zhao <yuzhao@xxxxxxxxxx> · Wed, 3 Jan 2024 20:03:20 -0700

On Wed, Jan 3, 2024 at 2:30 PM Jaroslav Pulchart
<jaroslav.pulchart@xxxxxxxxxxxx> wrote:
>
> >
> > >
> > > Hi Yu,
> > >
> > > On 12/2/2023 5:22 AM, Yu Zhao wrote:
> > > > Charan, does the fix previously attached seem acceptable to you? Any
> > > > additional feedback? Thanks.
> > >
> > > First, thanks for taking this patch to upstream.
> > >
> > > A comment on the code snippet: checking just the 'high wmark' pages might
> > > succeed here but can fail in the immediate kswapd sleep, see
> > > prepare_kswapd_sleep(). This can show up as an increase in
> > > KSWAPD_HIGH_WMARK_HIT_QUICKLY, and thus as unnecessary kswapd run time.
> > > @Jaroslav: Have you observed something like above?
> >
> > I do not see any unnecessary kswapd run time; on the contrary, it is
> > fixing the kswapd continuous run issue.
> >
> > >
> > > So, in downstream, we have something like this for zone_watermark_ok():
> > > unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH << 2;
> > >
> > > It is hard to justify this empirical 'MIN_LRU_BATCH << 2' value; maybe we
> > > should at least use 'MIN_LRU_BATCH' with the mentioned reasoning. That is
> > > all I can say for this patch.
> > >
> > > +       mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
> > > +              WMARK_PROMO : WMARK_HIGH;
> > > +       for (i = 0; i <= sc->reclaim_idx; i++) {
> > > +               struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
> > > +               unsigned long size = wmark_pages(zone, mark);
> > > +
> > > +               if (managed_zone(zone) &&
> > > +                   !zone_watermark_ok(zone, sc->order, size, sc->reclaim_idx, 0))
> > > +                       return false;
> > > +       }
> > >
> > >
> > > Thanks,
> > > Charan
> >
> >
> >
> > --
> > Jaroslav Pulchart
> > Sr. Principal SW Engineer
> > GoodData
>
>
> Hello,
>
> Today we tried to update servers to 6.6.9, which contains the mglru fixes
> (from 6.6.8), and the servers behave much, much worse.
>
> Multiple kswapd* threads went to ~100% CPU immediately:
>     555 root      20   0       0      0      0 R  99.7   0.0   4:32.86 kswapd1
>     554 root      20   0       0      0      0 R  99.3   0.0   3:57.76 kswapd0
>     556 root      20   0       0      0      0 R  97.7   0.0   3:42.27 kswapd2
> Are the changes in upstream different compared to the initial patch
> which I tested?
>
> Best regards,
> Jaroslav Pulchart

Hi Jaroslav,

My apologies for all the trouble!

Yes, there is a slight difference between the fix you verified and
what went into 6.6.9. The fix in 6.6.9 is disabled under a special
condition which I thought wouldn't affect you.

Could you try the attached fix again on top of 6.6.9? It removes that
special condition.

Thanks!
Attachment: mglru-fix-6.6.9.patch

Description: Binary data
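
A side note on the downstream expression quoted above,
'wmark_pages(zone, mark) + MIN_LRU_BATCH << 2': in C, '+' binds tighter
than '<<', so it parses as '(wmark_pages(zone, mark) + MIN_LRU_BATCH) << 2'.
That quadruples the whole watermark rather than adding a small margin to it.
The thread does not say which reading was intended. The following is a
minimal user-space sketch, assuming the intent was a margin of
'MIN_LRU_BATCH << 2' pages on top of the watermark. The value 64 for
MIN_LRU_BATCH and both helper names are hypothetical, chosen only for
illustration; this is not kernel code.

```c
/*
 * Hypothetical stand-in value for illustration only; the real
 * MIN_LRU_BATCH is defined by the kernel, not here.
 */
#define MIN_LRU_BATCH 64UL

/*
 * Equivalent to the unparenthesized 'wmark + MIN_LRU_BATCH << 2':
 * C precedence applies the addition first, then shifts the sum.
 */
static unsigned long size_as_parsed(unsigned long wmark)
{
	return (wmark + MIN_LRU_BATCH) << 2;
}

/*
 * Assumed intent: keep the watermark as is and add a margin of
 * MIN_LRU_BATCH << 2 pages on top of it.
 */
static unsigned long size_with_margin(unsigned long wmark)
{
	return wmark + (MIN_LRU_BATCH << 2);
}
```

With a watermark of 1000 pages, the first helper yields (1000 + 64) * 4
pages while the second yields 1000 + 256 pages, so the two readings diverge
more the larger the watermark is.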