Re: high kswapd CPU usage with symmetrical swap in/out pattern with multi-gen LRU

On Thu, Nov 9, 2023 at 3:58 AM Jaroslav Pulchart
<jaroslav.pulchart@xxxxxxxxxxxx> wrote:
>
> >
> > On Wed, Nov 8, 2023 at 10:39 PM Jaroslav Pulchart
> > <jaroslav.pulchart@xxxxxxxxxxxx> wrote:
> > >
> > > >
> > > > On Wed, Nov 8, 2023 at 12:04 PM Jaroslav Pulchart
> > > > <jaroslav.pulchart@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > >
> > > > > > Hi Jaroslav,
> > > > >
> > > > > Hi Yu Zhao
> > > > >
> > > > > thanks for response, see answers inline:
> > > > >
> > > > > >
> > > > > > On Wed, Nov 8, 2023 at 6:35 AM Jaroslav Pulchart
> > > > > > <jaroslav.pulchart@xxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > > I would like to report a problem with multi-gen LRU: unusual
> > > > > > > > swap in/out activity on my Dell 7525, a two-socket AMD 74F3
> > > > > > > > system (16 NUMA domains).
> > > > > >
> > > > > > Kernel version please?
> > > > >
> > > > > 6.5.y, but we saw it earlier: it has been under investigation since
> > > > > 23rd May (6.4.y and maybe even 6.3.y).
> > > >
> > > > v6.6 has a few critical fixes for MGLRU, I can backport them to v6.5
> > > > for you if you run into other problems with v6.6.
> > > >
> > >
> > > I will give 6.6.y a try. If it works, we can switch to
> > > 6.6.y instead of backporting the fixes to 6.5.y.
> > >
> > > > > > > Symptoms of my issue are
> > > > > > >
> > > > > > > > /A/ if multi-gen LRU is enabled
> > > > > > > 1/ [kswapd3] is consuming 100% CPU
> > > > > >
> > > > > > Just thinking out loud: kswapd3 means the fourth node was under memory pressure.
> > > > > >
> > > > > > > >     top - 15:03:11 up 34 days,  1:51,  2 users,  load average: 23.34, 18.26, 15.01
> > > > > > > >     Tasks: 1226 total,   2 running, 1224 sleeping,   0 stopped,   0 zombie
> > > > > > > >     %Cpu(s): 12.5 us,  4.7 sy,  0.0 ni, 82.1 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
> > > > > > >     MiB Mem : 1047265.+total,  28382.7 free, 1021308.+used,    767.6 buff/cache
> > > > > > >     MiB Swap:   8192.0 total,   8187.7 free,      4.2 used.  25956.7 avail Mem
> > > > > > >     ...
> > > > > > > >         765 root      20   0       0      0      0 R  98.3   0.0 34969:04 kswapd3
> > > > > > >     ...
> > > > > > > > 2/ swap space usage is low, about 4MB of 8GB, with swap on zram
> > > > > > > > (also observed with a swap disk, where it caused IO latency
> > > > > > > > issues due to some kind of locking)
> > > > > > > 3/ swap In/Out is huge and symmetrical ~12MB/s in and ~12MB/s out
> > > > > > >
> > > > > > >
> > > > > > > > /B/ if multi-gen LRU is disabled
> > > > > > > 1/ [kswapd3] is consuming 3%-10% CPU
> > > > > > > >     top - 15:02:49 up 34 days,  1:51,  2 users,  load average: 23.05, 17.77, 14.77
> > > > > > > >     Tasks: 1226 total,   1 running, 1225 sleeping,   0 stopped,   0 zombie
> > > > > > > >     %Cpu(s): 14.7 us,  2.8 sy,  0.0 ni, 81.8 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
> > > > > > >     MiB Mem : 1047265.+total,  28378.5 free, 1021313.+used,    767.3 buff/cache
> > > > > > >     MiB Swap:   8192.0 total,   8189.0 free,      3.0 used.  25952.4 avail Mem
> > > > > > >     ...
> > > > > > > >        765 root      20   0       0      0      0 S   3.6   0.0 34966:46 [kswapd3]
> > > > > > >     ...
> > > > > > > 2/ swap space usage is low (4MB)
> > > > > > > 3/ swap In/Out is huge and symmetrical ~500kB/s in and ~500kB/s out
> > > > > > >
> > > > > > > > Both situations are wrong, as both use swap in/out extensively;
> > > > > > > > however, the multi-gen LRU case is 10 times worse.
> > > > > >
> > > > > > From the stats below, node 3 had the lowest free memory. So I think in
> > > > > > both cases, the reclaim activities were as expected.
> > > > >
> > > > > I do not see a reason for the memory pressure and reclaim. It is true
> > > > > that this node has the lowest free memory of all nodes (~302MB free);
> > > > > however, the swap space usage is just 4MB (still going in and out). So
> > > > > what could be the reason for that behaviour?
> > > >
> > > > The best analogy is that refuel (reclaim) happens before the tank
> > > > becomes empty, and it happens even sooner when there is a long road
> > > > ahead (high order allocations).
> > > >
> > > > > The workers/application is running in pre-allocated HugePages and the
> > > > > rest is used for a small set of system services and drivers of
> > > > > devices. It is static and not growing. The issue persists when I stop
> > > > > the system services and free the memory.
> > > >
> > > > Yes, this helps. Also could you attach /proc/buddyinfo from the moment
> > > > you hit the problem?
> > > >
> > >
> > > I can. The problem is continuous: it is swapping in and out 100% of
> > > the time, consuming 100% CPU and locking IO.
> > >
> > > The output of /proc/buddyinfo is:
> > >
> > > # cat /proc/buddyinfo
> > > Node 0, zone      DMA      7      2      2      1      1      2      1      1      1      2      1
> > > Node 0, zone    DMA32   4567   3395   1357    846    439    190     93     61     43     23      4
> > > Node 0, zone   Normal     19    190    140    129    136     75     66     41      9      1      5
> > > Node 1, zone   Normal    194   1210   2080   1800    715    255    111     56     42     36     55
> > > Node 2, zone   Normal    204    768   3766   3394   1742    468    185    194    238     47     74
> > > Node 3, zone   Normal   1622   2137   1058    846    388    208     97     44     14     42     10
> >
> > Again, thinking out loud: there is only one zone on node 3, i.e., the
> > normal zone, and this excludes the problem commit
> > 669281ee7ef731fb5204df9d948669bf32a5e68d ("Multi-gen LRU: fix per-zone
> > reclaim") fixed in v6.6.
>
> I built vanilla 6.6.1 and ran a first quick test (spinning up and
> destroying VMs only). This test does not always trigger the continuous
> kswapd3 swap in/out, but it does exercise kswapd, and it looks like
> there is a change:
>
> With 6.5.y I can see non-continuous kswapd usage (15s and more):
>  # ps ax | grep [k]swapd
>     753 ?        S      0:00 [kswapd0]
>     754 ?        S      0:00 [kswapd1]
>     755 ?        S      0:00 [kswapd2]
>     756 ?        S      0:15 [kswapd3]    <<<<<<<<<
>     757 ?        S      0:00 [kswapd4]
>     758 ?        S      0:00 [kswapd5]
>     759 ?        S      0:00 [kswapd6]
>     760 ?        S      0:00 [kswapd7]
>     761 ?        S      0:00 [kswapd8]
>     762 ?        S      0:00 [kswapd9]
>     763 ?        S      0:00 [kswapd10]
>     764 ?        S      0:00 [kswapd11]
>     765 ?        S      0:00 [kswapd12]
>     766 ?        S      0:00 [kswapd13]
>     767 ?        S      0:00 [kswapd14]
>     768 ?        S      0:00 [kswapd15]
>
> and no kswapd usage with 6.6.1, which looks like a promising path:
>
> # ps ax | grep [k]swapd
>     808 ?        S      0:00 [kswapd0]
>     809 ?        S      0:00 [kswapd1]
>     810 ?        S      0:00 [kswapd2]
>     811 ?        S      0:00 [kswapd3]    <<<< nice
>     812 ?        S      0:00 [kswapd4]
>     813 ?        S      0:00 [kswapd5]
>     814 ?        S      0:00 [kswapd6]
>     815 ?        S      0:00 [kswapd7]
>     816 ?        S      0:00 [kswapd8]
>     817 ?        S      0:00 [kswapd9]
>     818 ?        S      0:00 [kswapd10]
>     819 ?        S      0:00 [kswapd11]
>     820 ?        S      0:00 [kswapd12]
>     821 ?        S      0:00 [kswapd13]
>     822 ?        S      0:00 [kswapd14]
>     823 ?        S      0:00 [kswapd15]
>
> I will install 6.6.1 on the server which is doing some work and
> observe it later today.

Thanks. Fingers crossed.
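
For anyone following along, the free memory per zone can be cross-checked
from the /proc/buddyinfo output quoted above. Each column is the number of
free blocks of order 0, 1, 2, ..., and a block of order n holds 2^n base
pages. The Python below is only a rough sketch: the helper names
(parse_buddyinfo, free_pages) are made up for illustration, and the 4 KiB
page size is an assumption (the x86-64 default), not something taken from
the report.

```python
import re

# Assumption: 4 KiB base pages (the x86-64 default); adjust for other setups.
PAGE_SIZE_KIB = 4

# Matches lines such as "Node 3, zone   Normal   1622   2137 ..."
LINE_RE = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)")


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            node, zone, counts = m.groups()
            zones[(int(node), zone)] = [int(c) for c in counts.split()]
    return zones


def free_pages(counts, min_order=0):
    """Total free base pages held in blocks of order >= min_order."""
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


# The node 3 line from the report, joined back onto a single line.
SAMPLE = (
    "Node 3, zone   Normal   1622   2137   1058    846    388    208"
    "     97     44     14     42     10"
)

zones = parse_buddyinfo(SAMPLE)
counts = zones[(3, "Normal")]
print(free_pages(counts) * PAGE_SIZE_KIB / 1024, "MiB free in total")
print(free_pages(counts, min_order=9) * PAGE_SIZE_KIB / 1024, "MiB in order >= 9 blocks")
```

Under that page-size assumption, node 3 works out to 76928 free pages,
roughly 300 MiB, which is in line with the ~302MB mentioned earlier in the
thread. The min_order argument is there because high-order allocations can
only be served from blocks of at least that order.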




