On Thu, Aug 29, 2024 at 7:51 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Thu 29-08-24 18:37:07, Zhongkun He wrote:
> > On Thu, Aug 29, 2024 at 6:24 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Thu 29-08-24 18:19:16, Zhongkun He wrote:
> > > > This patch proposes augmenting the memory.reclaim interface with a
> > > > disable_unmap_file argument that will skip mapped file pages in
> > > > that reclaim attempt.
> > > >
> > > > For example:
> > > >
> > > > echo "2M disable_unmap_file" > /sys/fs/cgroup/test/memory.reclaim
> > > >
> > > > will perform reclaim on the test cgroup while skipping mapped file
> > > > pages.
> > > >
> > > > memory.reclaim is a useful interface. It lets us carry out proactive
> > > > memory reclaim from user space, which can increase memory
> > > > utilization.
> > > >
> > > > In real usage scenarios we found that, even when there are plenty of
> > > > anonymous pages, mapped file pages, which make up a relatively small
> > > > proportion of memory, are still reclaimed. This is likely to increase
> > > > refaults and task delay, because mapped file pages usually contain
> > > > important executable code, data, shared libraries, etc. Our testing
> > > > shows that skipping this part of memory reduces task delay.
> > >
> > > Do you have examples of workloads where this demonstrably helps and
> > > cannot be tuned via swappiness?
> >
> > Sorry, I put the test workload in the second patch. Please have a look.
>
> I have missed those as they are not threaded to the cover letter. You
> can either use --in-reply-to when sending patches separately from the
> cover letter, or you can use --compose/--cover-letter when sending
> patches through git-send-email.

Got it, thanks. I encountered a problem after sending the cover letter,
so I resent the others without --in-reply-to.

> > Even if there are sufficient anonymous pages and a small number of
> > page cache and mapped file pages, mapped file pages will still be
> > reclaimed. Here is an example of anonymous pages being sufficient but
> > mapped file pages still being reclaimed (swappiness has been set to
> > the maximum value):
> >
> > cat memory.stat | grep -wE 'anon|file|file_mapped'
> > anon 3406462976
> > file 332967936
> > file_mapped 300302336
> >
> > echo "1g swappiness=200" > memory.reclaim
> >
> > cat memory.stat | grep -wE 'anon|file|file_mapped'
> > anon 2613276672
> > file 52523008
> > file_mapped 30982144
>
> This seems to be a 73% (anon) vs 27% (file) balance. 90% of the file
> LRU seems to be mapped, which matches the ~90% of the reclaimed file
> LRU memory being mapped. So the reclaim is proportional there.
>
> But I do understand that this is still unexpected when swappiness=200
> should make reclaim anon oriented. Is this MGLRU or the regular LRU
> implementation?
>

This is the regular LRU implementation; MGLRU has the same issue but
performs better.
Please have a look:

root@vm:/sys/fs/cgroup/test# cat /sys/kernel/mm/lru_gen/enabled
0x0007
root@vm:/sys/fs/cgroup/test# cat memory.stat | grep -wE 'anon|file|file_mapped'
anon 3310338048
file 293498880
file_mapped 273506304
root@vm:/sys/fs/cgroup/test# echo "1g swappiness=200" > memory.reclaim
root@vm:/sys/fs/cgroup/test# cat memory.stat | grep -wE 'anon|file|file_mapped'
anon 2373173248
file 157233152
file_mapped 146173952
root@vm:/sys/fs/cgroup/test# echo "1g swappiness=200" > memory.reclaim
root@vm:/sys/fs/cgroup/test# cat memory.stat | grep -wE 'anon|file|file_mapped'
anon 1370886144
file 85663744
file_mapped 78118912

> Is this some artificial workload or something real world?
>

This is an artificial workload, used to show the details of this case
more easily, but we have encountered the same problem on our servers.
If disk performance is poor, as with an HDD, the situation becomes even
worse: task delays grow because reading the data back is slower, and
hot pages thrash repeatedly between memory and disk. The pressure on
the disk also increases, and if many tasks share that disk, they are
affected as well. That was the background of this case.

>
> --
> Michal Hocko
> SUSE Labs
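For illustration, below is a minimal sketch of the kind of userspace
proactive-reclaim loop discussed above. It assumes the disable_unmap_file
token proposed in this series can be combined with the existing
swappiness= argument of memory.reclaim; the cgroup path, step size, and
threshold are made-up example values, not taken from the patch or the
test output above.

#!/bin/sh
# Minimal proactive reclaim loop (illustrative sketch only).
# Assumes the proposed disable_unmap_file argument is accepted by
# memory.reclaim; CG, STEP and TARGET are made-up example values.
CG=/sys/fs/cgroup/test
STEP=64M
TARGET=3221225472        # keep memory.current below ~3G

while true; do
        current=$(cat "$CG/memory.current")
        if [ "$current" -gt "$TARGET" ]; then
                # Bias reclaim toward anonymous memory and skip mapped
                # file pages (hypothetical combination of the existing
                # swappiness= option and the proposed disable_unmap_file).
                echo "$STEP swappiness=200 disable_unmap_file" \
                        > "$CG/memory.reclaim" 2>/dev/null \
                        || echo "reclaim request not fully satisfied" >&2
        fi
        sleep 10
done

In a real deployment the step size and target would of course be derived
from PSI or other workload-specific signals rather than fixed constants.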