On Thu, Aug 29, 2024 at 7:51 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Thu 29-08-24 18:37:07, Zhongkun He wrote:
> > On Thu, Aug 29, 2024 at 6:24 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Thu 29-08-24 18:19:16, Zhongkun He wrote:
> > > > This patch proposes augmenting the memory.reclaim interface with a
> > > > disable_unmap_file argument that will skip mapped file pages in
> > > > that reclaim attempt.
> > > >
> > > > For example:
> > > >
> > > > echo "2M disable_unmap_file" > /sys/fs/cgroup/test/memory.reclaim
> > > >
> > > > will perform reclaim on the test cgroup while skipping mapped file
> > > > pages.
> > > >
> > > > memory.reclaim is a useful interface. It lets us carry out proactive
> > > > memory reclaim from user space, which can increase memory
> > > > utilization.
> > > >
> > > > In real usage scenarios we found that, even when there are plenty of
> > > > anonymous pages, mapped file pages, which make up a relatively small
> > > > proportion of memory, are still reclaimed. This is likely to increase
> > > > refaults and task delay, because mapped file pages usually contain
> > > > important executable code, data, shared libraries, etc. Our testing
> > > > shows that skipping this part of memory reduces task delay.
> > >
> > > Do you have examples of workloads where this demonstrably helps and
> > > cannot be tuned via swappiness?
> >
> > Sorry, I put the test workload in the second patch. Please have a look.
>
> I have missed those as they are not threaded to the cover letter. You
> can either use --in-reply-to when sending patches separately from the
> cover letter, or you can use --compose/--cover-letter when sending
> patches through git-send-email.

Got it, thanks. I encountered a problem after sending the cover letter,
so I resent the others without --in-reply-to.

> > Even if there are sufficient anonymous pages and a small number of
> > page cache and mapped file pages, mapped file pages will still be
> > reclaimed. Here is an example of anonymous pages being sufficient but
> > mapped file pages still being reclaimed (swappiness has been set to
> > the maximum value):
> >
> > cat memory.stat | grep -wE 'anon|file|file_mapped'
> > anon 3406462976
> > file 332967936
> > file_mapped 300302336
> >
> > echo "1g swappiness=200" > memory.reclaim
> >
> > cat memory.stat | grep -wE 'anon|file|file_mapped'
> > anon 2613276672
> > file 52523008
> > file_mapped 30982144
>
> This seems to be a 73% (anon) vs 27% (file) balance. 90% of the file
> LRU seems to be mapped, which matches the ~90% of the reclaimed file
> LRU memory being mapped. So the reclaim is proportional there.
>
> But I do understand that this is still unexpected when swappiness=200
> should make reclaim anon oriented. Is this MGLRU or the regular LRU
> implementation?
>

This is the regular LRU implementation; MGLRU has the same issue but
performs better.
Please have a look:

root@vm:/sys/fs/cgroup/test# cat /sys/kernel/mm/lru_gen/enabled
0x0007
root@vm:/sys/fs/cgroup/test# cat memory.stat | grep -wE 'anon|file|file_mapped'
anon 3310338048
file 293498880
file_mapped 273506304
root@vm:/sys/fs/cgroup/test# echo "1g swappiness=200" > memory.reclaim
root@vm:/sys/fs/cgroup/test# cat memory.stat | grep -wE 'anon|file|file_mapped'
anon 2373173248
file 157233152
file_mapped 146173952
root@vm:/sys/fs/cgroup/test# echo "1g swappiness=200" > memory.reclaim
root@vm:/sys/fs/cgroup/test# cat memory.stat | grep -wE 'anon|file|file_mapped'
anon 1370886144
file 85663744
file_mapped 78118912

> Is this some artificial workload or something real world?
>

This is an artificial workload, used to show the details of this case
more easily, but we have encountered the same problem on our servers.
If disk performance is poor, as with an HDD, the situation becomes even
worse: task delays grow because reading the data back is slower, and
hot pages thrash repeatedly between memory and disk. The pressure on
the disk also increases, and if many tasks share that disk, they are
affected as well. That was the background of this case.

>
> --
> Michal Hocko
> SUSE Labs
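For illustration, below is a minimal sketch of the kind of userspace
proactive-reclaim loop discussed above. It assumes the disable_unmap_file
token proposed in this series can be combined with the existing
swappiness= argument of memory.reclaim; the cgroup path, step size, and
threshold are made-up example values, not taken from the patch or the
test output above.

#!/bin/sh
# Minimal proactive reclaim loop (illustrative sketch only).
# Assumes the proposed disable_unmap_file argument is accepted by
# memory.reclaim; CG, STEP and TARGET are made-up example values.
CG=/sys/fs/cgroup/test
STEP=64M
TARGET=3221225472        # keep memory.current below ~3G

while true; do
        current=$(cat "$CG/memory.current")
        if [ "$current" -gt "$TARGET" ]; then
                # Bias reclaim toward anonymous memory and skip mapped
                # file pages (hypothetical combination of the existing
                # swappiness= option and the proposed disable_unmap_file).
                echo "$STEP swappiness=200 disable_unmap_file" \
                        > "$CG/memory.reclaim" 2>/dev/null \
                        || echo "reclaim request not fully satisfied" >&2
        fi
        sleep 10
done

In a real deployment the step size and target would of course be derived
from PSI or other workload-specific signals rather than fixed constants.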