Thanks for replying.

On Fri, Dec 22, 2023 at 13:14 PM David Rientjes wrote:
> - is the lack of predeterministic charging a problem for you? Are you
>   initially faulting it in a manner that charges it to the "right" memcg
>   and the refault of it after periodic reclaim can causing the charge to
>   appear "randomly," i.e. to whichever process happened to access it
>   next?

Actually, at the beginning all pages are charged to cgroup A, but under
memory pressure or after proactive reclaim some pages are dropped or
swapped out. If a task in cgroup B then accesses this shared memory before
a task in cgroup A does, those pages are charged to cgroup B. This is
common in our environment.

> - are pages ever shared between different memcg hierarchies? You
>   mentioned sharing between processes in A and A/B, but I'm wondering
>   if there is sharing between two different memcg hierarchies where root
>   is the only common ancestor?

Yes, there is another really common case: when the docker graph driver is
overlayfs, different containers that use the same image, or share the same
lower layers, share the file cache of common binaries and libraries
(e.g. libc.so).

> - do you anticipate a shorter scan period at some point? Proactively
>   reclaiming all memory colder than one hour is a long time :) Are you
>   concerned at all about the cost of doing your current idle bit
>   harvesting approach becoming too expensive if you significantly reduce
>   the scan period?

We don't want application owners to notice a significant performance
degradation when swap is in use. Reclaiming pages whose idle age is less
than 1 hour carries a high risk of that; we have internal tests and data
analysis to support this. We have disabled global swappiness and memcg
swappiness, so only proactive reclaim can swap out anon pages.

What's more, we see that mglru has a more efficient way to scan the pte
accessed bit. We would prefer to use the mglru scan to help us find and
select idle pages.

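To make the idle-age policy above concrete, here is a minimal sketch of
how userspace might pick reclaim candidates from its own records. It is
only an illustration, not our actual tooling: PageRecord,
select_reclaim_candidates and the last_access field are hypothetical
names, and the timestamps are made up.

```python
from dataclasses import dataclass

IDLE_THRESHOLD_SECONDS = 60 * 60  # one hour, the threshold discussed above


@dataclass
class PageRecord:
    """Hypothetical userspace record for one tracked page."""
    pfn: int            # page frame number
    last_access: float  # assumed: time (seconds) the access bit was last seen set


def select_reclaim_candidates(pages, now, threshold=IDLE_THRESHOLD_SECONDS):
    """Return the PFNs of pages that have been idle for at least `threshold` seconds."""
    return [p.pfn for p in pages if now - p.last_access >= threshold]


# Made-up sample data: only the first page has been idle for over an hour.
pages = [
    PageRecord(pfn=0x1000, last_access=0.0),
    PageRecord(pfn=0x1001, last_access=3000.0),
    PageRecord(pfn=0x1002, last_access=500.0),
]
candidates = select_reclaim_candidates(pages, now=4000.0)
```

Lowering the threshold argument would correspond to a shorter idle age,
which is exactly the case we consider risky for application performance.
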
> - is proactive reclaim being driven by writing to memory.reclaim, by
>   enforcing a smaller memory.high, or something else?

Because all page information and idle ages are stored in userspace, the
kernel can't get this information directly. We have a private patch that
adds a new reclaim interface to support reclaiming pages by specific pfns.

-- 
2.43.0