On Fri, Dec 8, 2023 at 12:12 AM Henry Huang <henry.hj@xxxxxxxxxxxx> wrote:
>
> Thanks for replying to this RFC.
>
> > 1. page_idle/bitmap isn't a capable interface at all -- yes, Google
> > proposed the idea [1], but we don't really use it anymore because of
> > its poor scalability.
>
> In our environment, we use /sys/kernel/mm/page_idle/bitmap to check
> whether pages were accessed during a period of time.

Is it a production environment? If so, what are your
1. scan interval
2. memory size

I'm trying to understand why scalability isn't a problem for you. On
an average server, there are hundreds of millions of PFNs, so it'd be
very expensive to use that ABI even for a time interval of minutes.

> We manage the idle time of all pages
> in userspace, then use a prediction algorithm to select pages
> to reclaim. These pages are more likely to stay idle for a long time.

"There is a system in place now that is based on a user-space process
that reads a bitmap stored in sysfs, but it has a high CPU and memory
overhead, so a new approach is being tried."
https://lwn.net/Articles/787611/

Could you elaborate on how you solved this problem?

> We only need the kernel to tell us whether a page was accessed; a boolean
> value in the kernel is enough for our case.

How do you define "accessed"? I.e., through page tables or file
descriptors or both?

> > 2. PG_idle/young, being a boolean value, has poor granularity. If
> > anyone must use page_idle/bitmap for some specific reason, I'd
> > recommend exporting generation numbers instead.
>
> Yes, at first we tried using multi-gen LRU proactive scan and
> exporting generation & refs numbers to do the same thing.
>
> But there are several problems:
>
> 1. multi-gen LRU only cares about self-memcg pages. In our environment,
> it's likely that processes in different memcgs share pages.

This is related to my question above: are those pages mapped into
different memcgs or not?

> multi-gen LRU only updates the gen of pages in the current memcg.
> It's hard to judge whether a page was accessed based on gen updates.

This depends. I'd be glad to elaborate after you clarify the above.

> We still have no idea how to solve this problem.
>
> 2. We set swappiness to 0, and use proactive scan to select cold pages
> and proactive reclaim to swap anon pages. But we can't control passive
> scan (can_swap = false), which can invert the cold/hot order of anon
> pages in inc_min_seq.

There is an option to prevent the inversion: IIUC, the force_scan
option is what you are looking for.
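For reference, force_scan is the last optional argument of the proactive aging command written to the MGLRU debugfs file. The sketch below only builds and writes that command string. It assumes the `+ memcg_id node_id max_gen_nr [can_swap [force_scan]]` syntax from Documentation/admin-guide/mm/multigen_lru.rst for kernels around the time of this thread (newer kernels may differ), and the helper names aging_command() and request_aging() are hypothetical. memcg_id and max_gen_nr would come from reading lru_gen first.

```python
from pathlib import Path
from typing import Optional

# MGLRU debugfs file (assumes CONFIG_LRU_GEN and debugfs mounted here).
LRU_GEN = Path("/sys/kernel/debug/lru_gen")


def aging_command(memcg_id: int, node_id: int, max_gen_nr: int,
                  can_swap: Optional[int] = None,
                  force_scan: Optional[int] = None) -> str:
    """Build '+ memcg_id node_id max_gen_nr [can_swap [force_scan]]'.

    The optional fields are positional, so force_scan can only be
    given when can_swap is given too.
    """
    if force_scan is not None and can_swap is None:
        raise ValueError("force_scan requires can_swap")
    fields = ["+", str(memcg_id), str(node_id), str(max_gen_nr)]
    if can_swap is not None:
        fields.append(str(can_swap))
    if force_scan is not None:
        fields.append(str(force_scan))
    return " ".join(fields)


def request_aging(cmd: str, path: Path = LRU_GEN) -> None:
    """Write one command to lru_gen (requires root)."""
    path.write_text(cmd + "\n")
```

E.g., aging_command(1, 0, 7, can_swap=1, force_scan=1) yields "+ 1 0 7 1 1"; the IDs here are made up for illustration.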