On Fri, Jan 07, 2022 at 10:06:15AM +0100, Michal Hocko wrote:
> On Tue 04-01-22 13:22:24, Yu Zhao wrote:
> > To exploit spatial locality, the aging prefers to walk page tables to
> > search for young PTEs. And this patch paves the way for that.
> >
> > An mm_struct list is maintained for each memcg, and an mm_struct
> > follows its owner task to the new memcg when this task is migrated.
>
> How does this work actually for the memcg reclaim? I can see you
> lru_gen_migrate_mm on the task migration. My concern is, though, that
> such a task leaves all the memory behind in the previous memcg (in
> cgroup v2, in v1 you can opt in for charge migration). If you move the
> mm to a new memcg then you age it somewhere where the memory is not
> really consumed.

There are two options to gather the accessed bit: page table walks and
rmap walks. Page table walks sweep dense hotspots that are NOT
misplaced in terms of reclaim scope (lruvec); rmap walks cover what
page table walks miss, e.g., misplaced dense hotspots or sparse ones.
Dense hotspots are stored in Bloom filters for each lruvec.

If an mm leaves everything in the old memcg, page table walks in the
new memcg reclaim path basically ignore this mm after the first scan,
because everything is misplaced. In the old memcg reclaim path, page
table walks won't see this mm at all. But rmap walks will catch
everything later in the eviction path, i.e., lru_gen_look_around().

This function is less efficient than page table walks because, for
each rmap walk of a non-shared page, it can gather the accessed bit
from at most 64 PTEs. But it's still a lot faster than the original
rmap, which gathers the accessed bit from only a single PTE for each
walk of a non-shared page.
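
To make the 64-vs-1 comparison concrete, here is a minimal userspace
sketch, not the kernel code. Everything in it is hypothetical and for
illustration only: struct toy_pte, scan_single(), scan_look_around()
and LOOK_AROUND_BATCH are made-up names, and the aligned 64-entry
window is a simplifying assumption about how the range around the
target PTE is chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: at most 64 PTEs are examined per rmap walk. */
#define LOOK_AROUND_BATCH 64

/* Toy stand-in for a PTE: only the accessed ("young") bit is modeled. */
struct toy_pte {
	bool young;
};

/* Original rmap behavior: test and clear the accessed bit of one PTE. */
static size_t scan_single(struct toy_pte *ptes, size_t n, size_t idx)
{
	assert(idx < n);

	if (!ptes[idx].young)
		return 0;

	ptes[idx].young = false;
	return 1;
}

/*
 * Look-around behavior: test and clear the accessed bit of every PTE in
 * the aligned window of up to LOOK_AROUND_BATCH entries containing idx.
 * Returns the number of young PTEs found.
 */
static size_t scan_look_around(struct toy_pte *ptes, size_t n, size_t idx)
{
	size_t start, end, i, young = 0;

	assert(idx < n);

	start = idx - idx % LOOK_AROUND_BATCH;
	end = start + LOOK_AROUND_BATCH;
	if (end > n)
		end = n;	/* clamp the window to the table size */

	for (i = start; i < end; i++) {
		if (ptes[i].young) {
			ptes[i].young = false;
			young++;
		}
	}

	return young;
}
```

With a fully young table, one scan_single() call harvests one accessed
bit, while one scan_look_around() call harvests up to 64, which is the
per-walk difference described above.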