On Tue, Mar 16, 2021 at 04:34:37PM +0000, Matthew Wilcox wrote: > On Sat, Mar 13, 2021 at 12:57:44AM -0700, Yu Zhao wrote: > > In the page fault path, we want to add pages to the per-zone lists > > index by max_seq as they cannot be evicted without going through > > the aging first. For anon pages, we rename > > lru_cache_add_inactive_or_unevictable() to lru_cache_add_page_vma() > > and add a new parameter, which is set to true in the page fault path, > > to indicate whether they should be added to the per-zone lists index > > by max_seq. For page/swap cache, since we cannot differentiate the > > page fault path from the read ahead path at the time we call > > lru_cache_add() in add_to_page_cache_lru() and > > __read_swap_cache_async(), we have to add a new function > > lru_gen_activate_page(), which is essentially activate_page(), to move > > pages to the per-zone lists indexed by max_seq at a later time. > > Hopefully we would find pages we want to activate in lru_pvecs.lru_add > > and simply set PageActive() on them without having to actually move > > them. > > > > In the reclaim path, pages mapped around a referenced PTE may also > > have been referenced due to spatial locality. We add a new function > > lru_gen_scan_around() to scan the vicinity of such a PTE. > > > > In addition, we add a new function page_is_active() to tell whether a > > page is active. We cannot use PageActive() because it is only set on > > active pages while they are not on multigenerational lru. It is > > cleared while pages are on multigenerational lru, in order to spare > > the aging the trouble of clearing it when an active generation becomes > > inactive. Internally, page_is_active() compares the generation number > > of a page with max_seq and max_seq-1, which are active generations and > > protected from the eviction. Other generations, which may or may not > > exist, are inactive. > > If we go with this multi-LRU approach, it feels like PageActive and > PageInactive should go away as tests. We should have a LRU field in > the page flags with some special values: > > - Not managed through LRU list > - Not currently on any LRU list > - Unevictable > - Active list 1 > - Active list 2 > - ... > - Active list 5 > > Now you don't need any extra bits in the page flags. Or if you want to > have 13 lists instead of 5, you can use just one extra bit. I'm not > quite sure whether it makes sense to have that many lists, so I need > to try to understand that better. Yes, and this would be a lot cleaner. PG_{lru,unevictable,active, referenced,reclaim,workingset,young,idle} could all go away. Look how many bits we've added just for page reclaim. Sigh... > I'd like to echo the comments from others that it'd be nice to split apart > the multigenerational part of this and the physical scanning part of this. > It's possible they don't make performance sense without each other, > but from a review point of view, they seem entirely separate things. Thanks for noticing. I do plan to see if the page table scanning part could be better refactored. (I cut some corners by squashing it while rebasing to latest kernel.)