On Fri, Mar 13, 2020 at 11:34 AM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
>
> When backporting commit 9c4e6b1a7027 ("mm, mlock, vmscan: no more
> skipping pagevecs") to our 4.9 kernel, our test bench noticed around a
> 10% drop with a couple of vm-scalability's test cases (lru-file-readonce,
> lru-file-readtwice and lru-file-mmap-read). I didn't see that much of a
> drop on my VM (32c-64g-2nodes). It might be caused by the test
> configuration, which is 32c-256g with NUMA disabled, and the tests were
> run in the root memcg, so the tests actually stress only one inactive
> and one active lru. That is not a very usual setup in modern production
> environments.
>
> That commit made two major changes:
> 1. Call page_evictable()
> 2. Use smp_mb() to force the PG_lru set to be visible
>
> It looks like these contribute most of the overhead. page_evictable()
> is an out-of-line function with a full prologue and epilogue, and it
> used to be called from the page reclaim path only. However, lru add is
> a very hot path, so it is better to make it inline. It also calls
> page_mapping(), which is not inlined either, but the disassembly shows
> that page_mapping() does no push and pop operations, and it does not
> look straightforward to inline it.
>
> Other than this, smp_mb() is not necessary on x86 since SetPageLRU is
> an atomic operation, which already enforces the memory ordering there;
> it is replaced with smp_mb__after_atomic() in the following patch.
>
> With the two fixes applied, the tests regain around 5% on that test
> bench and get back to normal on my VM. Since the test bench
> configuration is not that usual, and I also saw around a 6% improvement
> on the latest upstream, it sounds good enough IMHO.
>
> Below is the test data (lru-file-readtwice throughput) against v5.6-rc4:
>
>     mainline    w/ inline fix
>     150MB       154MB
>

What is the test setup for the above experiment? I would like to get a repro.

> With this patch the throughput goes up by 2.67%. The data with
> smp_mb__after_atomic() is shown in the following patch.
>
> Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs")
> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> ---
>  include/linux/swap.h | 24 +++++++++++++++++++++++-
>  mm/vmscan.c          | 23 -----------------------
>  2 files changed, 23 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 1e99f7a..297eb66 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -374,7 +374,29 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
>  #define node_reclaim_mode 0
>  #endif
>
> -extern int page_evictable(struct page *page);
> +/*
> + * page_evictable - test whether a page is evictable
> + * @page: the page to test
> + *
> + * Test whether page is evictable--i.e., should be placed on active/inactive
> + * lists vs unevictable list.
> + *
> + * Reasons page might not be evictable:
> + * (1) page's mapping marked unevictable
> + * (2) page is part of an mlocked VMA
> + *
> + */
> +static inline int page_evictable(struct page *page)
> +{
> +        int ret;
> +
> +        /* Prevent address_space of inode and swap cache from being freed */
> +        rcu_read_lock();
> +        ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
> +        rcu_read_unlock();
> +        return ret;
> +}
> +
>  extern void check_move_unevictable_pages(struct pagevec *pvec);
>
>  extern int kswapd_run(int nid);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8763705..855c395 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4277,29 +4277,6 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
>  }
>  #endif
>
> -/*
> - * page_evictable - test whether a page is evictable
> - * @page: the page to test
> - *
> - * Test whether page is evictable--i.e., should be placed on active/inactive
> - * lists vs unevictable list.
> - *
> - * Reasons page might not be evictable:
> - * (1) page's mapping marked unevictable
> - * (2) page is part of an mlocked VMA
> - *
> - */
> -int page_evictable(struct page *page)
> -{
> -        int ret;
> -
> -        /* Prevent address_space of inode and swap cache from being freed */
> -        rcu_read_lock();
> -        ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
> -        rcu_read_unlock();
> -        return ret;
> -}
> -
>  /**
>   * check_move_unevictable_pages - check pages for evictability and move to
>   * appropriate zone lru list
> --
> 1.8.3.1
>
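
Also, on the smp_mb__after_atomic() change you mention for the following
patch: to make sure I am reading the idea right, I assume it does
something like the below in __pagevec_lru_add_fn() in mm/swap.c (just my
sketch of the change as described, not your actual patch):

        SetPageLRU(page);
        /*
         * Instead of the explicit smp_mb() that 9c4e6b1a7027 added here:
         * SetPageLRU() is a non-value-returning atomic RMW, so
         * smp_mb__after_atomic() still orders the PG_lru store against
         * the PageMlocked() test in page_evictable() below, while on
         * x86, where atomic RMW ops already imply a full barrier, it
         * reduces to a compiler barrier.
         */
        smp_mb__after_atomic();

        if (page_evictable(page)) {
                /* put the page on the appropriate active/inactive lru */
        } else {
                /* mlocked or unevictable mapping: keep it off the lrus */
        }

If that is the shape of it, the x86 win makes sense; I would be curious
whether the ordering still holds on arm64, where non-value-returning
atomics are not full barriers on their own.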