On 3/13/20 7:34 PM, Yang Shi wrote: > Memory barrier is needed after setting LRU bit, but smp_mb() is too > strong. Some architectures, i.e. x86, imply memory barrier with atomic > operations, so replacing it with smp_mb__after_atomic() sounds better, > which is nop on strong ordered machines, and full memory barriers on > others. With this change the vm-calability cases would perform better > on x86, I saw total 6% improvement with this patch and previous inline > fix. > > The test data (lru-file-readtwice throughput) against v5.6-rc4: > mainline w/ inline fix w/ both (adding this) > 150MB 154MB 159MB > > Fixes: 9c4e6b1a7027 ("mm, mlock, vmscan: no more skipping pagevecs") > Cc: Shakeel Butt <shakeelb@xxxxxxxxxx> > Cc: Vlastimil Babka <vbabka@xxxxxxx> > Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> According to my understanding of Documentation/memory_barriers.txt this would be correct (but it might not say much :) Acked-by: Vlastimil Babka <vbabka@xxxxxxx> But i have some suggestions... > --- > mm/swap.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/mm/swap.c b/mm/swap.c > index cf39d24..118bac4 100644 > --- a/mm/swap.c > +++ b/mm/swap.c > @@ -945,20 +945,20 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, > * #0: __pagevec_lru_add_fn #1: clear_page_mlock > * > * SetPageLRU() TestClearPageMlocked() > - * smp_mb() // explicit ordering // above provides strict > + * MB() // explicit ordering // above provides strict Why MB()? That would be the first appareance of 'MB()' in the whole tree. I think it's fine keeping smp_mb()... > * // ordering > * PageMlocked() PageLRU() > * > * > * if '#1' does not observe setting of PG_lru by '#0' and fails > * isolation, the explicit barrier will make sure that page_evictable > - * check will put the page in correct LRU. Without smp_mb(), SetPageLRU > + * check will put the page in correct LRU. Without MB(), SetPageLRU ... same here ... > * can be reordered after PageMlocked check and can make '#1' to fail > * the isolation of the page whose Mlocked bit is cleared (#0 is also > * looking at the same page) and the evictable page will be stranded > * in an unevictable LRU. Only here I would note that SetPageLRU() is an atomic bitop so we can use the __after_atomic() variant. And I would move the actual SetPageLRU() call from above the comment here right before the barrier. > */ > - smp_mb(); > + smp_mb__after_atomic(); Thanks. > > if (page_evictable(page)) { > lru = page_lru(page); >