On Thu, Nov 26, 2020 at 02:39:03PM +0800, Alex Shi wrote:
> 
> 
> On 2020/11/26 at 12:52 PM, Yu Zhao wrote:
> >>  */
> >>  void __pagevec_lru_add(struct pagevec *pvec)
> >>  {
> >> -	int i;
> >> -	struct lruvec *lruvec = NULL;
> >> +	int i, nr_lruvec;
> >>  	unsigned long flags = 0;
> >> +	struct page *page;
> >> +	struct lruvecs lruvecs;
> >> 
> >> -	for (i = 0; i < pagevec_count(pvec); i++) {
> >> -		struct page *page = pvec->pages[i];
> >> +	nr_lruvec = sort_page_lruvec(&lruvecs, pvec);
> > 
> > Simply looping pvec multiple times (15 at most) for different lruvecs
> > would be better because 1) it requires no extra data structures and
> > therefore has better cache locality (theoretically faster) 2) it only
> > loops once when !CONFIG_MEMCG and !CONFIG_NUMA and therefore has no
> > impact on Android and Chrome OS.
> 
> With multiple memcgs, it does help a lot; I got a 30% gain on the
> readtwice case. But yes, w/o MEMCG and NUMA, it's good to keep the old
> behavior. So would you like to make a proposal for this?

Oh, no, I'm not against your idea. I was saying it doesn't seem
necessary to sort -- a nested loop would do the job, given that a
pagevec is small.

diff --git a/mm/swap.c b/mm/swap.c
index cb3794e13b48..1d238edc2907 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -996,15 +996,27 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec)
  */
 void __pagevec_lru_add(struct pagevec *pvec)
 {
-	int i;
+	int i, j;
 	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
 
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
+		if (!page)
+			continue;
+
 		lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
-		__pagevec_lru_add_fn(page, lruvec);
+
+		for (j = i; j < pagevec_count(pvec); j++) {
+			if (!pvec->pages[j] ||
+			    page_to_nid(pvec->pages[j]) != page_to_nid(page) ||
+			    page_memcg(pvec->pages[j]) != page_memcg(page))
+				continue;
+
+			__pagevec_lru_add_fn(pvec->pages[j], lruvec);
+			pvec->pages[j] = NULL;
+		}
 	}
 
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);