On 03/13/2018 08:04 AM, Aaron Lu wrote:
> On Tue, Mar 13, 2018 at 11:35:19AM +0800, Aaron Lu wrote:
>> On Mon, Mar 12, 2018 at 10:32:32AM -0700, Dave Hansen wrote:
>>> On 03/09/2018 12:24 AM, Aaron Lu wrote:
>>>> +			/*
>>>> +			 * We are going to put the page back to the global
>>>> +			 * pool, prefetch its buddy to speed up later access
>>>> +			 * under zone->lock. It is believed the overhead of
>>>> +			 * an additional test and calculating buddy_pfn here
>>>> +			 * can be offset by reduced memory latency later. To
>>>> +			 * avoid excessive prefetching due to large count, only
>>>> +			 * prefetch buddy for the last pcp->batch nr of pages.
>>>> +			 */
>>>> +			if (count > pcp->batch)
>>>> +				continue;
>>>> +			pfn = page_to_pfn(page);
>>>> +			buddy_pfn = __find_buddy_pfn(pfn, 0);
>>>> +			buddy = page + (buddy_pfn - pfn);
>>>> +			prefetch(buddy);
>>>
>>> FWIW, I think this needs to go into a helper function. Is that possible?
>>
>> I'll give it a try.
>>
>>>
>>> There's too much logic happening here. Also, 'count' going from
>>> batch_size->0 is totally non-obvious from the patch context. It makes
>>> this hunk look totally wrong by itself.
>
> I tried to avoid adding one more local variable but looks like it caused
> a lot of pain. What about the following? It doesn't use count any more
> but prefetch_nr to indicate how many prefetches have happened.
>
> Also, I think it's not worth the risk of disordering pages in free_list
> by changing list_add_tail() to list_add() as Andrew reminded so I
> dropped that change too.

Looks fine, you can add

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dafdcdec9c1f..00ea4483f679 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1099,6 +1099,15 @@ static bool bulkfree_pcp_prepare(struct page *page)
>  }
>  #endif /* CONFIG_DEBUG_VM */
>  
> +static inline void prefetch_buddy(struct page *page)
> +{
> +	unsigned long pfn = page_to_pfn(page);
> +	unsigned long buddy_pfn = __find_buddy_pfn(pfn, 0);
> +	struct page *buddy = page + (buddy_pfn - pfn);
> +
> +	prefetch(buddy);
> +}
> +
>  /*
>   * Frees a number of pages from the PCP lists
>   * Assumes all pages on list are in same zone, and of same order.
> @@ -1115,6 +1124,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  {
>  	int migratetype = 0;
>  	int batch_free = 0;
> +	int prefetch_nr = 0;
>  	bool isolated_pageblocks;
>  	struct page *page, *tmp;
>  	LIST_HEAD(head);
> @@ -1150,6 +1160,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  				continue;
>  
>  			list_add_tail(&page->lru, &head);
> +
> +			/*
> +			 * We are going to put the page back to the global
> +			 * pool, prefetch its buddy to speed up later access
> +			 * under zone->lock. It is believed the overhead of
> +			 * an additional test and calculating buddy_pfn here
> +			 * can be offset by reduced memory latency later. To
> +			 * avoid excessive prefetching due to large count, only
> +			 * prefetch buddy for the first pcp->batch nr of pages.
> +			 */
> +			if (prefetch_nr++ < pcp->batch)
> +				prefetch_buddy(page);
>  		} while (--count && --batch_free && !list_empty(list));
>  	}
>
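
As background for the helper above: __find_buddy_pfn(pfn, 0) only flips the
lowest bit of the pfn, so prefetch_buddy() ends up touching the struct page of
the physically adjacent order-0 page, i.e. the one that will be examined for
merging once the page is returned to the buddy allocator under zone->lock.
Below is a minimal userspace sketch of that calculation (illustrative only;
the find_buddy_pfn() name and the sample pfns are made up here, only the XOR
formula mirrors the kernel helper):

#include <stdio.h>

/*
 * Same formula as the kernel's __find_buddy_pfn(): the buddy of a
 * 2^order-aligned block starting at pfn is found by flipping bit
 * 'order' of the pfn.  For order 0 this is simply pfn ^ 1.
 */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

int main(void)
{
	unsigned long pfns[] = { 0x1000, 0x1001, 0x2f3e };
	unsigned int i;

	for (i = 0; i < sizeof(pfns) / sizeof(pfns[0]); i++)
		printf("pfn 0x%lx -> order-0 buddy pfn 0x%lx\n",
		       pfns[i], find_buddy_pfn(pfns[i], 0));
	return 0;
}

With buddy_pfn in hand, the patch locates the buddy's struct page as
page + (buddy_pfn - pfn), which works because order-0 buddies fall within the
same contiguous struct page array, and issues prefetch() on it so the later
access during merging under zone->lock is more likely to hit cache.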