On Sat, Jun 29, 2024 at 08:44:11AM +0000, Wei Yang wrote:
> On Sat, Jun 29, 2024 at 04:21:59AM +0100, Matthew Wilcox wrote:
> >On Sat, Jun 29, 2024 at 01:33:22AM +0000, Wei Yang wrote:
> >> +++ b/mm/page_alloc.c
> >> @@ -1232,10 +1232,8 @@ void __meminit __free_pages_core(struct page *page, unsigned int order)
> >>  	prefetchw(p);
> >>  	for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
> >>  		prefetchw(p + 1);
> >> -		__ClearPageReserved(p);
> >>  		set_page_count(p, 0);
> >>  	}
> >> -	__ClearPageReserved(p);
> >>  	set_page_count(p, 0);
> >
> >Is the prefetchw() still useful?  Any remotely competent CPU should
> >be able to figure out this loop ...
>
> Hi, Matthew
>
> Thanks for your question. But to be honest, I am not fully understand it.

Let's try this question:

If you remove the prefetchw() line, does the performance change?

> Per my understanding, prefetchw() is trying to load data to cache before we
> really accessing it. By doing so, we won't hit a cache miss when we really
> need it.

Yes, but the CPU can also do this by itself without needing an explicit
hint from software.  It can notice that we have a loop that's accessing
successive cachelines for write.  This is approximately the easiest
prefetcher to design.
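
One rough way to get a feel for that question outside the kernel is a small
userspace microbenchmark.  The sketch below is only an illustration, not the
kernel code: struct fake_page, the helper names and the 64-byte element size
are all assumptions made up for this example, and __builtin_prefetch(p, 1)
(the GCC/Clang builtin for a prefetch-for-write hint) stands in for
prefetchw().  It runs the same loop shape with and without the explicit hint
and times both.

```c
/*
 * Userspace sketch, not kernel code.
 * Build: cc -O2 -DBENCH_MAIN prefetch_bench.c -o prefetch_bench
 */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct page: one 64-byte element per cacheline. */
struct fake_page {
	int count;
	char pad[60];
};

/* Same loop shape as the quoted hunk, with an explicit prefetch for write. */
void clear_with_prefetch(struct fake_page *p, size_t nr)
{
	size_t loop;

	if (nr == 0)
		return;
	__builtin_prefetch(p, 1);
	for (loop = 0; loop < nr - 1; loop++, p++) {
		__builtin_prefetch(p + 1, 1);
		p->count = 0;
	}
	p->count = 0;
}

/* Same stores, no software hint: the hardware prefetcher is on its own. */
void clear_plain(struct fake_page *p, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		p[i].count = 0;
}

/* Allocate nr cacheline-aligned elements with count set to 1; NULL on failure. */
struct fake_page *alloc_fake_pages(size_t nr)
{
	void *mem = NULL;
	struct fake_page *p;
	size_t i;

	if (posix_memalign(&mem, 64, nr * sizeof(*p)) != 0)
		return NULL;
	p = mem;
	for (i = 0; i < nr; i++)
		p[i].count = 1;
	return p;
}

/* Return 1 if every count in the array is zero, 0 otherwise. */
int all_zero(const struct fake_page *p, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (p[i].count != 0)
			return 0;
	return 1;
}

/* Time one pass of fn over a fresh array; seconds, or -1.0 if allocation fails. */
double time_clear(void (*fn)(struct fake_page *, size_t), size_t nr)
{
	struct timespec t0, t1;
	struct fake_page *p = alloc_fake_pages(nr);

	if (!p)
		return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	fn(p, nr);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	assert(all_zero(p, nr));	/* also keeps the stores observable */
	free(p);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

#ifdef BENCH_MAIN
int main(void)
{
	size_t nr = (size_t)1 << 20;	/* 64 MiB of fake pages, an arbitrary size */
	int i;

	for (i = 0; i < 5; i++) {
		double with = time_clear(clear_with_prefetch, nr);
		double plain = time_clear(clear_plain, nr);

		printf("prefetch: %.6f s  plain: %.6f s\n", with, plain);
	}
	return 0;
}
#endif
```

Treat any numbers from this as a hint at best.  The array has just been
written by the allocator helper, so part of it may still be in cache, and
userspace timing on a busy machine is noisy.  A measurement of the real
__free_pages_core() path on the hardware of interest is what would actually
settle whether the prefetchw() calls earn their keep.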