On Tue, Jul 02, 2024 at 08:57:57AM +0200, David Hildenbrand wrote:
> > I did 10 round bootup tests before and after this change, the data
> > doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
> > the 10 round bootmem freeing time after prefetchw() removal even 5.2%
> > faster than before.
>
> I suspect this is noise, though.

I think it's real, though small. Each prefetchw() is an instruction,
and if we can avoid issuing an instruction, we should.

> Something like:
>
> for (;;) {
> 	...
> 	if (++loop >= nr_pages)
> 		break;
> 	p++;
> }
>
>
> Might generate slightly better code, because we know that we execute the
> loop body at least once. We use that in set_ptes(), for example.

I don't think it's worth doing. Keep the loop simple and obvious.
set_ptes() is different because we actually expect to execute the loop
exactly once (ie most folios are small). So two compares per call to
set_ptes() instead of one makes a difference. Here, we're expecting
to execute this loop, what, a million times? Doing a million-and-one
compares instead of a million makes no observable difference.

I would like to see v2 of this patch dropped, please Andrew.