On Sat, Jun 29, 2024 at 05:28:34PM +0100, Matthew Wilcox wrote:
> On Sat, Jun 29, 2024 at 08:44:11AM +0000, Wei Yang wrote:
> > Per my understanding, prefetchw() is trying to load data to cache before we
> > really accessing it. By doing so, we won't hit a cache miss when we really
> > need it.
>
> Yes, but the CPU can also do this by itself without needing an explicit
> hint from software. It can notice that we have a loop that's accessing
> successive cachelines for write. This is approximately the easiest
> prefetcher to design.

I tracked down prefetchw() being added:

commit 3b901ea58a56
Author: Josh Aas <josha@xxxxxxx>
Date:   Mon Aug 23 21:26:54 2004 -0700

    [PATCH] improve speed of freeing bootmem

    Attached is a patch that greatly improves the speed of freeing boot
    memory. On ia64 machines with 2GB or more memory (I didn't test with
    less, but I can't imagine there being a problem), the speed improvement
    is about 75% for the function free_all_bootmem_core. This translates to
    savings on the order of 1 minute / TB of memory during boot time. That
    number comes from testing on a machine with 512GB, and extrapolating
    based on profiling of an unpatched 4TB machine. For 4 and 8 TB
    machines, the time spent in this function is about 1 minutes/TB, which
    is painful especially given that there is no indication of what is
    going on put to the console (this issue to possibly be addressed
    later).

    The basic idea is to free higher order pages instead of going through
    every single one. Also, some unnecessary atomic operations are done
    away with and replaced with non-atomic equivalents, and prefetching is
    done where it helps the most. For a more in-depth discusion of this
    patch, please see the linux-ia64 archives (topic is "free bootmem
    feedback patch").

(quoting the entire commit message because it's buried in linux-fullhistory,
being a pre-git patch).
For the thread he's referring to, see
https://lore.kernel.org/linux-ia64/40F46962.4090604@xxxxxxx/

Itanium CPUs of this era had no prefetchers.
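
For anyone who wants to see the pattern in isolation, here is a minimal
user-space sketch of a sequential write loop with an explicit write-prefetch
hint. It is not the kernel code: struct page_stub, STUB_RESERVED,
prefetch_for_write() and release_pages() are hypothetical names made up for
illustration, and the GCC/Clang builtin __builtin_prefetch() with rw=1 is
assumed as a stand-in for prefetchw().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct page_stub {
	unsigned long flags;
	int count;
};

#define STUB_RESERVED 0x1UL	/* illustrative flag bit, not a kernel constant */

/* Assumed equivalent of prefetchw(): hint that *p will be written soon. */
static inline void prefetch_for_write(const void *p)
{
	__builtin_prefetch(p, 1 /* write */, 3 /* keep in all cache levels */);
}

/*
 * Walk the array in order, clearing the reserved bit and resetting the
 * count of every entry. Returns how many entries had the bit set.
 */
static size_t release_pages(struct page_stub *pages, size_t n)
{
	size_t released = 0;

	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Ask for the next entry while we work on this one. */
		if (i + 1 < n)
			prefetch_for_write(&pages[i + 1]);

		if (pages[i].flags & STUB_RESERVED) {
			pages[i].flags &= ~STUB_RESERVED;
			released++;
		}
		pages[i].count = 0;
	}
	return released;
}
```

The hint does not change what the loop computes, so deleting the
prefetch_for_write() call gives the same result. On a CPU whose hardware
prefetcher already detects the sequential stream, the hint is likely
redundant.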