On Sat, Jun 29, 2024 at 05:45:53PM +0100, Matthew Wilcox wrote:
>On Sat, Jun 29, 2024 at 05:28:34PM +0100, Matthew Wilcox wrote:
>> On Sat, Jun 29, 2024 at 08:44:11AM +0000, Wei Yang wrote:
>> > Per my understanding, prefetchw() is trying to load data to cache before we
>> > really accessing it. By doing so, we won't hit a cache miss when we really
>> > need it.
>>
>> Yes, but the CPU can also do this by itself without needing an explicit
>> hint from software. It can notice that we have a loop that's accessing
>> successive cachelines for write. This is approximately the easiest
>> prefetcher to design.

That was my first intuition on reading the code too, though I *believed* there
was a reason for the prefetchw() being put here.

>
>I tracked down prefetchw() being added:
>
>commit 3b901ea58a56
>Author: Josh Aas <josha@xxxxxxx>
>Date: Mon Aug 23 21:26:54 2004 -0700
>
>    [PATCH] improve speed of freeing bootmem
>
>    Attached is a patch that greatly improves the speed of freeing boot memory.
>    On ia64 machines with 2GB or more memory (I didn't test with less, but I
>    can't imagine there being a problem), the speed improvement is about 75%
>    for the function free_all_bootmem_core. This translates to savings on the
>    order of 1 minute / TB of memory during boot time. That number comes from
>    testing on a machine with 512GB, and extrapolating based on profiling of an
>    unpatched 4TB machine. For 4 and 8 TB machines, the time spent in this
>    function is about 1 minutes/TB, which is painful especially given that
>    there is no indication of what is going on put to the console (this issue
>    to possibly be addressed later).
>
>    The basic idea is to free higher order pages instead of going through every
>    single one. Also, some unnecessary atomic operations are done away with
>    and replaced with non-atomic equivalents, and prefetching is done where it
>    helps the most. For a more in-depth discussion of this patch, please see
>    the linux-ia64 archives (topic is "free bootmem feedback patch").
>
>(quoting the entire commit message because it's buried in linux-fullhistory,
>being a pre-git patch). For the thread he's referring to, see
>https://lore.kernel.org/linux-ia64/40F46962.4090604@xxxxxxx/
>

Thanks for digging up the ancient history. And now I know about the
linux-fullhistory tree :-)

>Itanium CPUs of this era had no prefetchers.

Oops, sad to hear that.

So it looks like the most helpful changes in that patch were:

  * freeing higher order pages
  * replacing atomic operations with non-atomic equivalents

A quick test on my 6G qemu virtual machine suggests the prefetchw() here is
not helpful. After removing it, the meminit process is even a little faster:
from 135ms to 121ms, summed over 3 boot tests.

If you think it is OK, I will put together a patch on top of current mm-stable
for a wider audience to review.

-- 
Wei Yang
Help you, Help me