On 10.06.20 09:20, David Hildenbrand wrote:
> On 10.06.20 00:54, Daniel Jordan wrote:
>> Some of our servers spend significant time at kernel boot initializing
>> memory block sysfs directories and then creating symlinks between them
>> and the corresponding nodes.  The slowness happens because the machines
>> get stuck with the smallest supported memory block size on x86 (128M),
>> which results in 16,288 directories to cover the 2T of installed RAM.
>> The search for each memory block is noticeable even with
>> commit 4fb6eabf1037 ("drivers/base/memory.c: cache memory blocks in
>> xarray to accelerate lookup").
>>
>> Commit 078eb6aa50dc ("x86/mm/memory_hotplug: determine block size based
>> on the end of boot memory") chooses the block size based on alignment
>> with memory end.  That addresses hotplug failures in qemu guests, but
>> for bare metal systems whose memory end isn't aligned to even the
>> smallest size, it leaves them at 128M.
>>
>> Make kernels that aren't running on a hypervisor use the largest
>> supported size (2G) to minimize overhead on big machines.  Kernel boot
>> goes 7% faster on the aforementioned servers, shaving off half a second.
>>
>> Signed-off-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
>> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Steven Sistare <steven.sistare@xxxxxxxxxx>
>> Cc: linux-mm@xxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> ---
>>
>> Applies to 5.7 and today's mainline
>>
>>  arch/x86/mm/init_64.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 8b5f73f5e207c..906fbdb060748 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -55,6 +55,7 @@
>>  #include <asm/uv/uv.h>
>>  #include <asm/setup.h>
>>  #include <asm/ftrace.h>
>> +#include <asm/hypervisor.h>
>>
>>  #include "mm_internal.h"
>>
>> @@ -1390,6 +1391,15 @@ static unsigned long probe_memory_block_size(void)
>>  		goto done;
>>  	}
>>
>> +	/*
>> +	 * Use max block size to minimize overhead on bare metal, where
>> +	 * alignment for memory hotplug isn't a concern.
>> +	 */
>> +	if (hypervisor_is_type(X86_HYPER_NATIVE)) {
>> +		bz = MAX_BLOCK_SIZE;
>> +		goto done;
>> +	}
>
> I'd assume that bioses on physical machines >= 64GB will not align
> bigger (>= 2GB) DIMMs to something < 2GB.
>
> Acked-by: David Hildenbrand <david@xxxxxxxxxx>

FWIW, setup_arch() does the init_hypervisor_platform() call. I assume
that should be early enough.

We should really look into factoring out memory_block_size_bytes() into
common code, turning it into a simple global variable read. Then, we
should provide an interface to configure the memory block size during
boot from arch code (set_memory_block_size()).

-- 
Thanks,

David / dhildenb
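
To make the refactoring idea above a bit more concrete, here is a rough
user-space C sketch. It is not code from the patch or from the kernel
tree. The names set_memory_block_size() and memory_block_size_bytes()
come from the mail above; the global variable, the validation rules, the
error return and the count_blocks() helper are assumptions made purely
for illustration. The 128M and 2G limits are the x86 sizes mentioned in
the patch description.

```c
#include <errno.h>
#include <stdint.h>

/* Sizes taken from the patch description: 128M minimum, 2G maximum. */
#define MIN_BLOCK_SIZE (128ULL << 20)
#define MAX_BLOCK_SIZE (2ULL << 30)

/* Hypothetical global that common code would own (assumption). */
static uint64_t memory_block_size = MIN_BLOCK_SIZE;

/*
 * Hypothetical boot-time setter for arch code. The checks below
 * (range and power of two) are assumed, not taken from the thread.
 */
static int set_memory_block_size(uint64_t size)
{
	if (size < MIN_BLOCK_SIZE || size > MAX_BLOCK_SIZE)
		return -EINVAL;
	if (size & (size - 1))	/* not a power of two */
		return -EINVAL;
	memory_block_size = size;
	return 0;
}

/* With the refactoring, this becomes a plain variable read. */
static uint64_t memory_block_size_bytes(void)
{
	return memory_block_size;
}

/*
 * Hypothetical helper: number of blocks needed to cover one
 * contiguous range of RAM, rounding up.
 */
static uint64_t count_blocks(uint64_t ram_bytes)
{
	uint64_t bz = memory_block_size_bytes();

	return (ram_bytes + bz - 1) / bz;
}
```

For a contiguous 2T range, count_blocks() gives 16,384 blocks at 128M
and 1,024 blocks at 2G. The sketch treats RAM as a single range and does
not try to reproduce the 16,288 figure reported in the patch
description.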