Re: [PATCH] x86/mm: use max memory block size with unaligned memory end

David Hildenbrand <david@xxxxxxxxxx> · Thu, 4 Jun 2020 19:45:40 +0200

On 04.06.20 19:22, Daniel Jordan wrote:
> On Thu, Jun 04, 2020 at 09:22:03AM +0200, David Hildenbrand wrote:
>> On 04.06.20 05:54, Daniel Jordan wrote:
>>> Some of our servers spend 14 out of the 21 seconds of kernel boot
>>> initializing memory block sysfs directories and then creating symlinks
>>> between them and the corresponding nodes.  The slowness happens because
>>> the machines get stuck with the smallest supported memory block size on
>>> x86 (128M), which results in 16,288 directories to cover the 2T of
>>> installed RAM, and each of these paths does a linear search of the
>>> memory blocks for every block id, with atomic ops at each step.
>>
>> With 4fb6eabf1037 ("drivers/base/memory.c: cache memory blocks in xarray
>> to accelerate lookup") merged by Linus' today (strange, I thought this
>> would be long upstream)
> 
> Ah, thanks for pointing this out!  It was only posted to LKML so I missed it.
> 
>> all linear searches should be gone and at least
>> the performance observation in this patch no longer applies.
> 
> The performance numbers as stated, that's certainly true, but this patch on top
> still improves kernel boot by 7%.  It's a savings of half a second -- I'll take
> it.
> 
> IMHO the root cause of this is really the small block size.  Building a cache
> on top to avoid iterating over tons of small blocks seems like papering over
> the problem, especially when one of the two affected paths in boot is a

The memory block size dictates your memory hot(un)plug granularity.
E.g., on powerpc that's 16MB so they have *a lot* of memory blocks.
That's why that's not papering over the problem. Increasing the memory
block size isn't always the answer.

(there are other, still fairly academic approaches to power down memory
banks where you also want small memory blocks instead)

> cautious check that might be ready to be removed by now[0]:

Yeah, we discussed that somewhere already. My change only highlighted
the problem. And now that it's cheap, it can just stay unless there is a
very good reason not to do it.

> 
>     static int init_memory_block(struct memory_block **memory,
>     			     unsigned long block_id, unsigned long state)
>     {
>             ...
>     	mem = find_memory_block_by_id(block_id);
>     	if (mem) {
>     		put_device(&mem->dev);
>     		return -EEXIST;
>     	}
> 
> Anyway, I guess I'll redo the changelog and post again.
> 
>> The memmap init should nowadays consume most time.
> 
> Yeah, but of course it's not as bad as it was now that it's fully parallelized.

Right. I also observed that computing if a zone is contiguous can be
expensive.

-- 
Thanks,

David / dhildenb