RE: [PATCH part5 0/7] Arrange hotpluggable memory as ZONE_MOVABLE.

"Luck, Tony" <tony.luck@xxxxxxxxx> · Mon, 12 Aug 2013 20:49:42 +0000

>> I don't quite agree on this point. What you said is highly likely, but
>> not certain. Users may find they have lost hotpluggable memory.
>
> I'm having a difficult time buying that.  NUMA node granularity is
> usually pretty large - it's in the range of gigabytes.  By comparison,
> the area occupied by the kernel image is *tiny* and it's just highly
> unlikely that allocating a bit more memory afterwards would lead to
> any meaningful difference in hotunplug support.  The amount of memory
> we're talking about is likely to be less than a meg, right?

It's pretty safe to assume double-digit gigabytes for a removable chunk
(8G DIMMs are fast becoming standard, and there are typically 4 channels
to populate with at least one DIMM each, so 32G or more). 16G and 32G DIMMs
are pricey, but moving in too.  So I don't think we need to assume that early
allocations are limited to some tiny amount measured in single-digit megabytes.
We'd be safe even with some small number of gigabytes.

> I don't think it's a better solution.  It's fragile and fiddly and
> without much, if any, additional benefit.  Why should we do that when
> we can almost trivially solve the problem almost in memblock proper in
> a way which is completely firmware-agnostic?

So we do need to make sure that early memory allocations do happen from
the free areas adjacent to the kernel - and document that as a requirement
so we don't have people coming along later with an "allocate from top of memory
downwards" or other strategy that would break this assumption.  If we do that,
then I think I stand with Tejun that there is little benefit to parsing the SRAT
earlier.
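
To make the requirement concrete, here is a rough user-space sketch of
"allocate upward from the end of the kernel image". It is not the memblock
code, and every name in it (struct free_range, early_alloc_bottom_up, the
cursor argument) is made up for illustration. It also assumes the free ranges
are sorted by address and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free physical range [start, end); not a kernel structure. */
struct free_range {
    uint64_t start;
    uint64_t end;
};

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Bump allocator that only ever moves upward from *cursor, which the
 * caller initialises to the end of the kernel image.
 *
 * Returns the lowest aligned address >= *cursor where size bytes fit in a
 * single free range, and advances *cursor past the allocation.  Returns 0
 * (and leaves *cursor alone) if nothing fits; this assumes the kernel is
 * never loaded at address 0.  Alignment padding is simply wasted.
 */
uint64_t early_alloc_bottom_up(const struct free_range *ranges, size_t n,
                               uint64_t *cursor, uint64_t size,
                               uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t base = ranges[i].start;

        /* Never go below the cursor: stay at or above the kernel end. */
        if (base < *cursor)
            base = *cursor;
        base = align_up(base, align);

        /* Skip ranges that end before base or are too small. */
        if (base >= ranges[i].end || ranges[i].end - base < size)
            continue;

        *cursor = base + size;
        return base;
    }
    return 0;
}
```

With a policy like this, small early allocations land right after the kernel
image, and only a request too large for that range spills into a higher one.
A top-down policy would be the opposite and is exactly what the documented
requirement is meant to rule out.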

The only fly I see in the ointment here is the crazy fragmentation of physical
memory below 4G on x86 systems.  Typically it will all be on the same node.
But I don't know if there is any specification that requires it to be that way.
If some "helpful" OEM decided to make some "lowmem" (below 4G) be available on
every node, they might in theory do something truly awesomely strange.  But
even here - the granularity of such mappings tends to be large enough that
the "allocate near where the kernel was loaded" approach should still work to
keep those allocations on the same node for the "few megabytes" level of
allocations.
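
A small sketch of the granularity argument, again purely illustrative: the
node map below is an invented layout with lowmem split across two nodes, and
struct node_range and node_of_span are hypothetical names, not kernel
interfaces. The helper reports which node a physical span sits on, or -1 if
the span is not fully contained in one mapped range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one node's physical range [start, end). */
struct node_range {
    uint64_t start;
    uint64_t end;
    int nid;
};

/*
 * Return the node id owning all of [start, start + size), or -1 if the
 * span begins in a hole or runs past the end of the range it begins in.
 */
int node_of_span(const struct node_range *map, size_t n,
                 uint64_t start, uint64_t size)
{
    assert(map != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (start >= map[i].start && start < map[i].end)
            return (map[i].end - start >= size) ? map[i].nid : -1;
    }
    return -1;
}
```

With 1G-sized lowmem chunks per node and a kernel ending at 32M, an 8M
allocation placed right after the kernel reports the same node as the kernel
image, while a 1G allocation from the same spot would cross into the next
node and report -1.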

-Tony
