Re: CXL Boot to Bash - Section 3: Memory (block) Hotplug

On 18.02.25 21:25, Gregory Price wrote:
> On Tue, Feb 18, 2025 at 08:25:59PM +0100, David Hildenbrand wrote:
>> On 18.02.25 19:04, Gregory Price wrote:

>> Hm?
>>
>> If you enable memmap_on_memory, we will place the memmap on that carved-out
>> region, independent of ZONE_NORMAL/ZONE_MOVABLE etc. It's the "altmap".
>>
>> The reason we can place the memmap in ZONE_MOVABLE is that, although it is
>> "unmovable", we told the memory offlining code that it doesn't have to
>> care about migrating that memmap carveout; there is no migration to be done.
>> Just offline the block (the memmap gets stale) and remove that block (the
>> memmap gets removed).
>>
>> If there is a case where we carve out the memmap and do *not* use it, that
>> case must be fixed.
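[A back-of-the-envelope sketch of the carveout being described, assuming x86-64 defaults (4 KiB base pages, 64-byte struct page) and the 256 MB block size used later in this thread:]

```shell
# With memmap_on_memory, the first pages of each hotplugged memory block hold
# that block's own memmap (the "altmap"), instead of allocating it from the
# buddy on another node.
block_bytes=$((256 * 1024 * 1024))  # one 256 MiB memory block
page_size=4096                      # 4 KiB base pages
struct_page=64                      # sizeof(struct page) on x86-64
carveout=$((block_bytes / page_size * struct_page))
echo "vmemmap carveout per block: $((carveout / 1024 / 1024)) MiB"
# → vmemmap carveout per block: 4 MiB
```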


> Hm, I managed to trace down the wrong path on this particular code.
>
> I will go back and redo my tests to sanity check, but here's what I
> would expect to see:
>
> 1) if memmap_on_memory is off, and hotplug capacity (node1) is
>    zone_movable - then zone_normal (node0) should have N pages
>    accounted in nr_memmap_pages

Right, we'll allocate the memmap from the buddy, which ends up
allocating from ZONE_NORMAL on that node.


>    1a) when dropping these memory blocks, I should see node0 memory
>        use drop by 4GB - since this is just GFP_KERNEL pages.

I assume you mean "when hotunplugging them". Yes, we should be freeing the memmap back to the buddy.


> 2) if memmap_on_memory is on, and hotplug capacity (node1) is
>    zone_movable - then each memory block (256MB) should appear
>    as 252MB (-4MB of 64-byte page structs).  For 256GB (my system)
>    I should see a total of 252GB of onlined memory (-4GB of page struct)

In memory_block_online(), we have:

	/*
	 * Account once onlining succeeded. If the zone was unpopulated, it is
	 * now already properly populated.
	 */
	if (nr_vmemmap_pages)
		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
					  nr_vmemmap_pages);

So we'd add the vmemmap pages to
* zone->present_pages
* zone->zone_pgdat->node_present_pages

(mhp_init_memmap_on_memory() moved the vmemmap pages to ZONE_MOVABLE)

However, we don't add them to
* zone->managed_pages
* totalram pages

/proc/zoneinfo would show them as present but not managed.
/proc/meminfo would not include them in MemTotal

We could adjust the latter two, if there is a problem.
(just needs some adjust_managed_page_count() calls)

So yes, staring at MemTotal, you should see an increase of 252 MiB per block right now.
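[A sketch of how to observe the present-vs-managed gap described above; "Node 1" and the Movable zone are assumptions for this particular topology, so adjust for your system:]

```shell
# With memmap_on_memory, the vmemmap pages are present in the zone but not
# managed, so 'present' should exceed 'managed' by the per-block carveout.
awk '/^Node/ { in_zone = ($0 ~ /Node 1, zone +Movable/) }
     in_zone && /present|managed/ { print $1, $2 }' /proc/zoneinfo

# MemTotal does not include the vmemmap carveout (only managed pages).
grep MemTotal /proc/meminfo
```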


>    2a) when dropping these memory blocks, I should see node0 memory use
>        stay the same - since it was vmemmap usage.

Yes.


> I will double check that this isn't working as expected, and I'll double
> check for a build option as well.
>
> Stupid question - it sorta seems like you'd want this as the default
> setting for driver-managed hotplug memory blocks, but I suppose for
> very small blocks there are problems (as described in the docs).

The issue is that it is per-memblock. So you'll never have 1 GiB ranges
of consecutive usable memory (e.g., 1 GiB hugetlb page).
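[The fragmentation point in the last paragraph, sketched as arithmetic: because the memmap carveout repeats at the start of every block, no 1 GiB-aligned run of usable pages can form (block and carveout sizes as assumed earlier in the thread):]

```shell
# Each 256 MiB block begins with its own ~4 MiB vmemmap hole, so the longest
# run of consecutive usable memory is one block minus its carveout - far
# short of the 1 GiB needed for a gigantic hugetlb page.
block_mib=256
vmemmap_mib=4
usable_run=$((block_mib - vmemmap_mib))
echo "longest contiguous usable run: ${usable_run} MiB (< 1024 MiB)"
# → longest contiguous usable run: 252 MiB (< 1024 MiB)
```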


> :thinking: - is it silly to suggest maybe a per-driver memmap_on_memory
> setting rather than just a global setting?  For CXL capacity, this seems
> like a no-brainer since blocks can't be smaller than 256MB (per spec).

I thought we had that? See MHP_MEMMAP_ON_MEMORY set by dax/kmem.

IIRC, the global toggle must be enabled for the driver option to be considered.
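[A sketch of where to look for the two toggles mentioned here; the global parameter path is standard, while the per-device attribute and the device name dax0.0 are assumptions (newer kernels expose it, and your dax device may be named differently):]

```shell
# Global toggle (memory_hotplug.memmap_on_memory boot/module parameter):
cat /sys/module/memory_hotplug/parameters/memmap_on_memory 2>/dev/null \
    || echo "parameter not available"

# Per-device override consulted by dax/kmem when it requests
# MHP_MEMMAP_ON_MEMORY (hypothetical device name dax0.0):
dev=/sys/bus/dax/devices/dax0.0
[ -r "$dev/memmap_on_memory" ] && cat "$dev/memmap_on_memory" \
    || echo "no $dev/memmap_on_memory on this system"
```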

--
Cheers,

David / dhildenb




