On Tue, Feb 18, 2025 at 09:57:06PM +0100, David Hildenbrand wrote:
> > 
> > 2) if memmap_on_memory is on, and hotplug capacity (node1) is
> > zone_movable - then each memory block (256MB) should appear
> > as 252MB (-4MB of 64-byte page structs). For 256GB (my system)
> > I should see a total of 252GB of onlined memory (-4GB of page struct)
> 
> In memory_block_online(), we have:
> 
>         /*
>          * Account once onlining succeeded. If the zone was unpopulated, it is
>          * now already properly populated.
>          */
>         if (nr_vmemmap_pages)
>                 adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
>                                           nr_vmemmap_pages);
> 

I've validated the behavior on my system; I had just misread my results.
memmap_on_memory works as suggested.

What's mildly confusing is that the pages used for the altmap are
accounted in vmstat as if they were an allocation, while that capacity
is also chopped out of the memory block (it "makes sense", it's just
subtly misleading). I thought the system was saying I'd allocated memory
(from the 'free' capacity) instead of just reducing capacity.

Thank you for clearing this up.

> > 
> > stupid question - it sorta seems like you'd want this as the default
> > setting for driver-managed hotplug memory blocks, but I suppose for
> > very small blocks there's problems (as described in the docs).
> 
> The issue is that it is per-memblock. So you'll never have 1 GiB ranges
> of consecutive usable memory (e.g., 1 GiB hugetlb page).
> 

That makes sense; I had not considered this. Although it only applies to
small blocks, that's basically an indictment of this suggestion:

https://lore.kernel.org/linux-mm/20250127153405.3379117-1-gourry@xxxxxxxxxx/

So I'll have to reconsider whether this should be a default. This alone
is probably enough to nak it entirely.

... that said ...

Interestingly, when I tried allocating 1GiB hugetlb pages on a dax device
in ZONE_MOVABLE (without memmap_on_memory), the allocation fails silently
regardless of block size (tried both 2GB and 256MB). I can't find a
reason why this would be the case in the existing documentation.
(note: hugepage migration is enabled in the build config, so it's not that)

If I online one block (256MB) into ZONE_NORMAL and the remainder into
ZONE_MOVABLE (with memmap_on_memory=n), the allocation still fails, and:

    nr_slab_unreclaimable 43

shows up in node1/vmstat - where previously there was nothing.

Onlining the dax devices into ZONE_NORMAL successfully allowed 1GiB huge
pages to allocate. I used the /sys/bus/node/devices/node1/hugepages/*
interfaces to test this.

Using /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages with an
interleave mempolicy, all hugepages end up in ZONE_NORMAL.

(v6.13 base kernel)

This behavior is *curious* to say the least. Not sure if it's a bug or
some nuance missing from the documentation - but I'm certainly glad I
caught it.

> I thought we had that? See MHP_MEMMAP_ON_MEMORY set by dax/kmem.
> 
> IIRC, the global toggle must be enabled for the driver option to be
> considered.

Oh, well, that's an extra layer I missed. So there's:

build:  CONFIG_MHP_MEMMAP_ON_MEMORY=y
        CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y
global: /sys/module/memory_hotplug/parameters/memmap_on_memory
device: /sys/bus/dax/devices/dax0.0/memmap_on_memory

And looking at it, this does seem to be the default for dax. So I can
drop the existing `nuance movable/memmap` section and just replace it
with the hugetlb subtleties x_x.

I appreciate the clarifications here, sorry for the incorrect info and
the increasing confusion.

~Gregory
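
P.S. In case anyone wants to poke at the 1GiB hugetlb behavior above,
something like the rough sketch below (needs root) exercises the same
sysfs interfaces I mentioned. The dax0.0 / node1 names and the request
of 4 pages are just from my setup, not anything canonical - adjust as
needed.

#!/usr/bin/env python3
# Rough sketch: request 1GiB hugetlb pages on a hotplugged node and check
# what actually got reserved. Run as root; dax0.0 / node1 are assumptions.
from pathlib import Path

NODE_HP = Path("/sys/bus/node/devices/node1/hugepages/hugepages-1048576kB")
NODE_VMSTAT = Path("/sys/bus/node/devices/node1/vmstat")
DAX_MEMMAP = Path("/sys/bus/dax/devices/dax0.0/memmap_on_memory")

# 1) Note whether the device-level altmap toggle is set for the dax device.
print("dax0.0 memmap_on_memory:", DAX_MEMMAP.read_text().strip())

# 2) Ask for a few 1GiB pages on node1, then read back how many the kernel
#    actually reserved (a silent failure shows up as a smaller count).
(NODE_HP / "nr_hugepages").write_text("4\n")
for name in ("nr_hugepages", "free_hugepages"):
    print(name, (NODE_HP / name).read_text().strip())

# 3) Check the per-node vmstat for the nr_slab_unreclaimable counter that
#    appeared after onlining one block into ZONE_NORMAL.
for line in NODE_VMSTAT.read_text().splitlines():
    if "slab" in line:
        print(line)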