On Fri, 2023-06-16 at 09:44 +0200, David Hildenbrand wrote: > On 16.06.23 00:00, Vishal Verma wrote: > > The dax/kmem driver can potentially hot-add large amounts of memory > > originating from CXL memory expanders, or NVDIMMs, or other 'device > > memories'. There is a chance there isn't enough regular system memory > > available to fit ythe memmap for this new memory. It's therefore > > desirable, if all other conditions are met, for the kmem managed memory > > to place its memmap on the newly added memory itself. > > > > Arrange for this by first allowing for a module parameter override for > > the mhp_supports_memmap_on_memory() test using a flag, adjusting the > > only other caller of this interface in dirvers/acpi/acpi_memoryhotplug.c, > > exporting the symbol so it can be called by kmem.c, and finally changing > > the kmem driver to add_memory() in chunks of memory_block_size_bytes(). > > 1) Why is the override a requirement here? Just let the admin configure > it then then add conditional support for kmem. Configure it in the current way using the module parameter to memory_hotplug? The whole module param check feels a bit awkward, especially since memory_hotplug is builtin, the only way to supply the param is on the kernel command line as opposed to a modprobe config. The goal with extending mhp_supports_memmap_on_memory() to check for support with or without consideration for the module param is that it makes it a bit more flexible for callers like kmem. > 2) I recall that there are cases where we don't want the memmap to land > on slow memory (which online_movable would achieve). Just imagine the > slow PMEM case. So this might need another configuration knob on the > kmem side. > > I recall some discussions on doing that chunk handling internally (so > kmem can just perform one add_memory() and we'd split that up internally). > Another config knob isn't unreasonable - though the thinking in making this behavior the new default policy was that with CXL based memory expanders, the performance delta from main memory wouldn't be as big as the pmem - main memory delta. With pmem devices being phased out, it's not clear we'd need a knob, and it can always be added if it ends up becoming necessary. The other comments about doing the per-memblock loop internally, and fixing up the removal paths all sound good, and I've started reworking those - thanks for taking a look!