On 22.01.19 11:37, Oscar Salvador wrote:
> Hi,
>
> this is the v2 of the first RFC I sent back then in October [1].
> In this new version I tried to reduce the complexity as much as possible,
> plus some clean-ups.
>
> [Testing]
>
> I have tested it on "x86_64" (small/big memblocks) and on "powerpc".
> On both architectures, hot-add/hot-remove and online/offline operations
> worked as expected using vmemmap pages; I have not seen any issues so far.
> I wanted to try it out on Hyper-V/Xen, but I did not manage to.
> I plan to do so later this week (if time allows).
> I would also like to test it on arm64, but I am not sure I can grab
> an arm64 box anytime soon.
>
> [Coverletter]:
>
> This is another step to make memory hotplug more usable. The primary
> goal of this patchset is to reduce the memory overhead of the hot-added
> memory (at least for the SPARSE_VMEMMAP memory model). The current way we
> populate the memmap (struct page array) has two main drawbacks:
>
> a) it consumes additional memory until the hot-added memory itself is
>    onlined, and
> b) the memmap might end up on a different NUMA node, which is especially
>    true for the movable_node configuration.
>
> a) is a problem especially for memory-hotplug-based memory "ballooning"
> solutions, where the delay between physical memory hotplug and the
> onlining can lead to OOM, and that led to the introduction of hacks like
> auto onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
> policy for the newly added memory")).
>
> b) can have performance drawbacks.
>
> I have also seen hot-add operations failing on powerpc due to the fact
> that we try to use order-8 pages when populating the memmap array.
> Given a 64KB base page size, that is 16MB.
> If we run out of those, we just fail the operation and we cannot add
> more memory.
> We could fall back to base pages as x86_64 does, but we can do better.
>
> One way to mitigate all these issues is to simply allocate the memmap
> array (which is the largest memory footprint of physical memory hotplug)
> from the hot-added memory itself. The VMEMMAP memory model allows us to
> map any pfn range, so the memory does not need to be online to be usable
> for the array. See patch 3 for more details. In short, I am reusing the
> existing vmem_altmap, which aims to achieve the same thing for nvdimm
> device memory.

I only had a quick glimpse. I would prefer if the caller of add_memory()
could specify whether it is ok to allocate the vmemmap from the range.
This e.g. allows the ACPI DIMM code to allocate from the range, while
other mechanisms (Xen, Hyper-V, virtio-mem) can allow it once they
actually support it. Also, while s390x standby memory cannot support
allocating from the range, virtio-mem could easily support it on s390x.

I am not sure what such an interface could look like, but I would really
like to have control over that at the add_memory() interface, not per
arch.

--

Thanks,

David / dhildenb
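
A rough sketch of what such a per-caller opt-in could look like, assuming
add_memory() grows a flags argument and a made-up MHP_MEMMAP_FROM_RANGE
flag (neither is part of the posted patchset; all names are illustrative):

#include <linux/types.h>

/* hypothetical opt-in flag understood by the core hotplug code */
#define MHP_MEMMAP_FROM_RANGE	(1UL << 0)

/* today: int add_memory(int nid, u64 start, u64 size); */
int add_memory(int nid, u64 start, u64 size, unsigned long flags);

/* the ACPI DIMM code could opt in right away ... */
static int example_acpi_add(int nid, u64 start, u64 size)
{
	return add_memory(nid, start, size, MHP_MEMMAP_FROM_RANGE);
}

/* ... while Xen/Hyper-V/virtio-mem keep passing 0 until they support it */
static int example_balloon_add(int nid, u64 start, u64 size)
{
	return add_memory(nid, start, size, 0);
}

This way the decision would be made per caller rather than per
architecture, so e.g. virtio-mem on s390x could set the flag while s390x
standby memory does not.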