On Thu, Jul 25, 2019 at 06:02:02PM +0200, Oscar Salvador wrote:
> Here we go with v3.
>
> v3 -> v2:
>         * Rewrite about vmemmap pages handling.
>           Prior to this version, I was (ab)using hugepages fields
>           from struct page, while here I am officially adding a new
>           sub-page type with the fields I need.
>
>         * Drop MHP_MEMMAP_{MEMBLOCK,DEVICE} in favor of MHP_MEMMAP_ON_MEMORY.
>           While I am still not 100% sure if this is the right decision, and
>           while I still see some gain in having MHP_MEMMAP_{MEMBLOCK,DEVICE},
>           having only one flag eases the code.
>           If the user wants to allocate memmaps per memblock, it'll
>           have to call add_memory() variants with memory-block granularity.
>
>           If we happen to have a clearer use case for a MHP_MEMMAP_MEMBLOCK
>           flag in the future, so that the user does not have to bother about
>           the way it calls add_memory() variants, but only passes a flag, we
>           can add it.
>           Actually, I already had the code, so adding it in the future is
>           going to be easy.
>
>         * Granularity check when hot-removing memory.
>           Just checking that the granularity is the same.
>
> [Testing]
>
>  - x86_64: small and large memblocks (128MB, 1G and 2G)
>
> So far, only ACPI memory hotplug uses the new flag.
> The other callers can be changed depending on their needs.
>
> [Coverletter]
>
> This is another step to make memory hotplug more usable. The primary
> goal of this patchset is to reduce memory overhead of the hot-added
> memory (at least for SPARSEMEM_VMEMMAP memory model). The current way we use
> to populate memmap (struct page array) has two main drawbacks:
>
> a) it consumes additional memory until the hot-added memory itself is
>    onlined and
> b) memmap might end up on a different NUMA node, which is especially true
>    for movable_node configuration.
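
To put a rough number on drawback a), here is a minimal sketch of the memmap
cost per hot-added range. It assumes 4 KiB base pages and a 64-byte
struct page, which is typical for x86_64 but depends on the kernel
configuration. The macro and helper names are made up for illustration and
are not part of the patchset.

```c
#include <stdint.h>

/*
 * Assumptions (not taken from the patchset): 4 KiB base pages and a
 * 64-byte struct page, as is common on x86_64.
 */
#define ASSUMED_PAGE_SIZE        4096ULL
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL

/* Bytes of memmap (struct page array) needed to describe range_bytes. */
uint64_t memmap_bytes(uint64_t range_bytes)
{
	return (range_bytes / ASSUMED_PAGE_SIZE) * ASSUMED_STRUCT_PAGE_SIZE;
}

/* Base pages occupied by that memmap, rounded up to a whole page. */
uint64_t memmap_pages(uint64_t range_bytes)
{
	return (memmap_bytes(range_bytes) + ASSUMED_PAGE_SIZE - 1) /
	       ASSUMED_PAGE_SIZE;
}
```

Under those assumptions, memmap_bytes(128ULL << 20) gives 2 MiB (512 pages)
for a 128MB memory block and memmap_bytes(1ULL << 30) gives 16 MiB for a 1G
one. That is the amount that currently has to come from already-online
memory, and that would instead be carved out of the hot-added range itself.
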
>
> a) it is a problem especially for memory hotplug based memory "ballooning"
>    solutions when the delay between physical memory hotplug and the
>    onlining can lead to OOM and that led to introduction of hacks like auto
>    onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
>    policy for the newly added memory")).
>
> b) can have performance drawbacks.
>
> One way to mitigate all these issues is to simply allocate memmap array
> (which is the largest memory footprint of the physical memory hotplug)
> from the hot-added memory itself. SPARSEMEM_VMEMMAP memory model allows
> us to map any pfn range so the memory doesn't need to be online to be
> usable for the array. See patch 3 for more details.
> This feature is only usable when CONFIG_SPARSEMEM_VMEMMAP is set.
>
> [Overall design]:
>
> Implementation wise we reuse vmem_altmap infrastructure to override
> the default allocator used by vmemmap_populate. Once the memmap is
> allocated we need a way to mark altmap pfns used for the allocation.
> If MHP_MEMMAP_ON_MEMORY flag was passed, we set up the layout of the
> altmap structure at the beginning of __add_pages(), and then we call
> mark_vmemmap_pages().
>
> MHP_MEMMAP_ON_MEMORY flag parameter will specify to allocate memmaps
> from the hot-added range.
> If a caller wants memmaps to be allocated per memory block, it will
> have to call add_memory() variants in memory-block granularity
> spanning the whole range, while if it wants to allocate memmaps
> per whole memory range, just one call will do.
>
> Want to add 384MB (3 sections, 3 memory-blocks)
> e.g:
>
>         add_memory(0x1000, size_memory_block);
>         add_memory(0x2000, size_memory_block);
>         add_memory(0x3000, size_memory_block);
>
>         or
>
>         add_memory(0x1000, size_memory_block * 3);
>
> One thing worth mentioning is that vmemmap pages residing in movable memory are not a
> show-stopper for that memory to be offlined/migrated away.
> Vmemmap pages are just ignored in that case and they stick around until sections
> referred by those vmemmap pages are hot-removed.

Gentle ping :-)

-- 
Oscar Salvador
SUSE L3