On Wed, Jul 26, 2017 at 10:33:31AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. kmalloc is currently used for those
> allocations.
>
> This has some disadvantages: a) existing memory is consumed for that
> purpose (~2MB per 128MB memory section) and b) if the whole node is
> movable then we have off-node struct pages which have performance
> drawbacks.
>
> a) has turned out to be a problem for memory hotplug based ballooning
> because the userspace might not react in time to online memory while
> the memory consumed during physical hotadd is enough to push the
> system to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining
> policy for the newly added memory") has been added to work around that
> problem.
>
> We can do much better when CONFIG_SPARSEMEM_VMEMMAP=y because vmemmap
> page tables can map arbitrary memory. That means that we can simply
> use the beginning of each memory section and map struct pages there.
> struct pages which back the allocated space then just need to be
> treated carefully so that we know they are not usable.
>
> Add {__Set,__Clear}PageVmemmap helpers to distinguish those pages in
> pfn walkers. We do not have any spare page flag for this purpose, so
> use the combination of the PageReserved bit, which already tells that
> the page should be ignored by the core mm code, and store VMEMMAP_PAGE
> (which sets all bits but PAGE_MAPPING_FLAGS) into page->mapping.
>
> On the memory hotplug front, reuse the vmem_altmap infrastructure to
> override the default allocator used by __vmemmap_populate. Once the
> memmap is allocated we need a way to mark altmap pfns used for the
> allocation, and this is done by a new vmem_altmap::flush_alloc_pfns
> callback. The mark_vmemmap_pages implementation then simply calls
> __SetPageVmemmap for all struct pages backing those pfns. The callback
> is called from sparse_add_one_section after the memmap has been
> initialized to 0.
>
> We also have to be careful about those pages during online and offline
> operations. They are simply ignored.
>
> Finally __ClearPageVmemmap is called when the vmemmap page tables are
> torn down.
>
> Please note that only memory hotplug currently uses this allocation
> scheme. The boot time memmap allocation could use the same trick as
> well, but this is not done yet.

Which kernel are these patches based on? I tried linux-next and Linus'
vanilla tree, however the series does not apply.

In general I do like your idea, however if I understand your patches
correctly we might have an ordering problem on s390: it is not possible
to access hot-added memory on s390 before it is online (MEM_GOING_ONLINE
succeeded).

On MEM_GOING_ONLINE we ask the hypervisor to back the potentially
available hot-added memory region with physical pages. Accessing those
ranges before that will result in an exception.

However with your approach the memmap is still allocated when
add_memory() is being called, correct? The timing wouldn't change from
the current behaviour, but the memmap would now live in the hot-added
range itself, which is the ordering problem outlined above.

Just trying to make sure I get this right :)
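
As an aside, here is how I read the proposed page->mapping encoding,
just so we are talking about the same thing. This is only a sketch
pieced together from the changelog above, not from the actual patch
(which I could not apply); the PageVmemmap() test and the use of the
plain Reserved helpers are my assumptions:

/*
 * Sketch: mark struct pages that back the vmemmap of a hot-added
 * section by combining PageReserved with a magic page->mapping value.
 * VMEMMAP_PAGE sets all bits except PAGE_MAPPING_FLAGS, so it cannot
 * be mistaken for an anon/movable mapping pointer.
 */
#define VMEMMAP_PAGE	(~PAGE_MAPPING_FLAGS)

static inline void __SetPageVmemmap(struct page *page)
{
	SetPageReserved(page);		/* core mm already skips Reserved pages */
	page->mapping = (void *)VMEMMAP_PAGE;
}

static inline void __ClearPageVmemmap(struct page *page)
{
	ClearPageReserved(page);
	page->mapping = NULL;
}

/* For pfn walkers: is this a vmemmap-backing page that must be skipped? */
static inline int PageVmemmap(struct page *page)
{
	return PageReserved(page) &&
	       (unsigned long)page->mapping == VMEMMAP_PAGE;
}

If that reading is correct, the memmap initialization in
sparse_add_one_section would be the first write into the hot-added
range, which is exactly the access that cannot happen on s390 before
MEM_GOING_ONLINE.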