On Fri, Nov 27, 2020 at 04:15:36PM +0100, Michal Hocko wrote: > > Vmemap page tables can map arbitrary memory. > > That means that we can simply use the beginning of each memory section and > > map struct pages there. > > Did you mean each memory block rather than section? Yes, sorry, I did not update that part. > > struct pages which back the allocated space then just need to be treated > > carefully. > > > > Implementation wise we will reuse vmem_altmap infrastructure to override > > the default allocator used by __populate_section_memmap. Once the memmap is > > allocated, we are going to need a way to mark altmap pfns used for the allocation. > > If MHP_MEMMAP_ON_MEMORY flag was passed, we will set up the layout of the > > altmap structure in add_memory_resouce(), and then we will call > > mhp_mark_vmemmap_pages() to properly mark those pages. > > > > Online/Offline: > > > > In the memory_block structure, a new field is created in order to > > store the number of vmemmap_pages. > > Is this really needed? We know how many pfns are required for a block of > a specific size, right? > > I have only glanced through the patch so I might be missing something > but I am really wondering why you haven't chosen to use altmap directly > here. Well, this is my bad, I did not update the changelog wrt. to the previous version so it might be confusing. I will make sure to update it for the next submission, but let me explain it here to shed some light. We no longer need to use mhp_mark_vmemap_pages to mark vmemmap pages pages. Prior to online_pages(), the whole range is offline, so no one should be messing with any pages within that range. The initialization of the pages takes places in online_pages(): We have: start_pfn = first_pfn_of_the_range buddy_start_pfn = first_pfn_of_the_range + nr_vmemmap_pages We do have: + if (nr_vmemmap_pages) + move_pfn_range_to_zone(zone, pfn, nr_vmemmap_pages, NULL, + MIGRATE_UNMOVABLE); + move_pfn_range_to_zone(zone, buddy_start_pfn, buddy_nr_pages, NULL, + MIGRATE_ISOLATE); Now, all the range is initialized and marked PageReserved, but we only send to buddy (buddy_start_pfn, end_pfn]. And so, (start_pfn, buddy_start_pfn - 1] reamins PageReserved. And we know that pfn walkers to skip Reserved pages. About the altmap part. Altmap is used in the hot-add phase in add_memory_resource. The thing is, we could avoid adding the memory_block's nr_vmemmap_pages field , but we would have to mark the vmemmap pages as we used to do in previous implementations (see [1]), but I find this way cleaner, and it adds much less code (previous implementions can be see in [2]), and as a starter I find it much better. > It would be also good to describe how does a pfn walker recognize such a > page? Most of them will simply ignore it but e.g. hotplug walker will > need to skip over those because they are not preventing offlining as > they will go away with the memory block together. Wrt. hotplug walker, same as above, we only care to migrate the (buddy_start_pfn, end_pfn], so the first pfn to migrate and isolate is set to buddy_start_pfn. Other pfns walkers should merely skip vmemmap pages because they are Reserved. > Some basic description of testing done would be suitable as well. Well, that is: - Hot-add memory to a specific numa node - Online memory - numactl -H and /proc/zoneinfo reflects the truth and nr_vmemmap_pages are extracted where they have to be - Start a memory stress program and bind it to the numa node we added memory so we make sure it gets exercised - Wait for a while and when node's free pages have decreased considerably, offline memory - Check that memory went offline and check /proc/zoneinfo and numactl -H again - Hot-remove range [1] https://patchwork.kernel.org/project/linux-mm/patch/20201022125835.26396-3-osalvador@xxxxxxx/ [2] https://patchwork.kernel.org/project/linux-mm/cover/20201022125835.26396-1-osalvador@xxxxxxx/ -- Oscar Salvador SUSE L3