On Fri 22-03-19 11:49:30, Anshuman Khandual wrote: > > > On 03/21/2019 02:06 PM, Michal Hocko wrote: > > On Thu 21-03-19 13:38:20, Anshuman Khandual wrote: > >> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs > >> entries between memory block and node. It first checks pfn validity with > >> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config > >> (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). > >> > >> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) > >> which scans all mapped memblock regions with memblock_is_map_memory(). This > >> creates a problem in memory hot remove path which has already removed given > >> memory range from memory block with memblock_[remove|free] before arriving > >> at unregister_mem_sect_under_nodes(). > > > > Could you be more specific on what is the actual problem please? It > > would be also helpful to mention when is the memblock[remove|free] > > called actually. > > The problem is in unregister_mem_sect_under_nodes() as it skips calling into both > instances of sysfs_remove_link() which removes node-memory block sysfs symlinks. > The node enumeration of the memory block still remains in sysfs even if the memory > block itself has been removed. > > This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has > a valid associated struct page to fetch the node ID from. > > On arm64 (with CONFIG_HOLES_IN_ZONE) > > get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory() > > At this point memblock for the range has been removed. > > __remove_memory() > memblock_free() > memblock_remove() --------> memblock has already been removed > arch_remove_memory() > __remove_pages() > __remove_section() > unregister_memory_section() > remove_memory_section() > unregister_mem_sect_under_nodes() > > There is a dependency on memblock (after it has been removed) through pfn_valid(). Can we reorganize or rework the code that the memblock is removed later? I guess this is what Oscar was suggesting. Or ... > >> During runtime memory hot remove get_nid_for_pfn() needs to validate that > >> given pfn has a struct page mapping so that it can fetch required nid. This > >> can be achieved just by looking into it's section mapping information. This > >> adds a new helper pfn_section_valid() for this purpose. Its same as generic > >> pfn_valid(). > > > > I have to say I do not like this. Having pfn_section_valid != pfn_valid_within > > is just confusing as hell. pfn_valid_within should return true whenever > > a struct page exists and it is sensible (same like pfn_valid). So it > > seems that this is something to be solved on that arch specific side of > > pfn_valid. > > At present arm64's pfn_valid() implementation validates the pfn inside sparse > memory section mapping as well memblock. The memblock search excludes memory > with MEMBLOCK_NOMAP attribute. But in this particular instance during hotplug > only section mapping validation for the pfn is good enough. > > IIUC the current arm64 pfn_valid() already extends the definition beyond the > availability of a valid struct page to operate on. is there any way to record that nomap information into the section instead. It looks rather weird that this information is spread into two data structures and it makes pfn_valid more expensive at the same time. -- Michal Hocko SUSE Labs