On 03/22/2019 05:32 PM, Michal Hocko wrote: > On Fri 22-03-19 11:49:30, Anshuman Khandual wrote: >> >> On 03/21/2019 02:06 PM, Michal Hocko wrote: >>> On Thu 21-03-19 13:38:20, Anshuman Khandual wrote: >>>> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs >>>> entries between memory block and node. It first checks pfn validity with >>>> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config >>>> (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). >>>> >>>> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) >>>> which scans all mapped memblock regions with memblock_is_map_memory(). This >>>> creates a problem in memory hot remove path which has already removed given >>>> memory range from memory block with memblock_[remove|free] before arriving >>>> at unregister_mem_sect_under_nodes(). >>> Could you be more specific on what is the actual problem please? It >>> would be also helpful to mention when is the memblock[remove|free] >>> called actually. >> The problem is in unregister_mem_sect_under_nodes() as it skips calling into both >> instances of sysfs_remove_link() which removes node-memory block sysfs symlinks. >> The node enumeration of the memory block still remains in sysfs even if the memory >> block itself has been removed. >> >> This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has >> a valid associated struct page to fetch the node ID from. >> >> On arm64 (with CONFIG_HOLES_IN_ZONE) >> >> get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory() >> >> At this point memblock for the range has been removed. >> >> __remove_memory() >> memblock_free() >> memblock_remove() --------> memblock has already been removed >> arch_remove_memory() >> __remove_pages() >> __remove_section() >> unregister_memory_section() >> remove_memory_section() >> unregister_mem_sect_under_nodes() >> >> There is a dependency on memblock (after it has been removed) through pfn_valid(). > Can we reorganize or rework the code that the memblock is removed later? > I guess this is what Oscar was suggesting. I could get it working with the following re-order of memblock_[free|remove] and arch_remove_memory(). I did not observe any other adverse side affect because of this change. Does it look okay ? --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1863,11 +1863,11 @@ void __ref __remove_memory(int nid, u64 start, u64 size) /* remove memmap entry */ firmware_map_remove(start, start + size, "System RAM"); + arch_remove_memory(nid, start, size, NULL); + memblock_free(start, size); memblock_remove(start, size); - arch_remove_memory(nid, start, size, NULL); - try_offline_node(nid); mem_hotplug_done();