The patch titled Subject: mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock has been added to the -mm tree. Its filename is mm-memory_hotplug-fix-online-offline_pages-called-wo-mem_hotplug_lock.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-fix-online-offline_pages-called-wo-mem_hotplug_lock.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-fix-online-offline_pages-called-wo-mem_hotplug_lock.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: David Hildenbrand <david@xxxxxxxxxx> Subject: mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock There seem to be some problems as result of 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"), which tried to fix a possible lock inversion reported and discussed in [1] due to the two locks a) device_lock() b) mem_hotplug_lock While add_memory() first takes b), followed by a) during bus_probe_device(), onlining of memory from user space first took a), followed by b), exposing a possible deadlock. In [1], and it was decided to not make use of device_hotplug_lock, but rather to enforce a locking order. The problems I spotted related to this: 1. Memory block device attributes: While .state first calls mem_hotplug_begin() and the calls device_online() - which takes device_lock() - .online does no longer call mem_hotplug_begin(), so effectively calls online_pages() without mem_hotplug_lock. 2. device_online() should be called under device_hotplug_lock, however onlining memory during add_memory() does not take care of that. In addition, I think there is also something wrong about the locking in 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() without locks. This was introduced after 30467e0b3be. And skimming over the code, I assume it could need some more care in regards to locking (e.g. device_online() called without device_hotplug_lock. This will be addressed in the following patches. Now that we hold the device_hotplug_lock when - adding memory (e.g. via add_memory()/add_memory_resource()) - removing memory (e.g. via remove_memory()) - device_online()/device_offline() We can move mem_hotplug_lock usage back into online_pages()/offline_pages(). Why is mem_hotplug_lock still needed? Essentially to make get_online_mems()/put_online_mems() be very fast (relying on device_hotplug_lock would be very slow), and to serialize against addition of memory that does not create memory block devices (hmm). [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ 2015-February/065324.html This patch is partly based on a patch by Vitaly Kuznetsov. Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@xxxxxxxxxx Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> Reviewed-by: Pavel Tatashin <pavel.tatashin@xxxxxxxxxxxxx> Reviewed-by: Rashmica Gupta <rashmica.g@xxxxxxxxx> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> Cc: Paul Mackerras <paulus@xxxxxxxxx> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx> Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx> Cc: Len Brown <lenb@xxxxxxxxxx> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx> Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx> Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> Cc: Juergen Gross <jgross@xxxxxxxx> Cc: Rashmica Gupta <rashmica.g@xxxxxxxxx> Cc: Michael Neuling <mikey@xxxxxxxxxxx> Cc: Balbir Singh <bsingharora@xxxxxxxxx> Cc: Kate Stewart <kstewart@xxxxxxxxxxxxxxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Philippe Ombredanne <pombredanne@xxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxx> Cc: Pavel Tatashin <pavel.tatashin@xxxxxxxxxxxxx> Cc: Vlastimil Babka <vbabka@xxxxxxx> Cc: Dan Williams <dan.j.williams@xxxxxxxxx> Cc: Oscar Salvador <osalvador@xxxxxxx> Cc: YASUAKI ISHIMATSU <yasu.isimatu@xxxxxxxxx> Cc: Mathieu Malaterre <malat@xxxxxxxxxx> Cc: John Allen <jallen@xxxxxxxxxxxxxxxxxx> Cc: Jonathan Corbet <corbet@xxxxxxx> Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> Cc: Nathan Fontenot <nfont@xxxxxxxxxxxxxxxxxx> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- drivers/base/memory.c | 13 +------------ mm/memory_hotplug.c | 28 ++++++++++++++++++++-------- 2 files changed, 21 insertions(+), 20 deletions(-) --- a/drivers/base/memory.c~mm-memory_hotplug-fix-online-offline_pages-called-wo-mem_hotplug_lock +++ a/drivers/base/memory.c @@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsig /* * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is * OK to have direct references to sparsemem variables in here. - * Must already be protected by mem_hotplug_begin(). */ static int memory_block_action(unsigned long phys_index, unsigned long action, int online_type) @@ -294,7 +293,6 @@ static int memory_subsys_online(struct d if (mem->online_type < 0) mem->online_type = MMOP_ONLINE_KEEP; - /* Already under protection of mem_hotplug_begin() */ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE); /* clear online_type */ @@ -341,19 +339,11 @@ store_mem_state(struct device *dev, goto err; } - /* - * Memory hotplug needs to hold mem_hotplug_begin() for probe to find - * the correct memory block to online before doing device_online(dev), - * which will take dev->mutex. Take the lock early to prevent an - * inversion, memory_subsys_online() callbacks will be implemented by - * assuming it's already protected. - */ - mem_hotplug_begin(); - switch (online_type) { case MMOP_ONLINE_KERNEL: case MMOP_ONLINE_MOVABLE: case MMOP_ONLINE_KEEP: + /* mem->online_type is protected by device_hotplug_lock */ mem->online_type = online_type; ret = device_online(&mem->dev); break; @@ -364,7 +354,6 @@ store_mem_state(struct device *dev, ret = -EINVAL; /* should never happen */ } - mem_hotplug_done(); err: unlock_device_hotplug(); --- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-online-offline_pages-called-wo-mem_hotplug_lock +++ a/mm/memory_hotplug.c @@ -838,7 +838,6 @@ static struct zone * __meminit move_pfn_ return zone; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type) { unsigned long flags; @@ -850,6 +849,8 @@ int __ref online_pages(unsigned long pfn struct memory_notify arg; struct memory_block *mem; + mem_hotplug_begin(); + /* * We can't use pfn_to_nid() because nid might be stored in struct page * which is not yet initialized. Instead, we find nid from memory block. @@ -914,6 +915,7 @@ int __ref online_pages(unsigned long pfn if (onlined_pages) memory_notify(MEM_ONLINE, &arg); + mem_hotplug_done(); return 0; failed_addition: @@ -921,6 +923,7 @@ failed_addition: (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); memory_notify(MEM_CANCEL_ONLINE, &arg); + mem_hotplug_done(); return ret; } #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ @@ -1125,20 +1128,20 @@ int __ref add_memory_resource(int nid, s /* create new memmap entry */ firmware_map_add_hotplug(start, start + size, "System RAM"); + /* device_online() will take the lock when calling online_pages() */ + mem_hotplug_done(); + /* online pages if requested */ if (online) walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, online_memory_block); - goto out; - + return ret; error: /* rollback pgdat allocation and others */ if (new_node) rollback_node_hotadd(nid); memblock_remove(start, size); - -out: mem_hotplug_done(); return ret; } @@ -1555,10 +1558,16 @@ static int __ref __offline_pages(unsigne return -EINVAL; if (!IS_ALIGNED(end_pfn, pageblock_nr_pages)) return -EINVAL; + + mem_hotplug_begin(); + /* This makes hotplug much easier...and readable. we assume this for now. .*/ - if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end)) + if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, + &valid_end)) { + mem_hotplug_done(); return -EINVAL; + } zone = page_zone(pfn_to_page(valid_start)); node = zone_to_nid(zone); @@ -1567,8 +1576,10 @@ static int __ref __offline_pages(unsigne /* set above range as isolated */ ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true); - if (ret) + if (ret) { + mem_hotplug_done(); return ret; + } arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; @@ -1639,6 +1650,7 @@ repeat: writeback_set_ratelimit(); memory_notify(MEM_OFFLINE, &arg); + mem_hotplug_done(); return 0; failed_removal: @@ -1648,10 +1660,10 @@ failed_removal: memory_notify(MEM_CANCEL_OFFLINE, &arg); /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + mem_hotplug_done(); return ret; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) { return __offline_pages(start_pfn, start_pfn + nr_pages); _ Patches currently in -mm which might be from david@xxxxxxxxxx are mm-memory_hotplug-make-remove_memory-take-the-device_hotplug_lock.patch mm-memory_hotplug-make-add_memory-take-the-device_hotplug_lock.patch mm-memory_hotplug-fix-online-offline_pages-called-wo-mem_hotplug_lock.patch powerpc-powernv-hold-device_hotplug_lock-when-calling-device_online.patch powerpc-powernv-hold-device_hotplug_lock-when-calling-memtrace_offline_pages.patch memory-hotplugtxt-add-some-details-about-locking-internals.patch