In some LPAR migration scenarios, device-tree modifications are made to
the affinity of the memory in the system. For instance, memory may be
installed to nodes 0,3 on a source system, and to nodes 0,2 on a target
system. Node 2 may not have been initialized/allocated on the target
system.

After migration, if an RTAS PRRN memory remove is issued for a memory
block that was in node 3 on the source system, then try_offline_node
tries to remove it from node 2 on the target. The NODE_DATA(2) block
would not be initialized on the target, and there is no validation check
in the current code to prevent the use of a NULL pointer.

A similar problem of moving memory to an uninitialized node has also
been observed on systems where multiple PRRN events occur prior to a
complete update of the device-tree.

Call traces such as the following may be observed:

pseries-hotplug-mem: Attempting to update LMB, drc index 80000002
Offlined Pages 4096
...
Oops: Kernel access of bad area, sig: 11 [#1]
...
Workqueue: pseries hotplug workque pseries_hp_work_fn
...
NIP [c0000000002bc088] try_offline_node+0x48/0x1e0
LR [c0000000002e0b84] remove_memory+0xb4/0xf0
Call Trace:
[c0000002bbee7a30] [c0000002bbee7a70] 0xc0000002bbee7a70 (unreliable)
[c0000002bbee7a70] [c0000000002e0b84] remove_memory+0xb4/0xf0
[c0000002bbee7ab0] [c000000000097784] dlpar_remove_lmb+0xb4/0x160
[c0000002bbee7af0] [c000000000097f38] dlpar_memory+0x328/0xcb0
[c0000002bbee7ba0] [c0000000000906d0] handle_dlpar_errorlog+0xc0/0x130
[c0000002bbee7c10] [c0000000000907d4] pseries_hp_work_fn+0x94/0xa0
[c0000002bbee7c40] [c0000000000e1cd0] process_one_work+0x1a0/0x4e0
[c0000002bbee7cd0] [c0000000000e21b0] worker_thread+0x1a0/0x610
[c0000002bbee7d80] [c0000000000ea458] kthread+0x128/0x150
[c0000002bbee7e30] [c00000000000982c] ret_from_kernel_thread+0x5c/0xb0

This patch adds a check for an incorrectly initialized node to the
beginning of try_offline_node, and exits the routine when one is found.

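
The failure mode and the guard described above can be illustrated outside
the kernel. The following is only a userspace sketch, not kernel code:
struct pgdat_model, node_data[], MAX_NUMNODES as defined here, and
try_offline_node_model() are hypothetical stand-ins for pg_data_t,
NODE_DATA() and try_offline_node(), and the return value exists purely so
the behaviour can be observed (the real routine returns void).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_NUMNODES 4	/* assumption: small fixed node count for the model */

/* Hypothetical, simplified stand-in for the kernel's pg_data_t. */
struct pgdat_model {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
};

/* Per-node table; a NULL entry models a node that was never initialized. */
static struct pgdat_model *node_data[MAX_NUMNODES];

/*
 * Model of the guarded routine: the pointer is checked before any field
 * is read. Returns 0 if the node was examined, -1 if it has no node data.
 */
static int try_offline_node_model(int nid)
{
	struct pgdat_model *pgdat;
	unsigned long start_pfn;
	unsigned long end_pfn;

	assert(nid >= 0 && nid < MAX_NUMNODES);
	pgdat = node_data[nid];

	if (pgdat == NULL) {
		fprintf(stderr, "node %d has no node data, skipping\n", nid);
		return -1;
	}

	/* Only dereference once the pointer is known to be valid. */
	start_pfn = pgdat->node_start_pfn;
	end_pfn = start_pfn + pgdat->node_spanned_pages;
	printf("node %d spans pfn [%lu, %lu)\n", nid, start_pfn, end_pfn);
	return 0;
}
```

With an entry installed only for node 0, a call for node 0 returns 0,
while a call for node 2 returns -1 instead of dereferencing a NULL
pointer, which mirrors the early exit the patch introduces.
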
Another patch is being developed for powerpc to track the node ID to
which an LMB belongs, so that we can remove the LMB from there instead
of the nid as currently interpreted from the device tree.

Signed-off-by: Michael Bringmann <mwb@xxxxxxxxxxxxxxxxxx>
---
 mm/memory_hotplug.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38d94b7..e48a4d0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1831,10 +1831,16 @@ static int check_and_unmap_cpu_on_node(pg_data_t *pgdat)
 void try_offline_node(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
-	unsigned long start_pfn = pgdat->node_start_pfn;
-	unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
 	unsigned long pfn;
 
+	if (WARN_ON(pgdat == NULL))
+		return;
+
+	start_pfn = pgdat->node_start_pfn;
+	end_pfn = start_pfn + pgdat->node_spanned_pages;
+
 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);