On 20-06-22 09:16:20, David Hildenbrand wrote: > On 20.06.20 01:12, Ben Widawsky wrote: > > When working with very large nodes, poisoning the struct pages (for > > which there will be very many) can take a very long time. If the system > > is using voluntary preemptions, the software watchdog will not be able > > to detect forward progress. This patch addresses this issue by offering > > to give up time like __remove_pages() does. This behavior was > > introduced in v5.6 with: > > commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()") > > We try to stay <= 72 chars in the commit message (except for kernel splats). > Thanks. checkpatch does complain fwiw, I just somehow missed i. > > > > Alternately, init_page_poison could do this cond_resched(), but it seems > > to me that the caller of init_page_poison() is what actually knows > > whether or not it should relax its own priority. > > > > Based on Dan's notes, I think this is perfectly safe: > > commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}") > > We shouldn't be holding any spin locks, so it's safe. (we could think > about doing this outside of the memory hotplug lock in the case of > devmem, however, it's only a debugging feature so we most probably don't > care). > > > > > Aside from fixing the lockup, it is also a friendlier thing to do on > > lower core systems that might wipe out large chunks of hotplug memory > > (probably not a very common case). > > It really only is an issue for devmem. Ordinary hotplugged system memory > is not affected (onlined/offlined in memory block granularity). Could you explain this a bit? I was fixing the issue found on PMEM systems, but it seems like regularly memory hotplug was potentially a victim. I think one of the reasons PMEM might be more likely is the time it takes to work with any data structures store in the PMEM itself is slower (just a guess). > > > > > Fixes this kind of splat: > > [ 352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922] > > [ 352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem > > [ 352.150123] dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit > > [ 352.260774] irq event stamp: 138450 > > [ 352.264692] hardirqs last enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c > > [ 352.275134] hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c > > [ 352.285662] softirqs last enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456 > > [ 352.295233] softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0 > > [ 352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30 > > [ 352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019 > > [ 352.324308] RIP: 0010:memset_erms+0x9/0x10 > > [ 352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01 > > [ 352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13 > > [ 352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40 > > [ 352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0 > > [ 352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000 > > [ 352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000 > > [ 352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0 > > [ 352.398487] FS: 00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000 > > [ 352.407562] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0 > > [ 352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > [ 352.438115] Call Trace: > > [ 352.440865] remove_pfn_range_from_zone+0x3a/0x380 > > [ 352.446244] ? cpumask_next+0x17/0x20 > > [ 352.450361] memunmap_pages+0x17f/0x280 > > [ 352.454670] release_nodes+0x22a/0x260 > > [ 352.458888] __device_release_driver+0x172/0x220 > > [ 352.464070] device_driver_detach+0x3e/0xa0 > > [ 352.468753] unbind_store+0x113/0x130 > > [ 352.472868] kernfs_fop_write+0xdc/0x1c0 > > [ 352.477273] vfs_write+0xde/0x1d0 > > [ 352.482218] ksys_write+0x58/0xd0 > > [ 352.487207] do_syscall_64+0x5a/0x120 > > [ 352.492529] entry_SYSCALL_64_after_hwframe+0x49/0xb3 > > [ 352.499446] RIP: 0033:0x7f724f40b5e7 > > [ 352.504673] Code: Bad RIP value. > > [ 352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > [ 352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7 > > [ 352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003 > > [ 352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0 > > [ 352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277 > > [ 352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230 > > [ 370.353742] Built 2 zonelists, mobility grouping on. Total pages: 49050381 > > [ 370.373317] Policy zone: Normal > > [ 374.948164] Built 3 zonelists, mobility grouping on. Total pages: 49312525 > > [ 375.017496] Policy zone: Normal > > I'd have stripped this to a reasonable minimum. > > > > > Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()") > > Reported-by: "Scargall, Steve" <steve.scargall@xxxxxxxxx> > > Reported-by: Ben Widawsky <ben.widawsky@xxxxxxxxx> > > Cc: Dan Williams <dan.j.williams@xxxxxxxxx> > > Cc: David Hildenbrand <david@xxxxxxxxxx> > > Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx> > > Signed-off-by: Ben Widawsky <ben.widawsky@xxxxxxxxx> > > --- > > mm/memory_hotplug.c | 13 +++++++++++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index 9b34e03e730a..da374cd3d45b 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone, > > unsigned long start_pfn, > > unsigned long nr_pages) > > { > > + const unsigned long end_pfn = start_pfn + nr_pages; > > struct pglist_data *pgdat = zone->zone_pgdat; > > - unsigned long flags; > > + unsigned long pfn, cur_nr_pages, flags; > > > > /* Poison struct pages because they are now uninitialized again. */ > > - page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages); > > + for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { > > + cond_resched(); > > + > > + /* Select all remaining pages up to the next section boundary */ > > + cur_nr_pages = > > + min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); > > Nit: I'd have used the same indentation as the code you copied this from. > > > + page_init_poison(pfn_to_page(pfn), > > + sizeof(struct page) * cur_nr_pages); > > + } > > > > #ifdef CONFIG_ZONE_DEVICE > > /* > > > > Thanks! > > Acked-by: David Hildenbrand <david@xxxxxxxxxx> > > -- > Thanks, > > David / dhildenb